indexing and joins. indexes increase database performanceincrease database performance must be...

22
Indexing and Joins Indexing and Joins

Post on 21-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Indexing and JoinsIndexing and Joins

Page 2: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

IndexesIndexes

• Increase database performanceIncrease database performance

• must be explicitly definedmust be explicitly defined

• once defined, are transparent to the useronce defined, are transparent to the user

• once created, the system maintains itonce created, the system maintains it

• more than one can exist on a given tablemore than one can exist on a given table

Page 3: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Creating an IndexCreating an Index

• SyntaxSyntax

CREATE [UNIQUE] INDEX index_nameCREATE [UNIQUE] INDEX index_name

ON table_name (column_name)ON table_name (column_name)

• ExampleExamplecreate index auindcreate index auindon authors (au_id)on authors (au_id)

Page 4: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Composite IndexComposite Index

•Used when columns have a logical relationship and would be searched as a unit

•Examplecreate index au_name_indon authors (au_lname, au_fname)

•order not important, but performance is better when primary search col is first

Page 5: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

2 Kinds of Indexes2 Kinds of Indexes

• Unique IndexUnique Index

• Clustered IndexClustered Index

Page 6: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Unique IndexUnique Index• No 2 rows are permitted to have the same No 2 rows are permitted to have the same

valuevalue• system checks data upon creation and data system checks data upon creation and data

additionaddition• rejects duplicates and returns an errorrejects duplicates and returns an error• should only be created on a column that should only be created on a column that

requires uniqueness eg. ssn, acct coderequires uniqueness eg. ssn, acct code• can be created as a composite or single can be created as a composite or single

columncolumn• helps in maintaining data integrityhelps in maintaining data integrity• boosts search performanceboosts search performance

Page 7: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Clustered IndexClustered Index

•System sorts rows on an ongoing basis so that the physical order is the same as the indexed order

•only 1 can exist per table

•should only be created for a column that is most often retrieved in order

•greatly increases performance when searching for contiguous key values… especially a range

•slows down data updates due to the sorting involved

Page 8: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Things to ConsiderThings to Consider

• Indexes greatly increase query response Indexes greatly increase query response timetime

• every index requires system resources to every index requires system resources to store and maintainstore and maintain

• indexes can actually slow down the indexes can actually slow down the performance of UPDATES, INSERTS, and performance of UPDATES, INSERTS, and DELETES due to index maintenanceDELETES due to index maintenance

So… don’t over indexSo… don’t over index

Page 9: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

What Should We Index?What Should We Index?

• Any column frequently used in retrievalAny column frequently used in retrieval

• primary key columnsprimary key columns

• columns that are often queried in a sorted columns that are often queried in a sorted orderorder

• columns that are used in joinscolumns that are used in joins

• columns that are often searched for rangescolumns that are often searched for ranges

Page 10: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

We Should NOT Index…We Should NOT Index…

• Columns rarely used in queriesColumns rarely used in queries

• columns with 2 or 3 possible values columns with 2 or 3 possible values eg. Male or Female eg. Male or Female

• small tablessmall tables

Page 11: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

SQL-92 Create Table SQL-92 Create Table ConstraintsConstraints• PRIMARY KEYPRIMARY KEY

– rejects duplicates and nullsrejects duplicates and nulls

• UNIQUEUNIQUE– rejects duplicates, allows nullsrejects duplicates, allows nulls

• DEFAULTDEFAULT– inserts the default value when no value is enteredinserts the default value when no value is entered

• CHECKCHECK– validates data formatvalidates data format

• FOREIGN KEY and REFERENCESFOREIGN KEY and REFERENCES– ties foreign key to the primary key it referencesties foreign key to the primary key it references

Page 12: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Put it on paper!Put it on paper!

Column Datatype Null? Key Default Check Reference

title_id char(6) not null primary, unique 2 letter then 2 digtitle varchar(80) not null unique

type char(12) unclass

business, mod_cook, trad_cook

pub_id char(4) nullpublishers, pub_id

price money nulladvance money

Page 13: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Then write your SQLThen write your SQLcreate table titlecreate table title

(title_id char(6) not null(title_id char(6) not nullconstraint tididx primary keyconstraint tididx primary keyconstraint tidcheck checkconstraint tidcheck check(title_id like ‘[A-Z] [A-Z] [0-9] [0-9]…’)(title_id like ‘[A-Z] [A-Z] [0-9] [0-9]…’),,title varchar(80) not nulltitle varchar(80) not nullconstraint titleidx uniqueconstraint titleidx unique,,type char(12)type char(12)default ‘unclassified’ nulldefault ‘unclassified’ nullconstraint typechk checkconstraint typechk check(type in(‘business’, ‘mod_cook’, ‘trad_cook’))(type in(‘business’, ‘mod_cook’, ‘trad_cook’)),,pub_id char(4) nullpub_id char(4) nullreference publishers (pub_id)reference publishers (pub_id),,price money null,price money null,advance money null)advance money null)

Page 14: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Changing a TableChanging a Table

• SyntaxSyntax

– ALTER table table_nameALTER table table_nameadd column_name datatype null|not add column_name datatype null|not nullnull

Page 15: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Removing ObjectsRemoving Objects• DatabaseDatabase

– DROP DATABASE db_nameDROP DATABASE db_name

– deletes ALL tables and data within it!!deletes ALL tables and data within it!!

• TableTable– DROP TABLE table_nameDROP TABLE table_name

– deletes table and its contentsdeletes table and its contents

• IndexIndex– DROP INDEX table_name.index_nameDROP INDEX table_name.index_name

– deletes named index on named tabledeletes named index on named table

Page 16: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

JoinsJoins

In order to maintain normalization in In order to maintain normalization in the database design it is necessary the database design it is necessary to break up data into separate to break up data into separate tables. The data can then be re-tables. The data can then be re-associated through the use of a join.associated through the use of a join.

Page 17: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

JoinsJoins• What columns do I need?What columns do I need?

• What tables have these columns?What tables have these columns?

• Are all the tables related in some way?Are all the tables related in some way?

• If not, are there other tables that can relate them?If not, are there other tables that can relate them?

• How are they all related?How are they all related?

• Link them together by setting their common fields Link them together by setting their common fields equal in the WHERE clause.equal in the WHERE clause.

• Restrict the WHERE clause to the record(s) of interest.Restrict the WHERE clause to the record(s) of interest.

Page 18: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

What to join?What to join?

• Key columns are the best since these Key columns are the best since these were created for the purpose of existing were created for the purpose of existing as a reference.as a reference.

• Should have similar dataShould have similar data

• Should be the same datatypeShould be the same datatype

• nulls will not join since their value is not nulls will not join since their value is not known.known.

Page 19: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

SyntaxSyntax• Usually best to put the join conditions first Usually best to put the join conditions first

in the WHERE clausein the WHERE clause

• Use of aliases greatly simplifies the Use of aliases greatly simplifies the statement.statement.

• Any logical operator can be used.Any logical operator can be used.

• A self-join can be performed on the same A self-join can be performed on the same table by qualifying it twice.table by qualifying it twice.

Page 20: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

Self JoinSelf Join

Which authors in Oakland have the same zip code?Which authors in Oakland have the same zip code?

Select distinct au1.au_fname , au1.au_lname, au1.zipSelect distinct au1.au_fname , au1.au_lname, au1.zip

from from authors au1, authors au2authors au1, authors au2

where au1.city = “Oakland”where au1.city = “Oakland”

and au1.zip = au2.zipand au1.zip = au2.zip

and au1.au_id != au2.au_idand au1.au_id != au2.au_id

Page 21: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

How a Join is ProcessedHow a Join is Processed

• First the system obtains the Cartesian First the system obtains the Cartesian Product of all tables in joinProduct of all tables in joinCartesian Product - the matrix of all possible Cartesian Product - the matrix of all possible combinations that could satisfy the joincombinations that could satisfy the join

• The select list is used to restrict the columns The select list is used to restrict the columns returnedreturned

• The WHERE clause is then used to restrict the The WHERE clause is then used to restrict the rows return that satisfy the queryrows return that satisfy the query

Page 22: Indexing and Joins. Indexes Increase database performanceIncrease database performance must be explicitly definedmust be explicitly defined once defined,

2 ways of looking at a Join2 ways of looking at a Join

• Looking at all the tables, linking Looking at all the tables, linking them together and treating them them together and treating them like one big table.like one big table.

• Setting the main search criteria and Setting the main search criteria and then linking the common fields to then linking the common fields to the data that is of interest.the data that is of interest.