copyright, harris corporation & ophir frieder, 19981 de-normalization “no major application...

63
Copyright, Harris Corporation & Ophir Frieder, 1998 1 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference, Koch and Loney

Upload: edgar-willis

Post on 04-Jan-2016

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19981

De-Normalization

“No major application will run in Third Normal Form.”

-Oracle8, The Complete Reference, Koch and Loney

Page 2: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19982

Objectives

• Learn about the relationship between normal forms and performance.

• Learn de-normalization techniques.

• Learn how to maintain data integrity when that data has been de-normalized.

Page 3: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19983

Warning!

• The topic of de-normalization should be approached with extreme caution!

• De-normalized databases are more difficult to keep in a consistent state and, consequently, should only be used when performance (or other) issues dictate that they should.

• De-normalization often times implies more code, and a greater possibility of errors.

• In addition to the de-normalized physical model, a normalized logical model should always be maintained.

Page 4: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19984

Motivation

• The primary purpose of normalization is to eliminate redundancy, which in turn:– Eliminates update anomalies (insert, delete, and update).

– Ensures data integrity.

– Saves space.

Question: But doesn’t it also help with “selects” and other types of retrievals?

Page 5: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19985

Motivation, Cont.

Answer: In many cases it does not and, in fact, depending on a variety of factors, can make some operations much slower and more complex.

• Relevant factors include, among other things, data type and size, type and frequency of queries issued, and even system characteristics such as the amount of main memory.

• This further emphasizes the role of benchmarking and application analysis in physical design.

Page 6: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19986

Motivation, Cont.

• Consider the following BCNF relational schemes.

EMPLOYEE DEPARTMENT

EMPLOYEE#=>DEPARTMENT# DEPARTMENT#=>MANAGER

EMPLOYEE# DEPARTMENT#

263349889 23

342523423 16

039324253 32

746532421 11

DEPARTMENT# MANAGER

10 Jimmy Johnson

11 Bill Walsh

12 Mike Ditka

13 Bill Parcells

Page 7: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19987

Motivation,Cont.

• Creating a report listing the name of the manager for each employee requires a join of the two tables:

select EMPLOYEE#, MANAGER

from EMPLOYEE e, DEPARTMENT d

where e.DEPARTMENT# = d.DEPARTMENT#

• Such a join may involve an examination of as many as nxm rows, where n and m are the number of rows in the two tables.

• Also note that data may be paged in and out of memory at a similar rate (i.e., nxm).

Page 8: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19988

Motivation, Cont.

• Consider the following (de-normalized) version of the EMPLOYEE relational scheme:

EMPLOYEE/DEPT

EMPLOYEE#=>DEPARTMENT#

DEPARTMENT#=>MANAGER

EMPLOYEE# DEPARTMENT# MANAGER

263349889 23 Howie Long

342523423 16 Bruce Smith

039324253 32 Marv Levy

342567324 16 Bruce Smith

746532421 11 Bill Cower

Page 9: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 19989

Motivation

• The resulting relational scheme is in 2NF, but not 3NF (why?).

• Using the new (de-normalized) table, creating a report listing the name of the manager for each employee can be done with the following query:

select EMPLOYEE#, MANAGER

from EMPLOYEE/DEPT

• In contrast to the join, this will involve an examination of at most n rows, and data will be paged in and out of memory at a similar rate (i.e., n).

• Note the introduction of redundancy, however.

Page 10: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199810

De-Normalization

• De-normalization is a process, making use of a variety of techniques, that modifies a relational scheme that has been normalized to some degree.

• A relational scheme can be de-normalized for any variety of reasons, but is typically done to improve performance.

• De-normalizing techniques should only be used when performance (or other) issues dictate that they should.

Page 11: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199811

De-Normalization

• De-normalization is more appropriate for DSS, data warehouse, and OLAP databases.

• De-normalization is less appropriate for OLTP databases.

• Once again, the role of benchmarking and application analysis is crucial.

Page 12: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199812

Techniques

• Collapsing tables

• Adding redundant columns

• Adding derived columns

• Duplicating tables

• Partitioning tables (horizontally and vertically)

• Using meaningful keys

• Junk drawers

• Computation tables

Page 13: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199813

Collapsing Tables

• Recall the previous example, where the EMPLOYEE and DEPARTMENT tables were, in effect, combined.

EMPLOYEE#=>DEPARTMENT#

DEPARTMENT#=>MANAGER

EMPLOYEE# DEPARTMENT# MANAGER

263349889 23 Howie Long

342523423 16 Bruce Smith

039324253 32 Marv Levy

746532421 11 Bill Cower

Page 14: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199814

Collapsing Tables, Cont.

• Collapsing the two tables simply means that the two tables are combined and replaced by the resulting table (i.e., the original two tables are deleted).

• Note that this is a strict definition of “collapsing.” In theory, the original two tables could still be maintained.

Page 15: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199815

Collapsing Tables, Cont.

• Potential advantages:– Fewer joins, thus speeding retrieval.

– Fewer tables.

– Fewer indices.

– Fewer foreign keys.

• Potential disadvantages:– Update anomalies.

– Benefit is application specific, and must be re-evaluated when changes occur.

Page 16: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199816

Adding Redundant Columns

• Suppose the original DEPARTMENT table contained other columns.

DEPARTMENT

EMPLOYEE

DEPARTMENT# LOCATION MANAGER

10 Building 3 Jimmy Johnson

13 Building 1 Bill Parcells

EMPLOYEE# DEPARTMENT#

263349889 10

746532421 13

Page 17: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199817

Adding Redundant Columns, Cont.

• Creating a report listing the name of the manager for each employee requires a join of the two tables, as it did before:

select EMPLOYEE#, MANAGER

from EMPLOYEE e, DEPARTMENT d

where e.DEPARTMENT# = d.DEPARTMENT#

• As before, such a join may involve an examination of as many as nxm rows, where n and m are the number of rows in the two tables.

• Data may be paged in and out of memory at a similar rate (i.e., nxm).

Page 18: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199818

Adding Redundant Columns, Cont.

• Option #1: Collapse the two tables (and delete the original two).

EMPLOYEE#=>DEPARTMENT#

DEPARTMENT#=>MANAGER

DEPARTMENT#=>LOCATION

EMPLOYEE# DEPARTMENT# LOCATION MANAGER

263349889 10 Building 3 Jimmy Johnson

746532421 13 Building 1 Bill Parcells

Page 19: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199819

Adding Redundant Columns, Cont.

• This has all the advantages and disadvantages described earlier.

• In addition, it introduces two transitive dependencies, both create redundancy, but only one helps eliminate the join.

Page 20: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199820

Adding Redundant Columns

• Option #2: Keep both tables, but add a redundant column to the EMPLOYEE table.

EMPLOYEE

DEPARTMENT

EMPLOYEE# DEPARTMENT# MANAGER

263349889 10 Jimmy Johnson

746532421 13 Bill Parcells

DEPARTMENT# LOCATION MANAGER

10 Building 3 Jimmy Johnson

13 Building 1 Bill Parcells

Page 21: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199821

Adding Redundant Columns

• The resulting EMPLOYEE relational scheme is in 2NF, but not 3NF because of the transitive dependencies

EMPLOYEE#=>DEPARTMENT#

DEPARTMENT#=>MANAGER

• This eliminates the need for the join; the desired report can now be created using the following query:

select EMPLOYEE#, MANAGER

from EMPLOYEE

• In contrast to the join, this will involve an examination of at most n rows, and data will be paged in and out of memory at a similar rate.

Page 22: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199822

Adding Redundant Columns, Cont.

• Potential advantages:– Fewer joins, thus speeding retrieval.

– Fewer indices to support joins.

• Potential disadvantages:– Slower data modifications.

– Requires more space.

– Data integrity is more difficult to guarantee, and application coding may be more complex.

– Benefit is application specific, and must be re-evaluated when changes occur.

Page 23: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199823

Adding Derived Columns

• Consider the following relational schemes.

SALES SALESPERSONS

SALESPERSON# SALE-REVENUE

263349889 10.95

263349889 15.95

039324253 6.45

746532421 3.50

SALESPERSON# NAME

263349889 Mary Brown

039324253 Bill Smith

746532421 Sue Jones

987657680 Bob Karp

Page 24: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199824

Adding Derived Columns,Cont.

• Creating a report listing the total amount of revenue brought in by each salesperson, along with their name, requires a join of the two tables:

select NAME, sum(SALE-REVENUE)

from SALES s, SALESPERSON sp

where s.SALESPERSON# = sp.SALESPERSON#

• Such a join may involve an examination of as many as nxm rows, where n and m are the number of rows in the two tables.

• Also note that data may be paged in and out of memory at a similar rate (i.e., nxm).

Page 25: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199825

Adding Derived Columns,Cont.

• Consider the following version of the SALESPERSON relational scheme:

• Note that the SALES table is still maintained.

SALESPERSON# TOTAL REVENUE NAME

263349889 324.97 Mary Brown

039324253 423.13 Bill Smith

746532421 95.34 Sue Jones

987657680 987.63 Bob Karp

Page 26: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199826

Adding Derived Columns,Cont.

• Using the new (de-normalized) table, creating a report listing the total amount of revenue brought in by each salesperson, along with their name, can be done with the following query:

select NAME, TOTAL-REVENUE

from SALESPERSON

• In contrast to the join, this will involve an examination of at most n rows, and data will be paged in and out of memory at a similar rate (i.e., n).

Page 27: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199827

Adding Derived Columns, Cont.

• Question: Does this technique actually de-normalize a relational scheme?

• Answer: In this particular example, no. However, it is frequently referred to as a “de-normalization.”

• Exercise: What are the dependencies, and what is a key for the relational scheme? Is it in 1NF, 2NF, 3NF, etc?

The dependency of TOTAL-REVENUE on SALE-REVENUE is not a functional dependency, but some other kind of dependency.

Page 28: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199828

Adding Derived Columns, Cont.

• Potential advantages:– The total revenue for a salesperson can be obtained without performing a

join, thus speeding retrieval.

– Fewer indices to support such joins.

– Fewer foreign keys.

• Potential disadvantages:– Slower and more complex data modifications; modifying a SALE-

REVENUE value in the SALES table will require a TOTAL-REVENUE value to be updated in the SALESPERSON table.

– The SALESPERSON table requires additional space.

Page 29: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199829

Horizontal Splitting

• Consider the following relational scheme called PERSON.

SS#=>LAST-NAME

SS#=>DATE-OF-BIRTH

SS# LAST-NAME DATE-OF-BIRTH

263349889 Brown 11/17/95

039324253 Jones 3/22/96

746532421 Karp 10/9/85

987657680 Smith 9/18/92

Page 30: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199830

Horizontal Splitting, Cont.

• Suppose that the table is sorted based on LAST-NAME.

• Suppose that the most frequently accessed rows are those corresponding to people most recently born.

• For example, the following query is likely to access few rows, but many of the table’s blocks.

select SS# from PERSON

where DATE-Of-BIRTH >= 1/1/99

• In such a case it may be helpful to split the table in two; one containing all rows with a DATE-OF-BIRTH on or after 1/1/99, and one containing all rows with a DATE-OF-BIRTH before 1/1/99.

Page 31: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199831

Horizontal Splitting, Cont.

PRE-DATE

POST-DATE

SS# LAST-NAME DATE-OF-BIRTH

746532421 Karp 10/9/85

987657680 Smith 9/18/92

SS# LAST-NAME DATE-OF-BIRTH

263349889 Brown 11/17/99

039324253 Jones 3/22/99

Page 32: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199832

Horizontal Splitting, Cont.

• Potential advantages:– Fewer rows and pages to read.

– Reduce the depth of an index (i.e., B-tree).

– Improve parallel access to data by reducing lock contention.

– Improve parallel access to data by spreading it across multiple devices and, perhaps, controllers.

– PRE-DATE can be stored on cheaper media (Large, inactive, or historical data can be isolated or archived).

• Potential disadvantages:– Operations and applications that are required to access data in both tables

become more complex (i.e., a query has to reference two different tables).

Page 33: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199833

Vertical Splitting

• Suppose that the PERSON relational scheme had very wide rows relative to page size.

• Also suppose that for each attribute A, SS#=>A is a functional dependency, and that these are the only functional dependencies.

SS# LAST-NAME ...... DATE-OF-BIRTH

263349889 Brown 11/17/95

039324253 Jones ...... 3/22/96

746532421 Karp 10/9/85

987657680 Smith 9/18/92

Page 34: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199834

Vertical Splitting, Cont.

• Relative to a table with small rows:– Longer rows => fewer rows per disk page.

– Fewer rows per disk page => more pages per table.

– More pages per table => more pages read on average.

• In such a case it may be helpful to split the table into two:– First table contains the key and some nonkey columns.

– Second table contains the key and remaining nonkey columns.

Page 35: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199835

Vertical Splitting, Cont.

PERSONAL-INFO

PROFESSIONAL-INFO

SS# LAST-NAME ... DATE-OF-BIRTH

746532421 Karp 10/9/85

987657680 Smith ... 9/18/92

SS# JOB-TITLE ... SALARY

746532421 Software Engineer 15,000

987657680 Chemist ... 45,000

Page 36: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199836

Vertical Splitting, Cont.

• Potential advantages:– Shorter rows, which means each table requires fewer pages to read.

– Speed access to frequently referenced columns.

– Improve parallel access to data by reducing lock contention.

– Improves parallel access to data by allowing the tables to be stored on separate devices and, perhaps, controllers.

• Potential disadvantages:– Joins will be required for some queries.

– Indices may be replicated, thereby using additional space.

– Decomposition may not preserve dependencies, thereby making modification of the tables more difficult.

Page 37: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199837

Duplicating Tables

• If a collection of users needs quick, frequent access to a subset of the rows of a table, then it might be beneficial to create a copy of the table, but only containing the subset of rows that are of interest.

PERSON

PERSON-SUB

Page 38: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199838

Duplicating Tables, Cont.

• Similar to, yet different from horizontal splitting:– Horizontal splitting creates two new tables, and eliminates the original;

duplicating a table creates one new table, but keeps the original.

– Horizontal splitting groups contiguous rows; duplicating a table can use an arbitrary subset of rows.

Page 39: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199839

Duplicating Tables, Cont

• Potential advantages:– Fewer rows and pages to read.

– Reduce the depth of an index (i.e., B-tree).

– Improve parallel access to data by reducing lock contention.

– Improve parallel access to data by spreading it across multiple devices and, perhaps, controllers.

• Potential disadvantages:– Some operations become more complex - applications that insert, delete,

or update data in one of the tables may have to operate on both.

Page 40: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199840

Using Meaningful Keys

• Consider the following relational schemes, which are used to store information on company employees, departments, projects, and customers.

EMPLOYEE DEPARTMENT PROJECT CUSTOMER

Employee# Department# Project# Customer#

Name Name Name Name

Department# Manager Department# Location

Project# Customer#

• Note that no particular set of functional dependencies are assumed, except that the ID#, or rather, the first field, in each case is a key.

Page 41: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199841

Using Meaningful Keys, Cont.

• In this case many meaningful queries require joins:

– select e.Name, d.Name

from EMPLOYEE e, DEPARTMENT d

where e.DEPARTMENT# = d.DEPARTMENT#;

– select e.Name, c.Name

from EMPLOYEE e, PROJECT p, CUSTOMER c

where (e.Project# = p.Project#) and (p.Customer# = c.Customer#);

Page 42: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199842

Using Meaningful Keys, Cont.

• If department names are unique, then the first join could be simplified by eliminating the Department# field.

• Similarly, if the customer names are unique, then the second join could be simplified by eliminating the Customer# field.

EMPLOYEE DEPARTMENT PROJECT CUSTOMER

Employee# Name Project# Name

Name Manager Name Location

Department-Name Department-Name

Project# Customer-Name

Page 43: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199843

Using Meaningful Keys, Cont.

• Equivalent results can now be obtained by the following queries:

– select Name, Department-Name

from EMPLOYEE

– select e.Name, p.Customer-Name

from EMPLOYEE e, PROJECT p

where e.Project# = p.Project#;

• Such a modification may not be possible for some tables; for example, for the EMPLOYEE table the Name field will probably not be unique.

Page 44: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199844

Using Meaningful Keys, Cont.

• Recall that no particular set of functional dependencies was assumed for the initial relational schemes.

• If the Name field of each department is unique, and if each department has one manager then the following are the functional dependencies for the DEPARTMENT relational scheme.

DEPARTMENT

Department# Department#=>Name

Name Department#=>Manager

Manager Name=>Department#

Name=>Manager

• Note that DEPARTMENT is in BCNF.

Page 45: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199845

Using Meaningful Keys, Cont.

• Now consider the functional dependencies for the DEPARTMENT relational scheme after the deletion of the Department# attribute.

DEPARTMENT

Name Name=>Manager

Manager

• Note that DEPARTMENT is still in BCNF.

Page 46: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199846

Using Meaningful Keys, Cont.

• Meaningful primary key names must be unique.

• Meaningful primary key names should be short - helps avoid typing errors and simplifies queries.

• Query/application analysis determines what keys should be made meaningful.

Page 47: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199847

Junk Drawer

• Many applications have a collection of unrelated pieces of information– Constants

– Codes

– A global status

• In theory, these belong in different locations, i.e., tables.

• This could result in a large number of very small tables, many, conceivably, with only a single row.

Page 48: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199848

Junk Drawer, Cont.

• Create a single table containing all of the records - like a junk drawer in the kitchen.

• The table could contain a column for every possible type of attribute, or different types of attributes could be stored in the same column, which would have a very vague name.

• Columns could be indexed as required, even for the retrieval of a single row.

Page 49: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199849

Junk DrawerExample

NAME CHRVALUE PROPERTIES RVALUE

Bill’s PC NULL 32MB RAM,17” Monitor, etc.

NULL

System Status Nominal NULL NULL

Pi NULL NULL 3.14

Page 50: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199850

Junk Drawer, Cont.Example: The Sybase system table sysconfigures, which contains one row for each configuration parameter that can be set by the user.

Attribute Name Datatype Description

config smallint Configuration parameter number.

value int The user-modifiable value for the parameter with integer datatype.Its value is 0 for the parameters with character datatype.

comment varchar(255) Name of the configuration parameter.

status smallint Either 1 (dynamic) or 0 (parameter takes effect when AdaptiveServer is restarted).

name varchar(80) Name of the configuration parameter (the same value as comment).

parent smallint Configuration parameter number of the parent; if more than oneparent, the additional parent numbers are stored in sysattributes.

value2 varchar(255) The user-modified value for the parameter with the characterdatatype. Its value is NULL for parameters with integer datatype. Itis also used to store the pool size of a buffer pool.

value3 int Stores the wash size of a buffer pool.

value4 int Stores the asynchronous prefetch percents of a buffer pool.

value

Page 51: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199851

Junk Drawer, Cont

• Potential advantages:– Flexible.

– Fewer tables.

• Potential disadvantages:– Difficult to maintaining.

– Difficult to understand.

– Subject to error.

– In general, difficult to use.

Page 52: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199852

Computation Table

• Suppose a system must repeatedly compute some formula, which, for example, is a function of one integer variable x.

• The most frequently computed values for x, if such exist, could be pre-computed, stored in a table, an indexed by x.

• For x(x+1) X VALUE

1 2

2 6

3 12

4 20

Page 53: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199853

Computation Table, Cont

• Potential advantages:– Table lookup may be quicker than to compute it from scratch.

• Potential disadvantages:– Space usage.

– Can’t index every value of x.

– Frequently used values of x may change.

Page 54: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199854

Maintaining De-normalized Data

• Normalization eliminates redundancy.

• In many cases de-normalization introduces redundancy.

• When updates to a de-normalized relational scheme occur, any redundant copies, or other dependent values, must also be updated as well.

• Note that this highlights the main trade-off made between normalized and de-normalized data.

Page 55: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199855

Maintaining De-normalized DataExample

PRODUCT-SALES

TOTAL-SALES

SALESPERSON# PRODUCT# PRODUCTSALES032421 32 $635.19

143253 19 $210.15

253462 8 $432.95

SALESPERSON# TOTALSALES032421 $10,231.25

143253 $7,500.62

253462 $17,634.15

Page 56: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199856

Maintaining De-normalized Data, Cont.

• Note that the total amount of revenue for any particular sales person can be obtained from the second table and, in particular, without scanning the first table.

• However, any modification to the PRODUCTSALES field of the first table will require a modification to the TOTALSALES field of the second table.

Page 57: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199857

Techniques For MaintainingDe-normalized Data

• Triggers.

• Application logic using transactions.

• Batch reconciliation.

Page 58: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199858

Maintaining De-normalized DataUsing Triggers

• A trigger is a procedure that defines an action the database server should take when some event occurs.

create trigger total_sales_insert

after insert on PRODUCT-SALES

begin

update TOTAL-SALES

set TOTALSALES = TOTALSALES + :new.PRODUCTSALES

where SALESPERSON# = :new.SALESPERSON#;

end;

Page 59: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199859

Maintaining De-normalized DataUsing Triggers

• The trigger will be executed whenever an insert is performed on the PRODUCT-SALES table.

• Execution of the trigger is transparent to the application.

• Trigger is treated as part of the transaction, e.g., along with the insert into the PRODUCT-SALES table, thereby maintaining integrity.

• The main disadvantage is that the trigger slows down the insert.

Page 60: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199860

Maintaining De-normalized DataUsing Application Logic

• An application can issue multiple queries to maintain consistancy between redundant or derived values.

• To maintain data integrity, all of the queries must be issued as a single transaction.

begin transaction

insert into PRODUCT-SALES values(03245,33,$10.00);

update TOTAL-SALES

set TOTALSALES = TOTALSALES + $10.00

where SALESPERSON# = 03245 and PRODUCT# = 33;

end transaction;

Page 61: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199861

Maintaining De-normalized DataUsing Batch Reconciliation

• If 100% consistency is not required at all times, then a batch job can be run during off-hours, or at appropriate intervals, to bring data into a consistent state.

Page 62: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199862

Notes

• Often times de-normalization techniques are used just to avoid a join.

• Note that some of these techniques appear to simply create a new view. However, a view is simply a pre-canned query.

• De-normalization implies more code and greater chances of error.

Page 63: Copyright, Harris Corporation & Ophir Frieder, 19981 De-Normalization “No major application will run in Third Normal Form.” -Oracle8, The Complete Reference,

Copyright, Harris Corporation &

Ophir Frieder, 199863

Notes, Cont.

• Which de-normalization techniques should be used depends on the application, requirements for performance, reliability, and even on system resources.

• Benchmarking is critical to determine which de-normalization techniques are necessary and which work.

• Usually, the choice between a normalized and de-normalized representation results in a trade-off between maintenance time and performance.