1 r. ching, ph.d. mis california state university, sacramento week 11 november 7 data normalization...
Post on 21-Dec-2015
215 views
TRANSCRIPT
1
R. Ching, Ph.D. • MIS • California State University, Sacramento
Week 11Week 11November 7November 7
• Data Normalization and ERDData Normalization and ERD• Conceptual, Logical and Physical Database DesignConceptual, Logical and Physical Database Design
2
R. Ching, Ph.D. • MIS • California State University, Sacramento
Data NormalizationData Normalization
• The purpose of normalization is to produce a stable set of The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the relations that is a faithful model of the operations of the enterprise.enterprise.
– Achieve a design that is highly flexibleAchieve a design that is highly flexible
– Reduce redundancy Reduce redundancy – Ensure that the design is free of certain update, Ensure that the design is free of certain update,
insertion and deletion anomaliesinsertion and deletion anomalies
Catherine Richardo, Catherine Richardo, 19901990
3
R. Ching, Ph.D. • MIS • California State University, Sacramento
4NF4NF4NF4NF
BCNFBCNFBCNFBCNF
3NF3NF3NF3NF
2NF2NF2NF2NF
NormalizationNormalization
1NF1NF1NF1NF Flat file Flat file
Partial dependencies removed Partial dependencies removed
Transitive dependencies removed Transitive dependencies removed
Every determinant is a candidate key Every determinant is a candidate key
Non-tivial multi-valued dependencies Non-tivial multi-valued dependencies removed removed
4
R. Ching, Ph.D. • MIS • California State University, Sacramento
Stereos To GoInvoice
Order No.
Date: / /
Account No.
ItemNumber Product Description/Manufacturer Qty Price
ProductCode
1
2
3
4
5
Date Shipped: / /
Customer:Address:
City State Zip Code
10001
6 15 99
0000-000-0000-0
John Smith2036-26 StreetSacramento CA 95819
SAGX730 Pioneer Remote A/V Receiver
AT10 Cervwin Vega LoudspeakersCDPC725 Sony Disc-Jockey CD Changer
6 18 99
SubtotalShipping & Handling
Sales TaxTotal
1329851000010306
153291
111
569953599539995
Go, HogsGo, Hogs
1/051/05
Stereos To Go
0000 000 0000 00000 000 0000 0John SmithJohn Smith
5
R. Ching, Ph.D. • MIS • California State University, Sacramento
Unnormalized RelationUnnormalized Relation
How would a program process the data to recreate the invoice?How would a program process the data to recreate the invoice?
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account , Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code,Cust_name Cust_addr Cust_city Cust_state Zip_code,Item1 Item1_descrip Item1_qty Item1_price,Item1 Item1_descrip Item1_qty Item1_price,Item2 Item2_descrip Item2_qty Item2_price, Item2 Item2_descrip Item2_qty Item2_price, . . . , . . . , Item7 Item7_descrip Item7_qty Item7_price)Item7 Item7_descrip Item7_qty Item7_price)
6
R. Ching, Ph.D. • MIS • California State University, Sacramento
Unnormalized to 1NFUnnormalized to 1NF
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account , Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code,Cust_name Cust_addr Cust_city Cust_state Zip_code,Item1, Item1_descrip, Item1_qty, Item1_price,Item1, Item1_descrip, Item1_qty, Item1_price,Item2, Item2_descrip, Item2_qty, Item2_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , . . . , Item7, Item7_descrip, Item7_qty, Item7_price)Item7, Item7_descrip, Item7_qty, Item7_price)
A flat file places all the data of a transaction into a single record. A flat file places all the data of a transaction into a single record. A flat file places all the data of a transaction into a single record. A flat file places all the data of a transaction into a single record.
This is reminiscent of a COBOL or BASIC program This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.processing a single transaction with one read statement.
Repeating groupsRepeating groups
7
R. Ching, Ph.D. • MIS • California State University, Sacramento
Unnormalized to 1NFUnnormalized to 1NF
Nominated group of attributes Nominated group of attributes to serve as the keyto serve as the key
(form a unique combination)(form a unique combination)
• Eliminate the repeating groups.Eliminate the repeating groups.• Each row retains data for one item.Each row retains data for one item.• If a person bought 5 items, we If a person bought 5 items, we
would have five tupleswould have five tuples
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,Item, Item_descrip, Item_qty, Item_price)Item, Item_descrip, Item_qty, Item_price)
8
R. Ching, Ph.D. • MIS • California State University, Sacramento
1NF1NF
10001 123456 John Smith 10001 123456 John Smith ••• SAGX730••• SAGX730 Pioneer Remote A/V Rec Pioneer Remote A/V Rec 11 569.95 569.9510001 123456 John Smith 10001 123456 John Smith ••• SAGX730••• SAGX730 Pioneer Remote A/V Rec Pioneer Remote A/V Rec 11 569.95 569.95
10001 123456 John Smith10001 123456 John Smith ••• ••• AT10 AT10 Cerwin Vega LoudspeakersCerwin Vega Loudspeakers 1 359.951 359.9510001 123456 John Smith10001 123456 John Smith ••• ••• AT10 AT10 Cerwin Vega LoudspeakersCerwin Vega Loudspeakers 1 359.951 359.95
10001 123456 John Smith10001 123456 John Smith ••• ••• CDPC725 CDPC725 Sony Disc Jockey CD Sony Disc Jockey CD 11 399.95 399.9510001 123456 John Smith10001 123456 John Smith ••• ••• CDPC725 CDPC725 Sony Disc Jockey CD Sony Disc Jockey CD 11 399.95 399.95
10001 123456 John Smith10001 123456 John Smith ••• ••• S/HS/H Shipping Shipping 11 100.00 100.0010001 123456 John Smith10001 123456 John Smith ••• ••• S/HS/H Shipping Shipping 11 100.00 100.00
10001 123456 John Smith10001 123456 John Smith ••• ••• TaxTax Sales Tax Sales Tax 11 103.06 103.0610001 123456 John Smith10001 123456 John Smith ••• ••• TaxTax Sales Tax Sales Tax 11 103.06 103.06
Flat FileFlat File
Invo
ice nu
mber
Invo
ice nu
mber
Accou
nt nu
mber
Accou
nt nu
mber
Custom
er na
me
Custom
er na
me
DescriptionDescriptionItem Item
QuantityQuantityItem Item PricePriceItemItem
9
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 1NFFrom 1NF
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code,Cust_state, Zip_code,ItemItem, Item_descrip, Item_qty, Item_price), Item_descrip, Item_qty, Item_price)
Functional dependencies and determinantsFunctional dependencies and determinants
Example: item_descrip is functionally dependent on item, Example: item_descrip is functionally dependent on item, such that item is the determinant of item_descript.such that item is the determinant of item_descript.
10
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 1NF to 2NFFrom 1NF to 2NF
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_state, Zip_code)
((ItemItem, Item_descrip, Item_qty, Item_price), Item_descrip, Item_qty, Item_price)
Is this unique by itself?Is this unique by itself?What happens if the item is purchased more than once?What happens if the item is purchased more than once?
11
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 1NF to 2NFFrom 1NF to 2NF
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_state, Zip_code)
((Invoice_number,Invoice_number, ItemItem, Item_descrip, Item_qty, Item_price), Item_descrip, Item_qty, Item_price)
Composite key (forms a unique combination)Composite key (forms a unique combination)
Partial dependencyPartial dependency
12
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 1NF to 2NFFrom 1NF to 2NF
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
((Invoice_number,Invoice_number, ItemItem, Item_qty, Item_price), Item_qty, Item_price)
((ItemItem, Item_descrip), Item_descrip)
13
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 2NF to 3NFFrom 2NF to 3NF
Which attributes are dependent on others?Which attributes are dependent on others?Is there a problem?Is there a problem?
((Invoice_numberInvoice_number, Invoice_date, Date_delivered, Cust_account, , Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code)
((Invoice_number,Invoice_number, ItemItem, Item_qty, Item_price), Item_qty, Item_price)
((ItemItem, Item_descrip), Item_descrip)
14
R. Ching, Ph.D. • MIS • California State University, Sacramento
Transitive Dependencies and AnomaliesTransitive Dependencies and Anomalies
• Insertion anomaliesInsertion anomalies
– To add a new row, all customer (name, address, city, To add a new row, all customer (name, address, city, state, zip code, phone) and products (description) must state, zip code, phone) and products (description) must be consistent with previous entriesbe consistent with previous entries
• Deletion anomaliesDeletion anomalies
– By deleting a row, a customer or product may cease to By deleting a row, a customer or product may cease to existexist
• Modification anomaliesModification anomalies
– To modify a customer’s or product’s data in one row, To modify a customer’s or product’s data in one row, all modifications must be carried out to all othersall modifications must be carried out to all others
15
R. Ching, Ph.D. • MIS • California State University, Sacramento
Insertion and Modification AnomaliesInsertion and Modification AnomaliesFor example…For example…
DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanasonicPanasonicPV-4250PV-4250 PanasonicPanasonic
DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanasonicPanasonicPV-4250PV-4250 PanasonicPanasonic
CT-32S35CT-32S35 PANPANCT-32S35CT-32S35 PANPAN
InconsistencyInconsistency
DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanaSonicPanaSonicPV-4250PV-4250 Pana SonicPana SonicCT-32S35CT-32S35 PANPAN
DVD-A110DVD-A110 PanasonicPanasonicPV-4210PV-4210 PanaSonicPanaSonicPV-4250PV-4250 Pana SonicPana SonicCT-32S35CT-32S35 PANPAN
Change all Panasonic Change all Panasonic products’ manufacturer products’ manufacturer
name to “Panasonic USA”name to “Panasonic USA”
Product_codeProduct_code Manufacturer_nameManufacturer_nameInsert a new Panasonic productInsert a new Panasonic product
16
R. Ching, Ph.D. • MIS • California State University, Sacramento
Deletion AnomalyDeletion AnomalyFor Example…For Example…
43771824377182 John SmithJohn Smith SacramentoSacramento CACA 958319583143987114398711 Arnold SArnold S DavisDavis CACA 956919569145784614578461 Gray DavisGray Davis SacramentoSacramento CACA 958319583148731794873179 Lisa CarrLisa Carr RenoReno NVNV 8955789557
By deleting customer Arnold S, we would also be deleting By deleting customer Arnold S, we would also be deleting Davis, California. Davis, California.
17
R. Ching, Ph.D. • MIS • California State University, Sacramento
Transitive Transitive DependenciesDependencies
Invoice_numberInvoice_number
Invoice_dateInvoice_date
Date_deliveredDate_delivered
Cust_accountCust_account
Cust_nameCust_name
Cust_addrCust_addr
Cust_cityCust_city
Cust_stateCust_state
Zip_codeZip_code
ItemItem
Item_descripItem_descrip
Invoice_number+ItemInvoice_number+Item
Item_qtyItem_qty
Item_priceItem_price
A condition where A, B, C A condition where A, B, C are attributes of a relation are attributes of a relation such that if A such that if A B and B and B B C, then C is transitively C, then C is transitively dependent on A via B dependent on A via B (provided that A is not (provided that A is not functionally dependent on B functionally dependent on B or C).or C).
18
R. Ching, Ph.D. • MIS • California State University, Sacramento
Why Should City and State Be Separated Why Should City and State Be Separated from Customer Relation?from Customer Relation?
• City and state are dependent on zip code for their values City and state are dependent on zip code for their values and not the customer’s identifier (i.e., key).and not the customer’s identifier (i.e., key).
Zip_code Zip_code City, State City, State
• Otherwise,Otherwise,
Cust_account Cust_account Cust_addr, Zip_code Cust_addr, Zip_code City, State City, State
In which case, you have transitive dependency.In which case, you have transitive dependency.
19
R. Ching, Ph.D. • MIS • California State University, Sacramento
3NF3NF
Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)
Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)
Items RelationItems Relation(Item, Item_descrip)(Item, Item_descrip)
20
R. Ching, Ph.D. • MIS • California State University, Sacramento
3NF3NF
Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)
Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)
Items RelationItems Relation(Item, Item_descrip)(Item, Item_descrip)
Since the Items relation contains the manufacturer’s name in Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be the description, a separate Manufacturers relation can be createdcreated
Manufacturers RelationManufacturers Relation(Manuf_code, Manuf_name)(Manuf_code, Manuf_name)
22
R. Ching, Ph.D. • MIS • California State University, Sacramento
First to Third Normal FormFirst to Third Normal Form(1NF - 3NF)(1NF - 3NF)
• 1NF: 1NF: A relation is in first normal form if and only if every A relation is in first normal form if and only if every attribute is single-valued for each tuple attribute is single-valued for each tuple (remove the (remove the repeating or multi-value attributes and create a flat file)repeating or multi-value attributes and create a flat file)
• 2NF: 2NF: A relation is in second normal form if and only if it A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully is in first normal form and the nonkey attributes are fully functionally dependent on the key functionally dependent on the key (remove partial (remove partial dependencies)dependencies)
• 3NF: 3NF: A relation is in third normal form if it is in second A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively normal form and no nonkey attribute is transitively dependent on the keydependent on the key (remove transitive dependencies) (remove transitive dependencies)
23
R. Ching, Ph.D. • MIS • California State University, Sacramento
Putting It TogetherPutting It Together
ERD of the Normalized Data ModelERD of the Normalized Data Model
24
R. Ching, Ph.D. • MIS • California State University, Sacramento
3NF3NF
Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)
Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)
Items RelationItems Relation(Item, Item_descrip, Manuf_code)(Item, Item_descrip, Manuf_code)
Manufacturers RelationManufacturers Relation(Manuf_code, Manuf_name)(Manuf_code, Manuf_name)
25
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Invoice_items
Invoice_numberItemItem_qtyItem_price
Items
ItemItem_descripManuf_code
Manufacturers
Manuf_codeManuf_name
26
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Locate
Invoice_items
Invoice_numberItemItem_qtyItem_price
Have
Items
ItemItem_descripManuf_code
Appear onManufacturers
Manuf_codeManuf_name
Produce
Order
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
(1..1)(1..1)
(1..*)(1..*)
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
27
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Locate
Invoice_items
Invoice_numberItemItem_qtyItem_price
Have
Items
ItemItem_descripManuf_code
Appear onManufacturers
Manuf_codeManuf_name
Produce
Order
(0..*)(0..*) (1..1)(1..1) ((00..*)..*) (1..1)(1..1)
(1..1)(1..1)
(1..*)(1..*)
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
Zip codes locate Customers.Zip codes locate Customers.(() ) A zip code can be related to a A zip code can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many customers.many customers.(() ) A customer can be related to a A customer can be related to a minimum and maximum of one zip code.minimum and maximum of one zip code.
Zip codes locate Customers.Zip codes locate Customers.(() ) A zip code can be related to a A zip code can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many customers.many customers.(() ) A customer can be related to a A customer can be related to a minimum and maximum of one zip code.minimum and maximum of one zip code.
PartialPartial
28
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Locate
Invoice_items
Invoice_numberItemItem_qtyItem_price
Have
Items
ItemItem_descripManuf_code
Appear onManufacturers
Manuf_codeManuf_name
Produce
Order
((00..*)..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
(1..1)(1..1)
(1..*)(1..*)
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
Customers order (items) on invoices.Customers order (items) on invoices.(() ) A customer can be related to a A customer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoices.many invoices.(() ) An invoice can be related to a An invoice can be related to a minimum and maximum of one minimum and maximum of one customer.customer.
Customers order (items) on invoices.Customers order (items) on invoices.(() ) A customer can be related to a A customer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoices.many invoices.(() ) An invoice can be related to a An invoice can be related to a minimum and maximum of one minimum and maximum of one customer.customer.
PartialPartial
29
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Locate
Invoice_items
Invoice_numberItemItem_qtyItem_price
Have
Items
ItemItem_descripManuf_code
Appear onManufacturers
Manuf_codeManuf_name
Produce
Order
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
(1..1)(1..1)
((11..*)..*)
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
Invoices possess invoice itemsInvoices possess invoice items(() ) An invoice can be related to a An invoice can be related to a minimum of one and a maximum of minimum of one and a maximum of many invoice items.many invoice items.(()) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one invoice.minimum and maximum of one invoice.
Invoices possess invoice itemsInvoices possess invoice items(() ) An invoice can be related to a An invoice can be related to a minimum of one and a maximum of minimum of one and a maximum of many invoice items.many invoice items.(()) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one invoice.minimum and maximum of one invoice.
MandatoryMandatory
30
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Locate
Invoice_items
Invoice_numberItemItem_qtyItem_price
Have
Items
ItemItem_descripManuf_code
Appear onManufacturers
Manuf_codeManuf_name
Produce
Order
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
(1..1)(1..1)
(1..*)(1..*)
((00..*)..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
Items are sold on invoice items.Items are sold on invoice items.(() ) An item can be related to a An item can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoice items.many invoice items.(() ) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one item.minimum and maximum of one item.
Items are sold on invoice items.Items are sold on invoice items.(() ) An item can be related to a An item can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many invoice items.many invoice items.(() ) An invoice item can be related to a An invoice item can be related to a minimum and maximum of one item.minimum and maximum of one item.
PartialPartial
31
R. Ching, Ph.D. • MIS • California State University, Sacramento
ERDERD
Invoices Customers Zip_Codes
Invoice_numberInvoice_dateDate_deliveredCust_account
Cust_accountCust_nameCust_addrZip_code
Zip_codeCityState
Locate
Invoice_items
Invoice_numberItemItem_qtyItem_price
Have
Items
ItemItem_descripManuf_code
Appear onManufacturers
Manuf_codeManuf_name
Produce
Order
(0..*)(0..*) (1..1)(1..1) (0..*)(0..*) (1..1)(1..1)
(1..1)(1..1)
(1..*)(1..*)
(0..*)(0..*) (1..1)(1..1) ((00..*)..*) (1..1)(1..1)
Manufacturers produce items.Manufacturers produce items.(() ) A manufacturer can be related to a A manufacturer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many items.many items.(() ) An item can be related to a An item can be related to a minimum and maximum of one minimum and maximum of one manufacturer.manufacturer.
Manufacturers produce items.Manufacturers produce items.(() ) A manufacturer can be related to a A manufacturer can be related to a minimum of zero and a maximum of minimum of zero and a maximum of many items.many items.(() ) An item can be related to a An item can be related to a minimum and maximum of one minimum and maximum of one manufacturer.manufacturer.
PartialPartial
32
R. Ching, Ph.D. • MIS • California State University, Sacramento
Higher Forms of Data NormalizationHigher Forms of Data Normalization
• Boyce-Codd Normal Form (BCNF)Boyce-Codd Normal Form (BCNF)• Fourth Normal Form (4NF)Fourth Normal Form (4NF)• Fifth Normal Form (5NF)Fifth Normal Form (5NF)• Domain Key Normal Form (DKNF)Domain Key Normal Form (DKNF)
33
R. Ching, Ph.D. • MIS • California State University, Sacramento
Boyce-Codd Normal Form (BCNF)Boyce-Codd Normal Form (BCNF)
• A relation is in Boyce-Codd normal form if and only if A relation is in Boyce-Codd normal form if and only if every determinant is a candidate keyevery determinant is a candidate key
• For a relation with only one candidate key, 3NF and BCNF For a relation with only one candidate key, 3NF and BCNF are are equivalentequivalent..
• Usually occurs when keys of different relations overlapUsually occurs when keys of different relations overlap
DeterminantDeterminant
(B is functionally dependent on A)(B is functionally dependent on A)
Attribute AAttribute AAttribute AAttribute A Attribute BAttribute BAttribute BAttribute BA determines BA determines B
34
R. Ching, Ph.D. • MIS • California State University, Sacramento
BCNF ExampleBCNF Example
User (User (UserID, DeptUserID, Dept, Name, ComputerID, EmpClassification), Name, ComputerID, EmpClassification)User (User (UserID, DeptUserID, Dept, Name, ComputerID, EmpClassification), Name, ComputerID, EmpClassification)
UserComputer (UserComputer (ComputerIDComputerID, Dept), Dept)
User (User (UserID, ComputerIDUserID, ComputerID, Name, EmpClassification), Name, EmpClassification)
UserComputer (UserComputer (ComputerIDComputerID, Dept), Dept)
User (User (UserID, ComputerIDUserID, ComputerID, Name, EmpClassification), Name, EmpClassification)
ComputerID ComputerID Dept ( Dept (a department issues a computera department issues a computer))UserID, Dept UserID, Dept ComputerID, Name, EmpCassification ComputerID, Name, EmpCassification
((Employees may have the same name and Employees may have the same name and UserIDs are unique within the department onlyUserIDs are unique within the department only))
UserID, ComputerID UserID, ComputerID Dept, Name, EmpClassification Dept, Name, EmpClassification
BCNFBCNF
35
R. Ching, Ph.D. • MIS • California State University, Sacramento
From 3NF to BCNFFrom 3NF to BCNF
Invoice RelationInvoice Relation(Invoice_number, Invoice_date, Date_delivered, Cust_account)(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer RelationCustomer Relation(Cust_account, Cust_name, Cust_addr, Zip_code)(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code RelationZip_code Relation(Zip_code, City, State)(Zip_code, City, State)
Invoice_items RelationInvoice_items Relation(Invoice_number, Item, Item_qty, Item_price)(Invoice_number, Item, Item_qty, Item_price)
Items RelationItems Relation(Item, Item_descrip)(Item, Item_descrip)
Manufacturers RelationManufacturers Relation(Manuf_code, Manuf_name)(Manuf_code, Manuf_name)
Candidate keys?Candidate keys?
36
R. Ching, Ph.D. • MIS • California State University, Sacramento
Fourth Normal Form (4NF)Fourth Normal Form (4NF)
• A relation is in fourth normal form if and only if it is in A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no nontrivial Boyce-Codd normal form and there are no nontrivial dependencies.dependencies.
– Identify all determinants and make sure they are Identify all determinants and make sure they are candidate keyscandidate keys
37
R. Ching, Ph.D. • MIS • California State University, Sacramento
Employee (Employee (EmployeeID, DeptEmployeeID, Dept))
Projects (Projects (EmployeeID, ProjectEmployeeID, Project))
Employee (Employee (EmployeeID, DeptEmployeeID, Dept))
Projects (Projects (EmployeeID, ProjectEmployeeID, Project))
4NF4NF4NF4NF
4NF Example4NF Example
Employee (Employee (EmployeeID, Dept, ProjectEmployeeID, Dept, Project))Employee (Employee (EmployeeID, Dept, ProjectEmployeeID, Dept, Project))
Multivalued Multivalued dependenciesdependencies
100100 FinanceFinance F177-99F177-99100100 MarketingMarketing F177-99F177-99100100 FinanceFinance F288-00F288-00102102 FinanceFinance F288-00F288-00102102 MarketingMarketing F177-99F177-99102102 FinanceFinance F177-99F177-99
Matrix Matrix managementmanagement
38
R. Ching, Ph.D. • MIS • California State University, Sacramento
4NF4NF4NF4NF
4NF Example4NF Example
Multivalued Multivalued dependenciesdependencies
100100 FinanceFinance F177-99F177-99100100 MarketingMarketing F177-99F177-99100100 FinanceFinance F288-00F288-00102102 FinanceFinance F288-00F288-00102102 MarketingMarketing F177-99F177-99102102 FinanceFinance F177-99F177-99
Matrix Matrix managementmanagement
100100 FinanceFinance100100 MarketingMarketing102102 FinanceFinance102102 MarketingMarketing
100100 FinanceFinance100100 MarketingMarketing102102 FinanceFinance102102 MarketingMarketing
100100 F177-99F177-99100100 F288-00F288-00102102 F288-00F288-00102102 F177-99F177-99
100100 F177-99F177-99100100 F288-00F288-00102102 F288-00F288-00102102 F177-99F177-99
39
R. Ching, Ph.D. • MIS • California State University, Sacramento
Fifth Normal Form (5NF)Fifth Normal Form (5NF)aka Project-Join NFaka Project-Join NF
• A relation is in A relation is in fifth normal form fifth normal form if no remaining nonloss if no remaining nonloss projections (i.e., all projects preserve all information projections (i.e., all projects preserve all information contained in the original relation)are possible, except the contained in the original relation)are possible, except the trivial one in which the key appears in each project.trivial one in which the key appears in each project.
– The join of all projects will result in the original The join of all projects will result in the original relationrelation
– No systematic method exists for obtaining 5NF or for No systematic method exists for obtaining 5NF or for ensuring that a set of relations is indeed 5NFensuring that a set of relations is indeed 5NF
Ricardo, 1990 Ricardo, 1990
40
R. Ching, Ph.D. • MIS • California State University, Sacramento
Domain-Key Normal Form (DKNF)Domain-Key Normal Form (DKNF)
• A relation is in A relation is in domain-key normal form domain-key normal form if every constraint if every constraint is a logical consequence of domain constraints or key is a logical consequence of domain constraints or key constraints (i.e., all possible values are a result of an constraints (i.e., all possible values are a result of an imposed constraint)imposed constraint)– There is no proven method of converting a design to There is no proven method of converting a design to
DKNF, so it remains an ideal rather than a state that DKNF, so it remains an ideal rather than a state that can readily be achievedcan readily be achieved
Ricardo, 1990 Ricardo, 1990
41
R. Ching, Ph.D. • MIS • California State University, Sacramento
DKNFDKNF
Emp_ID, Emp_name, Classification, Position, SalaryEmp_ID, Emp_name, Classification, Position, Salary
For example:For example:
Domain for Classification:Domain for Classification:• ExecutiveExecutive• ManagerManager• StaffStaff
Domain for Classification:Domain for Classification:• ExecutiveExecutive• ManagerManager• StaffStaff
Domain for PositionDomain for Position• Strategic PlannerStrategic Planner• CIOCIO• Vice PresidentVice President
Domain for PositionDomain for Position• Strategic PlannerStrategic Planner• CIOCIO• Vice PresidentVice President
Domain for PositionDomain for Position• Programmer/Analyst IProgrammer/Analyst I• Programmer/Analyst IIProgrammer/Analyst II• Database/Analyst IDatabase/Analyst I
Domain for PositionDomain for Position• Programmer/Analyst IProgrammer/Analyst I• Programmer/Analyst IIProgrammer/Analyst II• Database/Analyst IDatabase/Analyst I
42
R. Ching, Ph.D. • MIS • California State University, Sacramento
Database Design MethodologyDatabase Design Methodology
Physical database designPhysical database design
Logical database designLogical database design
Conceptual database designConceptual database design • Build conceptual representation of the Build conceptual representation of the databasedatabase
• Translate conceptual representation to Translate conceptual representation to logical structure of the databaselogical structure of the database
• Operatioanlize logical structure in a Operatioanlize logical structure in a physical implementationphysical implementation
43
R. Ching, Ph.D. • MIS • California State University, Sacramento
Conceptual Database DesignConceptual Database Design
• The process of constructing a model of the data used in an The process of constructing a model of the data used in an enterprise, independent of all physical considerationsenterprise, independent of all physical considerations
• What’s involved…What’s involved…– Identify entity types, relationship typesIdentify entity types, relationship types– Identify and associate attributes with entity or relationship typesIdentify and associate attributes with entity or relationship types– Determine attribute domainsDetermine attribute domains– Determine candidate, primary and alternate key attributesDetermine candidate, primary and alternate key attributes– Consider use of enhanced modeling conceptsConsider use of enhanced modeling concepts– Check model for redundanciesCheck model for redundancies– Validate conceptual model against user transactionsValidate conceptual model against user transactions– Review conceptual data model with the usersReview conceptual data model with the users
44
R. Ching, Ph.D. • MIS • California State University, Sacramento
Logical Database DesignLogical Database Design
• The process of constructing a model of the data used in an The process of constructing a model of the data used in an enterprise based on a specific data model, but independent enterprise based on a specific data model, but independent of a particular DBMS and other physical considerationsof a particular DBMS and other physical considerations
• What’s involved…What’s involved…– Derive relations for logical data modelDerive relations for logical data model– Validate relations using Validate relations using data normalizationdata normalization– Validate relations against user transactionsValidate relations against user transactions– Check integrity constraintsCheck integrity constraints– Review logical data model (ERD) with the usersReview logical data model (ERD) with the users– Merge logical data models into global data modelMerge logical data models into global data model– Check for future growthCheck for future growth
45
R. Ching, Ph.D. • MIS • California State University, Sacramento
Gather InformationGather Information
• Meet with the users to get gather informationMeet with the users to get gather information
– InterviewsInterviews
– DocumentsDocuments
46
R. Ching, Ph.D. • MIS • California State University, Sacramento
Derive Relations Derive Relations
• Strong and weak entity typesStrong and weak entity types• Relationship types (cardinality)Relationship types (cardinality)• Participation (mandatory vs. partial)Participation (mandatory vs. partial)
Invoice ItemsInvoice Items
Invoice number (pk)Invoice number (pk)Product code (pk)Product code (pk)Manufacture codeManufacture codeQuantityQuantitySales PriceSales Price
InvoiceInvoice
Invoice number (pk)Invoice number (pk)Invoice dateInvoice dateDelivery dateDelivery dateSales typeSales typeCustomer accountCustomer account
1..*1..* 1..11..1
HaveHave
Weak entity type Weak entity type (Invoice number is (Invoice number is part of key)part of key)
Strong entity typeStrong entity type
Mandatory (all Mandatory (all invoices must have invoices must have at least one invoice at least one invoice itemitem
One-to-many relationshipOne-to-many relationship Invoices have invoice itemsInvoices have invoice items
47
R. Ching, Ph.D. • MIS • California State University, Sacramento
Validate RelationsValidate Relations
• Normalize relationsNormalize relations
• Validate against transactions - Validate against transactions - Can a transaction be Can a transaction be recreated given the data retained in the relations?recreated given the data retained in the relations?))
• Check integrity constraintsCheck integrity constraints
– Required data (not null)Required data (not null)
– Domain constraints (in, references)Domain constraints (in, references)
– MultiplicityMultiplicity
– Entity integrity (primary key)Entity integrity (primary key)
– Referential integrity (foreign key)Referential integrity (foreign key)
– General constraints (business rules)General constraints (business rules)
48
R. Ching, Ph.D. • MIS • California State University, Sacramento
Review Data Model with the UsersReview Data Model with the Users
• Be pleasant and professional, not arrogant, challenging or Be pleasant and professional, not arrogant, challenging or condescendingcondescending
– Not everyone is receptive to change Not everyone is receptive to change
– Your role is to Your role is to facilitatefacilitate change change
• The user is always “right” – The user is always “right” – It’s his/her dataIt’s his/her data
• Document all change requests (Document all change requests (CYACYA))
• Listen, listen, listen… (Listen, listen, listen… (Even if you don’t agreeEven if you don’t agree))
49
R. Ching, Ph.D. • MIS • California State University, Sacramento
Logical Global Data ModelLogical Global Data Model
InvoiceInvoice
InventoryInventory
Cust BillingCust Billing
Vendor HistoryVendor History
Product SalesProduct Sales
Records TransactionsRecords Transactions
Counts and retail pricesCounts and retail prices
Customer credit salesCustomer credit sales
Vendor performanceVendor performance
Sales historySales history
Cust AccountsCust Accounts
Customer credit accountsCustomer credit accounts
Local Local Data Data
ModelsModels
Global Data Global Data ModelModel
Global Data Global Data ModelModel
Local data models are merged Local data models are merged to create a (near) normalized to create a (near) normalized global data model global data model
50
R. Ching, Ph.D. • MIS • California State University, Sacramento
Physical Database DesignPhysical Database Design
• The process of producing a description of the The process of producing a description of the implementation of the database on secondary storageimplementation of the database on secondary storage
• It describes the base relations, fle organizations and It describes the base relations, fle organizations and indexes used to achieve efficient access to the data and nay indexes used to achieve efficient access to the data and nay associated integrity constraints and security measuresassociated integrity constraints and security measures
• What’s involved…What’s involved…– Translate logical data model for target DBMS: Design base Translate logical data model for target DBMS: Design base
relations, representation of derived data and general constraintsrelations, representation of derived data and general constraints– Design file organizations and indexes: Analyze transactions, Design file organizations and indexes: Analyze transactions,
choose file organizations, choose indexes, choose file organizations, choose indexes, estimate disk space requirementsestimate disk space requirements
– Design user views and security mechanismsDesign user views and security mechanisms– Consider the introduction of controlled redundancyConsider the introduction of controlled redundancy– Monitor and tune the operational systemMonitor and tune the operational system
Dictated by the DB productDictated by the DB product
51
R. Ching, Ph.D. • MIS • California State University, Sacramento
Logical vs. Physical Database DesignLogical vs. Physical Database Design
• LogicalLogicalThe process of constructing a model of the information use The process of constructing a model of the information use the enterprise based on one model of data, BUT the enterprise based on one model of data, BUT independent of a particular DBMS and other physical independent of a particular DBMS and other physical aspects.aspects.
• PhysicalPhysicalThe process of producing a description of the The process of producing a description of the implementation of the database on secondary storage; it implementation of the database on secondary storage; it describes the storage structures and access methods used to describes the storage structures and access methods used to gain access effectively.gain access effectively.
Whereas the logical database design is concerned with the Whereas the logical database design is concerned with the whatwhat, physical database design is concerned with the , physical database design is concerned with the howhow..
52
R. Ching, Ph.D. • MIS • California State University, Sacramento
Physical Database DesignPhysical Database Design
Five steps:Five steps:
• Translate the global (enterprise) logical data model for the Translate the global (enterprise) logical data model for the target DBMStarget DBMS
• Design files organizations and indexes, estimate database Design files organizations and indexes, estimate database space (disk space requirements)space (disk space requirements)
• Design and implement user views and security Design and implement user views and security mechanismsmechanisms
• Consider the introduction of controlled redundancy Consider the introduction of controlled redundancy ((denormalizationdenormalization))
• Monitor and tune the operational systemMonitor and tune the operational system
53
R. Ching, Ph.D. • MIS • California State University, Sacramento
Translate the Global Logical Database Model Translate the Global Logical Database Model for the Target DBMSfor the Target DBMS
• Design the relations for the target DBMSDesign the relations for the target DBMS
– Decide how to represent the base relations in the global Decide how to represent the base relations in the global logical data model in the target DBMSlogical data model in the target DBMS
• Specify keys (primary, foreign), default values, Specify keys (primary, foreign), default values, integrity constraints (table, column), and indexesintegrity constraints (table, column), and indexes
• Design integrity rules for the target DBMSDesign integrity rules for the target DBMS
– Design the enterprise constraints for the target DBMSDesign the enterprise constraints for the target DBMS
• Applies to updates and insertsApplies to updates and inserts
54
R. Ching, Ph.D. • MIS • California State University, Sacramento
Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation
• Determine the file organizations and access methods that Determine the file organizations and access methods that will be used to store the base relations (i.e., the way in will be used to store the base relations (i.e., the way in which relations and tuples will be held in secondary storage)which relations and tuples will be held in secondary storage)
– Understand the system resourcesUnderstand the system resources• Understand the capabilities of the hardware (CPU, Understand the capabilities of the hardware (CPU,
memory, disk I-O)memory, disk I-O)• Analyze the software’s performance and limitations Analyze the software’s performance and limitations
on the network (client/server) and Interneton the network (client/server) and Internet
Depends on the vendor!Depends on the vendor!
55
R. Ching, Ph.D. • MIS • California State University, Sacramento
Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation
• Analyze the transactions - understand the functionality of the Analyze the transactions - understand the functionality of the transactions that will run on the database, and analyze the transactions that will run on the database, and analyze the import transactionsimport transactions
• Choose file organizationChoose file organization
• Choose secondary indexes - determine whether secondary Choose secondary indexes - determine whether secondary indexes will enhance performanceindexes will enhance performance
– Index the primary key (if it is not the key of the file Index the primary key (if it is not the key of the file organization)organization)
– Do not index small relationsDo not index small relations
– Add a secondary index to a heavily used secondary keyAdd a secondary index to a heavily used secondary key
– Add a secondary index to a frequently used foreign keyAdd a secondary index to a frequently used foreign key
56
R. Ching, Ph.D. • MIS • California State University, Sacramento
Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation
((cont.cont.))
– AVOID AVOID INDEXING AN ATTRIBUTE OR INDEXING AN ATTRIBUTE OR RELATION THAT IS FREQUENTLY UPDATEDRELATION THAT IS FREQUENTLY UPDATED
– Avoid indexing an attribute if the query will retrieve a Avoid indexing an attribute if the query will retrieve a large portion of the tuples in a relationlarge portion of the tuples in a relation
– Avoid indexing attributes that consist of long character Avoid indexing attributes that consist of long character stringsstrings
57
R. Ching, Ph.D. • MIS • California State University, Sacramento
Design and Implement the Physical Design and Implement the Physical RepresentationRepresentation
• Consider the introduction of controlled redundancyConsider the introduction of controlled redundancy
– Determine whether introducing redundancy in a Determine whether introducing redundancy in a controlled manner by relaxing the normalization rules controlled manner by relaxing the normalization rules will enhance performancewill enhance performance
• DenormalizeDenormalize only when only when necessarynecessary
– However, denormalizingHowever, denormalizing
» Makes implementation more complexMakes implementation more complex
» Sacrifices flexibilitySacrifices flexibility
» May slow down updates (although retrievals May slow down updates (although retrievals may be increased)may be increased)
58
R. Ching, Ph.D. • MIS • California State University, Sacramento
3NF (Logical Database Design)3NF (Logical Database Design)
Invoice_numberInvoice_number Invoice_date Date_delivered Cust_account Invoice_date Date_delivered Cust_account
Cust_accountCust_account Cust_name Cust_addr Zip_code Cust_name Cust_addr Zip_code
Zip_codeZip_code City State City State
Invoice_numberInvoice_number ItemItem Item_qty Item_price Item_qty Item_price
Invoice RelationInvoice Relation
Customer RelationCustomer Relation
Zip_code RelationZip_code Relation
Invoice_items RelationInvoice_items Relation
ItemItem Item_descrip Manuf_code Item_descrip Manuf_code
Items RelationItems RelationManuf_codeManuf_code Manuf_name Manuf_nameManufacturers RelationManufacturers Relation
59
R. Ching, Ph.D. • MIS • California State University, Sacramento
Cust_account Cust_name Cust_addr Zip_code Account_type Cust_account Cust_name Cust_addr Zip_code Account_type
Credit_limit Current_balance Pay_historyCredit_limit Current_balance Pay_history
DenormalizationDenormalization
Cust_account Cust_name Cust_addr Zip_codeCust_account Cust_name Cust_addr Zip_code
Customers RelationCustomers Relation
Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_historyCustomer_accounts RelationCustomer_accounts Relation
• Duplicating attributes or combining relationsDuplicating attributes or combining relations– Combining 1:1 relationshipsCombining 1:1 relationships
Customers RelationCustomers Relation
60
R. Ching, Ph.D. • MIS • California State University, Sacramento
Cust_account Cust_name Cust_addr Cust_account Cust_name Cust_addr City StateCity State Zip_code Zip_code
Account_type Credit_limit Current_balance Pay_historyAccount_type Credit_limit Current_balance Pay_history
DenomalizationDenomalization
• Duplicating attributes or combining relationsDuplicating attributes or combining relations– Duplicating nonkey attributes in 1:M relationships to Duplicating nonkey attributes in 1:M relationships to
reduce joins (creating partial or transitive reduce joins (creating partial or transitive dependencies)dependencies)
Zip_code City StateZip_code City State
Cust_account Cust_name Cust_addr Zip_code Account_type Cust_account Cust_name Cust_addr Zip_code Account_type
Credit_limit Current_balance Pay_historyCredit_limit Current_balance Pay_history
Customers RelationCustomers Relation
Zip_codes RelationZip_codes Relation
Customers RelationCustomers Relation
61
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))
– Reference tables (introducing transitive dependencies)Reference tables (introducing transitive dependencies)
Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price
Invoice_items RelationInvoice_items Relation
Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name
Manufacturers RelationManufacturers Relation
Problem:Problem: In order to know the manufacturer’s name of a In order to know the manufacturer’s name of a customer’s purchased item, a join between Items and customer’s purchased item, a join between Items and Manufacturers Manufacturers mustmust be performed be performed
Items RelationItems Relation
62
R. Ching, Ph.D. • MIS • California State University, Sacramento
Invoice_number Item Invoice_number Item Manuf_codeManuf_code Manuf_nameManuf_name Item_qty Item_qty
Item_priceItem_price
DenomalizationDenomalization
((cont.cont.))
– Reference tables (introducing transitive dependencies)Reference tables (introducing transitive dependencies)
Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price
Invoice_items RelationInvoice_items Relation
Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name
Manufacturers RelationManufacturers RelationItems RelationItems Relation
63
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))
– Duplicating foreign key attributes in 1:M relationships to Duplicating foreign key attributes in 1:M relationships to reduce joinsreduce joins
Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price
Invoice_items RelationInvoice_items Relation
Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name
Manufacturers RelationManufacturers RelationItems RelationItems Relation
Problem:Problem: To find the manufacturer’s name of a product (e.g., To find the manufacturer’s name of a product (e.g., Sony CDP-525) from line_items (relation), two joins must be Sony CDP-525) from line_items (relation), two joins must be made: manufacturers to products, and products to manufacturers.made: manufacturers to products, and products to manufacturers.
64
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))
Invoice_number Item Invoice_number Item Manuf_codeManuf_code Item_qty Item_price Item_qty Item_price
Invoice_number Item Item_qty Item_priceInvoice_number Item Item_qty Item_price
Invoice_items RelationInvoice_items Relation
Item Item_descrip Manuf_codeItem Item_descrip Manuf_code Manuf_code Manuf_nameManuf_code Manuf_name
Manufacturers RelationManufacturers RelationItems RelationItems Relation
65
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))– Duplicating attributes in M:N relationships to reduce Duplicating attributes in M:N relationships to reduce
joinsjoins
If joint accounts are allowed and different types of accounts If joint accounts are allowed and different types of accounts (i.e., long term, revolving) are available:(i.e., long term, revolving) are available:
Cust_account Cust_name Cust_addr Zip_code Soc_Sec_NumCust_account Cust_name Cust_addr Zip_code Soc_Sec_Num
Customers RelationCustomers Relation
Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_NumSoc_Sec_Num
M:NM:NM:NM:NCustomer_accounts RelationCustomer_accounts Relation
66
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenormalizationDenormalization
123456789123456789 John SmithJohn Smith … … 123-45-6789123-45-6789
123456789 123456789 Jane SmithJane Smith ...... 987-65-4321987-65-4321
112233445112233445 John DoeJohn Doe …… 567-32-1234567-32-1234
123456789123456789 …… 123-45-6789123-45-6789
123456789123456789 …… 987-65-7321987-65-7321
543219876543219876 …… 123-45-6789123-45-6789
678901234678901234 …… 987-65-7321987-65-7321
548794133548794133 …… 567-32-1234567-32-1234
A customer can have A customer can have several accounts...several accounts...
An account can have An account can have several owners...several owners...
67
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenormalizationDenormalization
((cont.cont.))
Cust_account Cust_name Cust_addr Zip_code Soc_Sec_NumCust_account Cust_name Cust_addr Zip_code Soc_Sec_Num
Customers RelationCustomers Relation
Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_NumSoc_Sec_Num
M:NM:NM:NM:NCustomer_accounts RelationCustomer_accounts Relation
68
R. Ching, Ph.D. • MIS • California State University, Sacramento
Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num Soc_Sec_Num Cust_nameCust_name
DenormalizationDenormalization
Cust_account Cust_name Cust_addr Zip_code Soc_Sec_NumCust_account Cust_name Cust_addr Zip_code Soc_Sec_Num
Customers RelationCustomers Relation
Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_NumSoc_Sec_Num
Customer_accounts RelationCustomer_accounts Relation
((cont.cont.))• Duplicating attributes in M:N relationships to reduce joinsDuplicating attributes in M:N relationships to reduce joins
69
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))
– Introducing repeating groups (Introducing repeating groups (if the number of if the number of occurrences is known and/or constantoccurrences is known and/or constant))
– Creating extract tables (in an extreme case, an Creating extract tables (in an extreme case, an unnormalized relation) - frees computing resourcesunnormalized relation) - frees computing resources
Cust_account Account_type Credit_limit Current_balance Pay_historyCust_account Account_type Credit_limit Current_balance Pay_history
Soc_Sec_Num1 Cust_name1 Soc_Sec_Num1 Cust_name1 Soc_Sec_num2 Cust_name2Soc_Sec_num2 Cust_name2
70
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))
– Introduction of “codes” toIntroduction of “codes” to
• Simplify the composite keySimplify the composite key
• Retain the original sequence Retain the original sequence
Invoice_items RelationInvoice_items Relation
Invoice_number Item_numberInvoice_number Item_number Item Manuf_code Item_qty Item_price Item Manuf_code Item_qty Item_price
Invoice_number ItemInvoice_number Item Manuf_code Item_qty Item_price Manuf_code Item_qty Item_price
71
R. Ching, Ph.D. • MIS • California State University, Sacramento
Stereos To GoOrder No.
Date: / /
Account No.
Item
Number Product Description/Manufacturer Qty PriceProduct
Code
1
2
3
4
5
Date Shipped: / /
Customer:Address:
City State Zip Code
10001
6 15 93
1005-25941-2
William Tell2036-26 StreetSacramento CA 95819
SAGX730 Pioneer Remote A/V Receiver 1 56995AT10 Cervwin Vega Loudspeakers 1 35995CDPC725 Sony Disc-Jockey CD Changer 1 39995
6 18 93
Invoice No.
72
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenormalizationDenormalization
1000110001 AT10 AT10 CV CV Loudspeakers Loudspeakers 2 2 359.95359.951000110001 CDPC725 CDPC725 SONSON Disc-Jockey CD Changer Disc-Jockey CD Changer 1 1 399.95399.951000110001 SAGX730 SAGX730 PIOPIO Remote A/V Receiver Remote A/V Receiver 1 1 569.95 569.95
Invo
ice_
num
ber
Invo
ice_
num
ber
Item
Item
Man
uf_c
ode
Man
uf_c
ode
Item
_des
crip
tion
Item
_des
crip
tion
QtyQty
Retai
l_pr
ice
Retai
l_pr
ice
KeyKey
Problem:Problem: These items are not in the sequence as they appear on These items are not in the sequence as they appear on the original document when retrieved from the table.the original document when retrieved from the table.
73
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenormalizationDenormalization
1000110001 0101 SAGX730 SAGX730 PIOPIO Remote A/V Receiver Remote A/V Receiver 1 1 569.95 569.95 1000110001 02 02 AT10 AT10 CV CV Loudspeakers Loudspeakers 2 2 359.95359.951000110001 03 03 CDPC725 CDPC725 SONSON Disc-Jockey CD Changer Disc-Jockey CD Changer 1 1 399.95399.95
Invo
ice_
num
ber
Invo
ice_
num
ber
Item
_num
ber
Item
_num
ber
Item
Item
Man
uf_c
ode
Man
uf_c
ode
Item
_des
crip
tion
Item
_des
crip
tion
QtyQty
Retai
l_pr
ice
Retai
l_pr
ice
KeyKey
74
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenomalizationDenomalization
((cont.cont.))
– Introducing calculated attributesIntroducing calculated attributes
• Simplify processingSimplify processing
Invoice_items RelationInvoice_items Relation
Invoice_number Item_numberInvoice_number Item_number Item Manuf_code Item_qty Item_price Item Manuf_code Item_qty Item_price
Invoice_number ItemInvoice_number Item Manuf_code Item_qty Item_price Manuf_code Item_qty Item_price
Extended_priceExtended_priceItem_qty Item_qty xx Item_price Item_price
75
R. Ching, Ph.D. • MIS • California State University, Sacramento
DenormalizationDenormalization
1000110001 0101 SAGX730 SAGX730 PIOPIO Remote A/V Receiver Remote A/V Receiver 1 1 569.95569.95 569.95 569.95 1000110001 02 02 AT10 AT10 CV CV Loudspeakers Loudspeakers 2 2 359.95359.95 719.90719.901000110001 03 03 CDPC725 CDPC725 SONSON Disc-Jockey CD Changer Disc-Jockey CD Changer 1 1 399.95399.95 399.95399.95
Invo
ice_
num
ber
Invo
ice_
num
ber
Item
_num
ber
Item
_num
ber
Item
Item
Man
uf_c
ode
Man
uf_c
ode
Item
_des
crip
tion
Item
_des
crip
tion
QtyQty Ret
ail_
pric
e
Retai
l_pr
ice
Exten
ded_
pric
e
Exten
ded_
pric
e
CalculationCalculation