comp 430 intro. to database systems · starflake = tree of junction table & child tables in...
TRANSCRIPT
![Page 1: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/1.jpg)
COMP 430Intro. to Database Systems
Denormalization & Dimensional Modeling
![Page 2: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/2.jpg)
Some consequences of normalization
• Data redundancy is reduced or eliminated.
• Relations are broken into smaller, related tables.
• Using all the attributes from the original relation requires joining these smaller tables.
![Page 3: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/3.jpg)
Denormalization
Deliberately reintroducing some redundancy, so that we can access data faster.
DENORMALIZED DATA AHEAD
![Page 4: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/4.jpg)
Example
Technique:Add duplicate fields
Technique:Add computed fields
![Page 5: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/5.jpg)
Example
Technique:Join tables
![Page 6: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/6.jpg)
Less common denormalization techniques
• Duplicating a commonly-used subset of table fields
• Splitting some table rows into different tables• Frequently- vs. rarely-used data
• Data for different regions
• Common subclasses of data
![Page 7: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/7.jpg)
When to denormalize?
Typically used when some or all of the following apply:• Many queries need to join the data
• Joining the data is expensive – uses scans, rather than indices
• Computing derived data is expensive – complex queries or complex functions
![Page 8: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/8.jpg)
Dealing with redundant data
Still want data consistency, but now it requires work.
Is full consistency required at all times?
Techniques:• Stored procedures act as API for updating DB. They add the redundant data.• Triggers check (and fix?) consistency.• Application code carefully maintains consistency during updates.• Reconcile data as background process.• Reconcile data during system maintenance.
![Page 9: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/9.jpg)
Dimensional modelingIn a tiny, brief nutshell
![Page 10: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/10.jpg)
An alternative to ER modeling
Only a brief overview.
So, we’ll view it through our lens of ER modeling + denormalization.
Emphasizes decision making & use of historical data• Fast retrieval & aggregation of data
• Less concern with updating data & maintaining consistency while updating
![Page 11: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/11.jpg)
DB design often resembles multiple starflakes
Course(crn, …) Student(sid, …, collegeid, …, home_stateid)
Enrollment(crn, sid, …)
College(collegeid, …)
State(stateid, …, country)
Instructor(instr_id, …)
Teach(crn, instr_id, …)
Starflake = tree of junction table & child tables in 1-to-many relationships
Typical:• Few cycles.• Super/sub classes implemented
only with superclass table.
![Page 12: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/12.jpg)
Student(sid, …, college, …, home_state)
Starflakes can be compressed to stars
Course(crn, …)
Enrollment(crn, sid, …)
Instructor(instr_id, …)
Teach(crn, instr_id, …) Denormalize by joining each child’s tree.
Star = junction table & one level of child tables in 1-to-many relationships
![Page 13: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/13.jpg)
Fact & dimension tables
Facts: The junction tables are the most important data – the facts• E.g.: store purchases, class enrollments, click data
• Generally the largest tables
• Key data often numeric & additive – e.g., quantity bought, cost per unit, advertisement views
Dimensions: The child tables in 1-to-many relationship with facts• E.g., stores, customers, sales people, sales period
![Page 14: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/14.jpg)
Dimensional modeling process
• Centered on identifying the business model• Identifying the potential queries
• Identifying the facts – the data used in such queries
• Each query should use only one fact table & its dimension tables.
• Possibly start with an ER model• Identify which junction tables serve as fact tables
• Use only surrogate keys
• Often add time dimension
• Denormalize into starflake or star schema
![Page 15: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/15.jpg)
Orderorder_idcustomer_idstore_idclerk_iddate
Storestore_idstore_nameaddressdistrictfloor_type
Clerkclerk_idclerk_nameclerk_grade
Productskudescriptionbrandcategory
Customercustomer_IDcustomer_namepurchase_profilecredit_profileaddress
Promotionpromotion_idpromotion_nameprice_typead_type
OrderLineorder_idskupromotion_iddollars_soldunits_solddollars_cost
ER model
![Page 16: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/16.jpg)
Timetime_keySQL_dateday_of_weekmonth
Storestore_keystore_idstore_nameaddressdistrictfloor_type
Clerkclerk_keyclerk_idclerk_nameclerk_grade
Productproduct_keyskudescriptionbrandcategory
Customercustomer_keycustomer_namepurchase_profilecredit_profileaddress
Promotionpromotion_keypromotion_nameprice_typead_type
Ordertime_keystore_keyclerk_keyproduct_keycustomer_keypromotion_keydollars_soldunits_solddollars_cost
Dimensional model
![Page 17: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/17.jpg)
View fact table as 𝑛-dimensional data cube
Each dimension table represents a dimension
of the cube.
Facts are the data in the cube.
Facts might be pre-aggregated along each
dimension or combination of dimensions.
![Page 18: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e6927c7f168c937a47c077c/html5/thumbnails/18.jpg)
Sometimes things are still messy
Not all data fit nicely into facts + 1-to-many dimensions.
Leads to exceptions from this simple presentation.