comp 430 intro. to database systems · starflake = tree of junction table & child tables in...

18
COMP 430 Intro. to Database Systems Denormalization & Dimensional Modeling

Upload: others

Post on 11-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

COMP 430Intro. to Database Systems

Denormalization & Dimensional Modeling

Page 2: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Some consequences of normalization

• Data redundancy is reduced or eliminated.

• Relations are broken into smaller, related tables.

• Using all the attributes from the original relation requires joining these smaller tables.

Page 3: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Denormalization

Deliberately reintroducing some redundancy, so that we can access data faster.

DENORMALIZED DATA AHEAD

Page 4: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Example

Technique:Add duplicate fields

Technique:Add computed fields

Page 5: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Example

Technique:Join tables

Page 6: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Less common denormalization techniques

• Duplicating a commonly-used subset of table fields

• Splitting some table rows into different tables• Frequently- vs. rarely-used data

• Data for different regions

• Common subclasses of data

Page 7: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

When to denormalize?

Typically used when some or all of the following apply:• Many queries need to join the data

• Joining the data is expensive – uses scans, rather than indices

• Computing derived data is expensive – complex queries or complex functions

Page 8: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Dealing with redundant data

Still want data consistency, but now it requires work.

Is full consistency required at all times?

Techniques:• Stored procedures act as API for updating DB. They add the redundant data.• Triggers check (and fix?) consistency.• Application code carefully maintains consistency during updates.• Reconcile data as background process.• Reconcile data during system maintenance.

Page 9: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Dimensional modelingIn a tiny, brief nutshell

Page 10: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

An alternative to ER modeling

Only a brief overview.

So, we’ll view it through our lens of ER modeling + denormalization.

Emphasizes decision making & use of historical data• Fast retrieval & aggregation of data

• Less concern with updating data & maintaining consistency while updating

Page 11: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

DB design often resembles multiple starflakes

Course(crn, …) Student(sid, …, collegeid, …, home_stateid)

Enrollment(crn, sid, …)

College(collegeid, …)

State(stateid, …, country)

Instructor(instr_id, …)

Teach(crn, instr_id, …)

Starflake = tree of junction table & child tables in 1-to-many relationships

Typical:• Few cycles.• Super/sub classes implemented

only with superclass table.

Page 12: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Student(sid, …, college, …, home_state)

Starflakes can be compressed to stars

Course(crn, …)

Enrollment(crn, sid, …)

Instructor(instr_id, …)

Teach(crn, instr_id, …) Denormalize by joining each child’s tree.

Star = junction table & one level of child tables in 1-to-many relationships

Page 13: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Fact & dimension tables

Facts: The junction tables are the most important data – the facts• E.g.: store purchases, class enrollments, click data

• Generally the largest tables

• Key data often numeric & additive – e.g., quantity bought, cost per unit, advertisement views

Dimensions: The child tables in 1-to-many relationship with facts• E.g., stores, customers, sales people, sales period

Page 14: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Dimensional modeling process

• Centered on identifying the business model• Identifying the potential queries

• Identifying the facts – the data used in such queries

• Each query should use only one fact table & its dimension tables.

• Possibly start with an ER model• Identify which junction tables serve as fact tables

• Use only surrogate keys

• Often add time dimension

• Denormalize into starflake or star schema

Page 15: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Orderorder_idcustomer_idstore_idclerk_iddate

Storestore_idstore_nameaddressdistrictfloor_type

Clerkclerk_idclerk_nameclerk_grade

Productskudescriptionbrandcategory

Customercustomer_IDcustomer_namepurchase_profilecredit_profileaddress

Promotionpromotion_idpromotion_nameprice_typead_type

OrderLineorder_idskupromotion_iddollars_soldunits_solddollars_cost

ER model

Page 16: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Timetime_keySQL_dateday_of_weekmonth

Storestore_keystore_idstore_nameaddressdistrictfloor_type

Clerkclerk_keyclerk_idclerk_nameclerk_grade

Productproduct_keyskudescriptionbrandcategory

Customercustomer_keycustomer_namepurchase_profilecredit_profileaddress

Promotionpromotion_keypromotion_nameprice_typead_type

Ordertime_keystore_keyclerk_keyproduct_keycustomer_keypromotion_keydollars_soldunits_solddollars_cost

Dimensional model

Page 17: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

View fact table as 𝑛-dimensional data cube

Each dimension table represents a dimension

of the cube.

Facts are the data in the cube.

Facts might be pre-aggregated along each

dimension or combination of dimensions.

Page 18: COMP 430 Intro. to Database Systems · Starflake = tree of junction table & child tables in 1-to-many relationships Typical: • Few cycles. • Super/sub classes implemented

Sometimes things are still messy

Not all data fit nicely into facts + 1-to-many dimensions.

Leads to exceptions from this simple presentation.