l15.pptx

Post on 13-Jul-2016

4 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Case Study: Education

Student Registration, Attendance, & Performance

Requirements

• Analyze student registration • Analyze student attendance• Relate attendance to performance• Analyze facility utilization

April 28, 2023 2SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Scenario

• Many Disciplines• Many Departments• First degree & Higher degree• Multiple campuses• 4000 on campus students (Pilani)• 350 courses offered each semester• Each student doing 6 courses/sem • 40 lectures per course• 5 years data

April 28, 2023 3SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Analysis Requirements• Top/Bottom 5 electives • Correlation between attendance and performance• Variation in MGPA in courses• Variation in CGPA of students discipline wise/campus wise• Average CGPA/MGPA over the last few semesters at different

campuse• Attendance/performance in CDCs of first & second

disciplines• Attendance in forenoon/afternoon sessions• Attendance for UG/PG students• Most popular discipline as choice for dual at different

campuses• Performance of dualites vs. single degree students

April 28, 2023 4SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Fig1

April 28, 2023 5SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Student Registration Event

• Grain of the FT would be one row for each registered course by student & semester

• Semester is the lowest level available for the registration events

• Semester dimension should conform to the calendar date dimension

• Student dimension should have demographic data + on campus information like part-time, full-time, involvement in athletics, major, UG/PG,

April 28, 2023 6SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Registration Event as a factless fact table

April 28, 2023 7SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Registration Events

• How many students have registered for a given faculty member’s course during the last five years ?How many have registered for more than a course from a given faculty member ?

• When counting the number of registrations for a faculty member, any key can be used as the argument to the count function e.g. SELCET FACULTY, COUNT (TERM_KEY) … GROUP BY FACULTY

It gives simple count of the number of student registrations by faculty, subject to any constraints that may exist in WHERE clause

April 28, 2023 8SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Registration Events

• If we can track the tuition revenue, earned credit hours and grade scores, we could add them to our fact table, however, our FT is no longer a factless fact table

• Revenue generated by course or faculty• Avg. grade per class by faculty

April 28, 2023 9SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Facilities Utilization Coverage

• Factless FT• Which facility is being used for what purpose

during every hour/day/term• Avg. occupancy, variation

April 28, 2023 10SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Facilities utilization as a coverage factless FT

April 28, 2023 11SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Student Attendace Events

• Grain ?• Who walks through the course’s classroom

door each day• Students registered for a course who didn’t

turn up ?• Attendace fact ?

April 28, 2023 12SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Attendance Fact Table

April 28, 2023 13SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

What are the advantages of having look-up tables in a data warehouse?

• Refreshing of dimension tables becomes faster• Faster loading of fact tables• Faster generation of surrogate keys (for new records in

dimensions)• Faster Implementation of Type I changes• Faster Implementation of Type II changes

April 28, 2023 14SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Compare & contrast outriggers & mini-dimensions. Give situations under which you would prefer an outrigger to a mini-dimension & vice versa

• Similarities:• Both are used to handle demographic data• Both have negative impact on browsing performance of

dimensions• Both can coexist for a dimension

April 28, 2023 15SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Dissimilarities:• Outriggers remove low cardinality columns, whereas MD

removes frequently accessed or frequently attributes• Outriggers are linked to the dimension table whereas MDs are

linked directly to the fact table• Outriggers are preferred when you want to query the customer

dimension alone while doing customer profiling. In case of MDs we need to query the fact table also to get the most recent values

• Loss of information due to bands in MD. Nothing like that in case of outriggers

• Outriggers are good for demographic data that is at a different granularity level. For. eg. Customer vs. state

• Outriggers save space. Nothing like that in case of MDs

April 28, 2023 16SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Grocery StoreData Warehouse

Dr. Navneet GoyalAssociate Professor

Computer Science DepartmentBITS, Pilani

Business Processes

• Sales• Inventory• Procurement• Order Management• Promotion

April 28, 2023 18SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Value Chain

Retailer Issues Purchase Order

Deliveries @ Retailer WH

Retailer WHInventory

Deliveries @Retail Store

Retail StoreInventory

Retail StoreSales

April 28, 2023 19SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

The Scenario• A chain of grocery stores in the US• 100 stores • 60,000 individual products on the shelves in

each store• 6,000 products (on an average) sell each day in

a given store • Each product belongs to a subcategory• Each subcategory belongs to a category• Each category belongs to a department

April 28, 2023 20SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Some Terms

• SKU (Stock Keeping Units)• UPC (Universal Product Codes)• EPOS ( Electronic Point of Sales)

April 28, 2023 21SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

What Management is Interested In?

• Ordering logistics• Stocking shelves• Selling products• Maximize profits

April 28, 2023 22SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Data Warehouse:Design Steps

Step 1: Identify the Business Process

Step 2: Declare the Grain

Step 3: Identify the Dimensions

Step 4: Identify the FactsApril 28, 2023 23SS ZG515, Vikas Singh, CSIS

Dept., BITS Pilani

Star Schema

FK FK

FK FK

Sales FactTable

Location Dimension

Promotion Dimension

ProductDimension

TimeDimension

April 28, 2023 24SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

The “Classic” Star Schema

PERIOD KEY

Store Dimension

Time Dimension

Product Dimension

STORE KEYPRODUCT KEYPERIOD KEYDollars_soldUnitsDollars_cost

Period DescYearQuarterMonthDay

Fact Table

PRODUCT KEY

Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.

Product Desc.BrandColorSizeManufacturer

STORE KEY

April 28, 2023 25SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Types of Facts• Fully-additive-all dimensions

– Units_sold, Sales_amt• Semi-additive-some dimensions

– Account_balance, Customer_count28/3,tissue paper,store1, 25, 250,2028/3,paper towel,store1, 35, 350,30Is no. of customers who bought either tissue paper or paper towel is 50? NO.

• Non-additive-none– Gross margin=Gross profit/amount– Note that GP and Amount are fully additive– Ratio of the sums and not sum of the ratios

April 28, 2023 26SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Facts for Grocery Store

1. Quantity sold (additive)2. Dollar revenue (additive)3. Dollar cost (additive)4. Customer count (semi-additive, not additive

along the product dimension)

April 28, 2023 27SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Fact Table for Grocery Store

Field name Example Values

Description/Remarks

Date key (FK) 1 Surrogate key

Product key (FK)

1 Surrogate key

Store key (FK) 1 Surrogate key

EPOS transaction no.

100 Trancsaction number generated by the Operational system to record sales

Sales Quantity 2 No. of units bought by a customer

Sales amount 72 Amount received by selling 2 units

Cost amount 65 Cost price of 2 units

April 28, 2023 28SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Promotion Dimension

• Causal Dimension• Which causes or being the cause• Promotion conditions include

– TPRs– End-aisle displays– Newspapers ads– Coupons– Combinations are common

April 28, 2023 29SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Promotion Dimension

• Management is interested in knowing how effective their promotion schemes are

• Promotion are judged on the basis of:– Lift and Baseline sales– Time shifting– Cannibalization– Growing the market

April 28, 2023 30SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Modeling Promotion Dimension

• Difficult to capture the effect of promotion• Little or NO provision in operational system to

capture promotions• Multiple promotion schemes at the same time• Promotion schemes applicable to many products• Different grain than sales• What about products that were on promotion but

not sold?

April 28, 2023 31SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Modeling Promotion Dimension

Captures combination of promotion techniques in effect at the time of sale

Promotions are generally at a higher grain than sales fact table

Adding a promotion dimension is thus possible

Promotion and product relationship is captured implicitly in the fact table

But we are missing out on one important piece of information

Products on promotion that did not sellApril 28, 2023 32SS ZG515, Vikas Singh, CSIS

Dept., BITS Pilani

Modeling Promotion Dimension

• Different causal conditions are highly correlated• Create one row for each combination of promotion

conditions• All stores run 3 promotion mechanisms

simultaneously, but a few stores are not able to deploy end-aisle displays– One record for combination of 3– One record for combination of 2

April 28, 2023 33SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Modeling Promotion Dimension

In one year, there may be 1000 ads, 5000 TPRs, and 1000 end-aisle displays

Only 10000 combinations of these three conditions affecting a particular product

A sample promotion dimensionPromotion key Coupon typePromotion name Ad media typeTPR type Display ProviderPromotion Media type Promotion CostAd type Start DateDisplay type End Date……

Include a NO promotion in effect row in promotion dimension

April 28, 2023 34SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Modeling Promotion Dimension

Promotion Coverage Factless Fact Table Same Dimensions apply as that for Sales fact

table So what is different? Is the grain different? One row in the fact table for each product in a

store each day ( or week ) regardless of whether the product was sold or not

NO FACTS INVOLVED!! How to find products that were on promotion

on a day but did not sell?

April 28, 2023 35SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Database SizingFACT TABLE SIZE• 3 year data• 100 stores• Daily grain• 60,000 SKUs• Sparsity = 10%• 4 dimensions (16 bytes)• 4 facts (16 bytes)Total Size=3x365x100x6000x3220

GBApril 28, 2023 36SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Sample Data Warehouse

Time DimensionProduct DimensionStore DimensionPromotion DimensionSales Fact TablePromotion Coverage Fact Table

April 28, 2023 37SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Q & A

top related