l15.pptx
TRANSCRIPT
Case Study: Education
Student Registration, Attendance, & Performance
Requirements
• Analyze student registration • Analyze student attendance• Relate attendance to performance• Analyze facility utilization
April 28, 2023 2SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Scenario
• Many Disciplines• Many Departments• First degree & Higher degree• Multiple campuses• 4000 on campus students (Pilani)• 350 courses offered each semester• Each student doing 6 courses/sem • 40 lectures per course• 5 years data
April 28, 2023 3SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Analysis Requirements• Top/Bottom 5 electives • Correlation between attendance and performance• Variation in MGPA in courses• Variation in CGPA of students discipline wise/campus wise• Average CGPA/MGPA over the last few semesters at different
campuse• Attendance/performance in CDCs of first & second
disciplines• Attendance in forenoon/afternoon sessions• Attendance for UG/PG students• Most popular discipline as choice for dual at different
campuses• Performance of dualites vs. single degree students
April 28, 2023 4SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Fig1
April 28, 2023 5SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Student Registration Event
• Grain of the FT would be one row for each registered course by student & semester
• Semester is the lowest level available for the registration events
• Semester dimension should conform to the calendar date dimension
• Student dimension should have demographic data + on campus information like part-time, full-time, involvement in athletics, major, UG/PG,
April 28, 2023 6SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Registration Event as a factless fact table
April 28, 2023 7SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Registration Events
• How many students have registered for a given faculty member’s course during the last five years ?How many have registered for more than a course from a given faculty member ?
• When counting the number of registrations for a faculty member, any key can be used as the argument to the count function e.g. SELCET FACULTY, COUNT (TERM_KEY) … GROUP BY FACULTY
It gives simple count of the number of student registrations by faculty, subject to any constraints that may exist in WHERE clause
April 28, 2023 8SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Registration Events
• If we can track the tuition revenue, earned credit hours and grade scores, we could add them to our fact table, however, our FT is no longer a factless fact table
• Revenue generated by course or faculty• Avg. grade per class by faculty
April 28, 2023 9SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Facilities Utilization Coverage
• Factless FT• Which facility is being used for what purpose
during every hour/day/term• Avg. occupancy, variation
April 28, 2023 10SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Facilities utilization as a coverage factless FT
April 28, 2023 11SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Student Attendace Events
• Grain ?• Who walks through the course’s classroom
door each day• Students registered for a course who didn’t
turn up ?• Attendace fact ?
April 28, 2023 12SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Attendance Fact Table
April 28, 2023 13SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
What are the advantages of having look-up tables in a data warehouse?
• Refreshing of dimension tables becomes faster• Faster loading of fact tables• Faster generation of surrogate keys (for new records in
dimensions)• Faster Implementation of Type I changes• Faster Implementation of Type II changes
April 28, 2023 14SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Compare & contrast outriggers & mini-dimensions. Give situations under which you would prefer an outrigger to a mini-dimension & vice versa
• Similarities:• Both are used to handle demographic data• Both have negative impact on browsing performance of
dimensions• Both can coexist for a dimension
April 28, 2023 15SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Dissimilarities:• Outriggers remove low cardinality columns, whereas MD
removes frequently accessed or frequently attributes• Outriggers are linked to the dimension table whereas MDs are
linked directly to the fact table• Outriggers are preferred when you want to query the customer
dimension alone while doing customer profiling. In case of MDs we need to query the fact table also to get the most recent values
• Loss of information due to bands in MD. Nothing like that in case of outriggers
• Outriggers are good for demographic data that is at a different granularity level. For. eg. Customer vs. state
• Outriggers save space. Nothing like that in case of MDs
April 28, 2023 16SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Grocery StoreData Warehouse
Dr. Navneet GoyalAssociate Professor
Computer Science DepartmentBITS, Pilani
Business Processes
• Sales• Inventory• Procurement• Order Management• Promotion
April 28, 2023 18SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Value Chain
Retailer Issues Purchase Order
Deliveries @ Retailer WH
Retailer WHInventory
Deliveries @Retail Store
Retail StoreInventory
Retail StoreSales
April 28, 2023 19SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
The Scenario• A chain of grocery stores in the US• 100 stores • 60,000 individual products on the shelves in
each store• 6,000 products (on an average) sell each day in
a given store • Each product belongs to a subcategory• Each subcategory belongs to a category• Each category belongs to a department
April 28, 2023 20SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Some Terms
• SKU (Stock Keeping Units)• UPC (Universal Product Codes)• EPOS ( Electronic Point of Sales)
April 28, 2023 21SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
What Management is Interested In?
• Ordering logistics• Stocking shelves• Selling products• Maximize profits
April 28, 2023 22SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Data Warehouse:Design Steps
Step 1: Identify the Business Process
Step 2: Declare the Grain
Step 3: Identify the Dimensions
Step 4: Identify the FactsApril 28, 2023 23SS ZG515, Vikas Singh, CSIS
Dept., BITS Pilani
Star Schema
FK FK
FK FK
Sales FactTable
Location Dimension
Promotion Dimension
ProductDimension
TimeDimension
April 28, 2023 24SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
The “Classic” Star Schema
PERIOD KEY
Store Dimension
Time Dimension
Product Dimension
STORE KEYPRODUCT KEYPERIOD KEYDollars_soldUnitsDollars_cost
Period DescYearQuarterMonthDay
Fact Table
PRODUCT KEY
Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.
Product Desc.BrandColorSizeManufacturer
STORE KEY
April 28, 2023 25SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Types of Facts• Fully-additive-all dimensions
– Units_sold, Sales_amt• Semi-additive-some dimensions
– Account_balance, Customer_count28/3,tissue paper,store1, 25, 250,2028/3,paper towel,store1, 35, 350,30Is no. of customers who bought either tissue paper or paper towel is 50? NO.
• Non-additive-none– Gross margin=Gross profit/amount– Note that GP and Amount are fully additive– Ratio of the sums and not sum of the ratios
April 28, 2023 26SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Facts for Grocery Store
1. Quantity sold (additive)2. Dollar revenue (additive)3. Dollar cost (additive)4. Customer count (semi-additive, not additive
along the product dimension)
April 28, 2023 27SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Fact Table for Grocery Store
Field name Example Values
Description/Remarks
Date key (FK) 1 Surrogate key
Product key (FK)
1 Surrogate key
Store key (FK) 1 Surrogate key
EPOS transaction no.
100 Trancsaction number generated by the Operational system to record sales
Sales Quantity 2 No. of units bought by a customer
Sales amount 72 Amount received by selling 2 units
Cost amount 65 Cost price of 2 units
April 28, 2023 28SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Promotion Dimension
• Causal Dimension• Which causes or being the cause• Promotion conditions include
– TPRs– End-aisle displays– Newspapers ads– Coupons– Combinations are common
April 28, 2023 29SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Promotion Dimension
• Management is interested in knowing how effective their promotion schemes are
• Promotion are judged on the basis of:– Lift and Baseline sales– Time shifting– Cannibalization– Growing the market
April 28, 2023 30SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Modeling Promotion Dimension
• Difficult to capture the effect of promotion• Little or NO provision in operational system to
capture promotions• Multiple promotion schemes at the same time• Promotion schemes applicable to many products• Different grain than sales• What about products that were on promotion but
not sold?
April 28, 2023 31SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Modeling Promotion Dimension
Captures combination of promotion techniques in effect at the time of sale
Promotions are generally at a higher grain than sales fact table
Adding a promotion dimension is thus possible
Promotion and product relationship is captured implicitly in the fact table
But we are missing out on one important piece of information
Products on promotion that did not sellApril 28, 2023 32SS ZG515, Vikas Singh, CSIS
Dept., BITS Pilani
Modeling Promotion Dimension
• Different causal conditions are highly correlated• Create one row for each combination of promotion
conditions• All stores run 3 promotion mechanisms
simultaneously, but a few stores are not able to deploy end-aisle displays– One record for combination of 3– One record for combination of 2
April 28, 2023 33SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Modeling Promotion Dimension
In one year, there may be 1000 ads, 5000 TPRs, and 1000 end-aisle displays
Only 10000 combinations of these three conditions affecting a particular product
A sample promotion dimensionPromotion key Coupon typePromotion name Ad media typeTPR type Display ProviderPromotion Media type Promotion CostAd type Start DateDisplay type End Date……
Include a NO promotion in effect row in promotion dimension
April 28, 2023 34SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Modeling Promotion Dimension
Promotion Coverage Factless Fact Table Same Dimensions apply as that for Sales fact
table So what is different? Is the grain different? One row in the fact table for each product in a
store each day ( or week ) regardless of whether the product was sold or not
NO FACTS INVOLVED!! How to find products that were on promotion
on a day but did not sell?
April 28, 2023 35SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Database SizingFACT TABLE SIZE• 3 year data• 100 stores• Daily grain• 60,000 SKUs• Sparsity = 10%• 4 dimensions (16 bytes)• 4 facts (16 bytes)Total Size=3x365x100x6000x3220
GBApril 28, 2023 36SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Sample Data Warehouse
Time DimensionProduct DimensionStore DimensionPromotion DimensionSales Fact TablePromotion Coverage Fact Table
April 28, 2023 37SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani
Q & A