l15.pptx

38
Case Study: Education Student Registration, Attendance, & Performance

Upload: subiec79

Post on 13-Jul-2016

4 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: L15.pptx

Case Study: Education

Student Registration, Attendance, & Performance

Page 2: L15.pptx

Requirements

• Analyze student registration • Analyze student attendance• Relate attendance to performance• Analyze facility utilization

April 28, 2023 2SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 3: L15.pptx

Scenario

• Many Disciplines• Many Departments• First degree & Higher degree• Multiple campuses• 4000 on campus students (Pilani)• 350 courses offered each semester• Each student doing 6 courses/sem • 40 lectures per course• 5 years data

April 28, 2023 3SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 4: L15.pptx

Analysis Requirements• Top/Bottom 5 electives • Correlation between attendance and performance• Variation in MGPA in courses• Variation in CGPA of students discipline wise/campus wise• Average CGPA/MGPA over the last few semesters at different

campuse• Attendance/performance in CDCs of first & second

disciplines• Attendance in forenoon/afternoon sessions• Attendance for UG/PG students• Most popular discipline as choice for dual at different

campuses• Performance of dualites vs. single degree students

April 28, 2023 4SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 5: L15.pptx

Fig1

April 28, 2023 5SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 6: L15.pptx

Student Registration Event

• Grain of the FT would be one row for each registered course by student & semester

• Semester is the lowest level available for the registration events

• Semester dimension should conform to the calendar date dimension

• Student dimension should have demographic data + on campus information like part-time, full-time, involvement in athletics, major, UG/PG,

April 28, 2023 6SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 7: L15.pptx

Registration Event as a factless fact table

April 28, 2023 7SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 8: L15.pptx

Registration Events

• How many students have registered for a given faculty member’s course during the last five years ?How many have registered for more than a course from a given faculty member ?

• When counting the number of registrations for a faculty member, any key can be used as the argument to the count function e.g. SELCET FACULTY, COUNT (TERM_KEY) … GROUP BY FACULTY

It gives simple count of the number of student registrations by faculty, subject to any constraints that may exist in WHERE clause

April 28, 2023 8SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 9: L15.pptx

Registration Events

• If we can track the tuition revenue, earned credit hours and grade scores, we could add them to our fact table, however, our FT is no longer a factless fact table

• Revenue generated by course or faculty• Avg. grade per class by faculty

April 28, 2023 9SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 10: L15.pptx

Facilities Utilization Coverage

• Factless FT• Which facility is being used for what purpose

during every hour/day/term• Avg. occupancy, variation

April 28, 2023 10SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 11: L15.pptx

Facilities utilization as a coverage factless FT

April 28, 2023 11SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 12: L15.pptx

Student Attendace Events

• Grain ?• Who walks through the course’s classroom

door each day• Students registered for a course who didn’t

turn up ?• Attendace fact ?

April 28, 2023 12SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 13: L15.pptx

Attendance Fact Table

April 28, 2023 13SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 14: L15.pptx

What are the advantages of having look-up tables in a data warehouse?

• Refreshing of dimension tables becomes faster• Faster loading of fact tables• Faster generation of surrogate keys (for new records in

dimensions)• Faster Implementation of Type I changes• Faster Implementation of Type II changes

April 28, 2023 14SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 15: L15.pptx

Compare & contrast outriggers & mini-dimensions. Give situations under which you would prefer an outrigger to a mini-dimension & vice versa

• Similarities:• Both are used to handle demographic data• Both have negative impact on browsing performance of

dimensions• Both can coexist for a dimension

April 28, 2023 15SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 16: L15.pptx

Dissimilarities:• Outriggers remove low cardinality columns, whereas MD

removes frequently accessed or frequently attributes• Outriggers are linked to the dimension table whereas MDs are

linked directly to the fact table• Outriggers are preferred when you want to query the customer

dimension alone while doing customer profiling. In case of MDs we need to query the fact table also to get the most recent values

• Loss of information due to bands in MD. Nothing like that in case of outriggers

• Outriggers are good for demographic data that is at a different granularity level. For. eg. Customer vs. state

• Outriggers save space. Nothing like that in case of MDs

April 28, 2023 16SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 17: L15.pptx

Grocery StoreData Warehouse

Dr. Navneet GoyalAssociate Professor

Computer Science DepartmentBITS, Pilani

Page 18: L15.pptx

Business Processes

• Sales• Inventory• Procurement• Order Management• Promotion

April 28, 2023 18SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 19: L15.pptx

Value Chain

Retailer Issues Purchase Order

Deliveries @ Retailer WH

Retailer WHInventory

Deliveries @Retail Store

Retail StoreInventory

Retail StoreSales

April 28, 2023 19SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 20: L15.pptx

The Scenario• A chain of grocery stores in the US• 100 stores • 60,000 individual products on the shelves in

each store• 6,000 products (on an average) sell each day in

a given store • Each product belongs to a subcategory• Each subcategory belongs to a category• Each category belongs to a department

April 28, 2023 20SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 21: L15.pptx

Some Terms

• SKU (Stock Keeping Units)• UPC (Universal Product Codes)• EPOS ( Electronic Point of Sales)

April 28, 2023 21SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 22: L15.pptx

What Management is Interested In?

• Ordering logistics• Stocking shelves• Selling products• Maximize profits

April 28, 2023 22SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 23: L15.pptx

Data Warehouse:Design Steps

Step 1: Identify the Business Process

Step 2: Declare the Grain

Step 3: Identify the Dimensions

Step 4: Identify the FactsApril 28, 2023 23SS ZG515, Vikas Singh, CSIS

Dept., BITS Pilani

Page 24: L15.pptx

Star Schema

FK FK

FK FK

Sales FactTable

Location Dimension

Promotion Dimension

ProductDimension

TimeDimension

April 28, 2023 24SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 25: L15.pptx

The “Classic” Star Schema

PERIOD KEY

Store Dimension

Time Dimension

Product Dimension

STORE KEYPRODUCT KEYPERIOD KEYDollars_soldUnitsDollars_cost

Period DescYearQuarterMonthDay

Fact Table

PRODUCT KEY

Store DescriptionCityStateDistrict IDDistrict Desc.Region_IDRegion Desc.Regional Mgr.

Product Desc.BrandColorSizeManufacturer

STORE KEY

April 28, 2023 25SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 26: L15.pptx

Types of Facts• Fully-additive-all dimensions

– Units_sold, Sales_amt• Semi-additive-some dimensions

– Account_balance, Customer_count28/3,tissue paper,store1, 25, 250,2028/3,paper towel,store1, 35, 350,30Is no. of customers who bought either tissue paper or paper towel is 50? NO.

• Non-additive-none– Gross margin=Gross profit/amount– Note that GP and Amount are fully additive– Ratio of the sums and not sum of the ratios

April 28, 2023 26SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 27: L15.pptx

Facts for Grocery Store

1. Quantity sold (additive)2. Dollar revenue (additive)3. Dollar cost (additive)4. Customer count (semi-additive, not additive

along the product dimension)

April 28, 2023 27SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 28: L15.pptx

Fact Table for Grocery Store

Field name Example Values

Description/Remarks

Date key (FK) 1 Surrogate key

Product key (FK)

1 Surrogate key

Store key (FK) 1 Surrogate key

EPOS transaction no.

100 Trancsaction number generated by the Operational system to record sales

Sales Quantity 2 No. of units bought by a customer

Sales amount 72 Amount received by selling 2 units

Cost amount 65 Cost price of 2 units

April 28, 2023 28SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 29: L15.pptx

Promotion Dimension

• Causal Dimension• Which causes or being the cause• Promotion conditions include

– TPRs– End-aisle displays– Newspapers ads– Coupons– Combinations are common

April 28, 2023 29SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 30: L15.pptx

Promotion Dimension

• Management is interested in knowing how effective their promotion schemes are

• Promotion are judged on the basis of:– Lift and Baseline sales– Time shifting– Cannibalization– Growing the market

April 28, 2023 30SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 31: L15.pptx

Modeling Promotion Dimension

• Difficult to capture the effect of promotion• Little or NO provision in operational system to

capture promotions• Multiple promotion schemes at the same time• Promotion schemes applicable to many products• Different grain than sales• What about products that were on promotion but

not sold?

April 28, 2023 31SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 32: L15.pptx

Modeling Promotion Dimension

Captures combination of promotion techniques in effect at the time of sale

Promotions are generally at a higher grain than sales fact table

Adding a promotion dimension is thus possible

Promotion and product relationship is captured implicitly in the fact table

But we are missing out on one important piece of information

Products on promotion that did not sellApril 28, 2023 32SS ZG515, Vikas Singh, CSIS

Dept., BITS Pilani

Page 33: L15.pptx

Modeling Promotion Dimension

• Different causal conditions are highly correlated• Create one row for each combination of promotion

conditions• All stores run 3 promotion mechanisms

simultaneously, but a few stores are not able to deploy end-aisle displays– One record for combination of 3– One record for combination of 2

April 28, 2023 33SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 34: L15.pptx

Modeling Promotion Dimension

In one year, there may be 1000 ads, 5000 TPRs, and 1000 end-aisle displays

Only 10000 combinations of these three conditions affecting a particular product

A sample promotion dimensionPromotion key Coupon typePromotion name Ad media typeTPR type Display ProviderPromotion Media type Promotion CostAd type Start DateDisplay type End Date……

Include a NO promotion in effect row in promotion dimension

April 28, 2023 34SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 35: L15.pptx

Modeling Promotion Dimension

Promotion Coverage Factless Fact Table Same Dimensions apply as that for Sales fact

table So what is different? Is the grain different? One row in the fact table for each product in a

store each day ( or week ) regardless of whether the product was sold or not

NO FACTS INVOLVED!! How to find products that were on promotion

on a day but did not sell?

April 28, 2023 35SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 36: L15.pptx

Database SizingFACT TABLE SIZE• 3 year data• 100 stores• Daily grain• 60,000 SKUs• Sparsity = 10%• 4 dimensions (16 bytes)• 4 facts (16 bytes)Total Size=3x365x100x6000x3220

GBApril 28, 2023 36SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 37: L15.pptx

Sample Data Warehouse

Time DimensionProduct DimensionStore DimensionPromotion DimensionSales Fact TablePromotion Coverage Fact Table

April 28, 2023 37SS ZG515, Vikas Singh, CSIS Dept., BITS Pilani

Page 38: L15.pptx

Q & A