Lecture 6 MARK2039 Winter 2006 George Brown College Wednesday 9-12

Upload: augustine-conley

Post on 12-Jan-2016


Assignment 5

1) You are asked to conduct the following analysis:

Obtain a count of all customers who live in Quebec and who bought shoes in the last month.

Indicate which are the dimensions and which are the measures.

Measures: Count
Dimensions: Quebec, shoes, and last month (values of the region, product, and time dimensions)

What technology would you employ that would empower business users to create the above report?

‘Cube technology’
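As a rough illustration of what a cube pre-computes, the sketch below counts distinct customers for every combination of dimension values, so the Quebec / shoes / last month report becomes a single lookup. The purchase records, the month labels, and the `build_cube` helper are hypothetical and are not taken from the lecture.

```python
from collections import Counter

# Hypothetical purchase records: (customer id, province, product, month).
purchases = [
    (1, "Quebec", "shoes", "2006-02"),
    (2, "Quebec", "shoes", "2006-02"),
    (2, "Quebec", "shoes", "2006-02"),   # repeat purchase by the same customer
    (3, "Ontario", "shoes", "2006-02"),
    (4, "Quebec", "shirts", "2006-02"),
    (5, "Quebec", "shoes", "2006-01"),
]


def build_cube(rows):
    """Count distinct customers in every (province, product, month) cell."""
    distinct = set(rows)  # identical rows collapse, so a customer counts once per cell
    return Counter((prov, prod, month) for _cust, prov, prod, month in distinct)


cube = build_cube(purchases)

# Measure: count. Dimension values: Quebec, shoes, last month (assumed to be 2006-02).
quebec_shoes_last_month = cube[("Quebec", "shoes", "2006-02")]
print(quebec_shoes_last_month)
```

A business user working with a real cube tool would pick the dimension values from menus rather than write code; the point here is only that the count is stored per cell and does not need to be recomputed from the raw records.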

Assignment 5

2) Produce a summarized analytical file with one record per customer and three fields. (Note: One field will contain the Cust ID.) In your opinion, which customer is the worst and why?

Transactions:

Cust ID  Amount  Type       Date
123      30      clothing   Feb-06
456      5       clothing   Aug-05
123      200     furniture  Jan-06
456      150     furniture  Dec-05
789      76      clothing   Nov-05
456      550     furniture  Sep-05
123      70      furniture  Sep-05
456      350     clothing   Jan-06
456      225     furniture  Jan-06
12       10      clothing   Sep-05

Summarized file:

Cust ID  Total Amount  Most Recent Purchase Date
123      300           Feb-06
456      1280          Jan-06
789      76            Nov-05
12       10            Sep-05
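The roll-up from transactions to one record per customer can be sketched as follows. The amounts and months are those given in the assignment; representing each month as the first day of that month, and the `summarize` helper itself, are assumptions made for illustration.

```python
from datetime import date

# Transactions from the assignment: (cust_id, amount, type, purchase month).
# Assumption: each month is represented by its first day.
transactions = [
    (123, 30, "clothing", date(2006, 2, 1)),
    (456, 5, "clothing", date(2005, 8, 1)),
    (123, 200, "furniture", date(2006, 1, 1)),
    (456, 150, "furniture", date(2005, 12, 1)),
    (789, 76, "clothing", date(2005, 11, 1)),
    (456, 550, "furniture", date(2005, 9, 1)),
    (123, 70, "furniture", date(2005, 9, 1)),
    (456, 350, "clothing", date(2006, 1, 1)),
    (456, 225, "furniture", date(2006, 1, 1)),
    (12, 10, "clothing", date(2005, 9, 1)),
]


def summarize(rows):
    """Return {cust_id: (total amount, most recent purchase date)}."""
    summary = {}
    for cust_id, amount, _type, when in rows:
        total, latest = summary.get(cust_id, (0, date.min))
        summary[cust_id] = (total + amount, max(latest, when))
    return summary


summary = summarize(transactions)
for cust_id, (total, latest) in summary.items():
    print(cust_id, total, latest.strftime("%b-%y"))
```

On these figures customer 12 has both the lowest total amount and the oldest most recent purchase, which is one reasonable basis for answering the "worst customer" question.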

Assignment 5

3) You have three external files containing the following numbers of unique records:

– File A contains 10,000 unique records and is matched to the customer file by the first 3 characters of the postal code
– File B contains 20,000 unique records and is matched to the customer file by the first 4 characters of the postal code
– File C contains 50,000 unique records and is matched to the customer file by the enumeration area

Based on the above information and all other things being equal, which file would be most useful in a data mining exercise, and which would be least useful?

Most useful: File C (the most unique records, so the finest granularity)
Least useful: File A (the fewest unique records, so the coarsest granularity)
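The granularity argument can be made concrete with a small sketch. The record counts come from the assignment; the customer file size of 1,000,000 is a hypothetical figure, and the calculation assumes customers are spread evenly across records, which real data would not satisfy exactly.

```python
# Unique record counts stated in the assignment.
unique_records = {"File A": 10_000, "File B": 20_000, "File C": 50_000}

# More unique records means each record describes a smaller group of customers.
ranked = sorted(unique_records, key=unique_records.get, reverse=True)
most_useful, least_useful = ranked[0], ranked[-1]

# Hypothetical customer file size, used only to illustrate the effect.
CUSTOMER_COUNT = 1_000_000
customers_per_record = {
    name: CUSTOMER_COUNT / count for name, count in unique_records.items()
}

for name in ranked:
    print(name, customers_per_record[name])
```

Under these assumptions each File C record is shared by far fewer customers than each File A record, so the values appended from File C differentiate customers more sharply.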

Recap

• What is the difference between Census data and Taxfiler data?
  – Advantages vs. disadvantages

• Consider granularity of file records and age of data

• What are the advantages of ‘Cube’ technology?

Sourcing the Data (Extraction)

Typical Data Sources - External
• Business to Business “Firmographics”
  – SIC, Number of Employees, Revenue etc.
  – Sources:
    • D&B
    • CBI / InfoCanada
    • Scott’s

Company  Employee Size  Industry Classification  Sales Size  Yrs In business
XYZ      1-4            retail                   <1 million  10
…        …              …                        …           …

Sourcing the Data (Extraction)

Typical Data Sources - Survey
• Attitudinal - Needs, preferences, social values, opinions
• Behavioral - Buying habits, lifestyle, brand usage

For most data mining projects, we want to assign a value to all customers; therefore the information used must be available for all customers
  – survey-based information generally cannot be used, as it typically can only be applied to a small portion of the database

Sourcing the Data (Extraction)

Typical Data Sources - Survey
• ICOM
  – Surveys to approx. 10MM Canadians
  – Fully updated every 2 years
  – Contains attitudes and purchase behaviours across all industry sectors

• What do you think the value is here?

Examples

• A marketer wants to target high-risk cancels for a retention campaign for a Telco. Information is contained in legacy database systems containing a customer file, transaction file, and call detail file. As a marketer and analyst, answer the following requirements:

  – Name 5 key data fields from the above files that should be created in the analytical exercise

  – Create a diagram or schema of how this data would be linked into an analytical file

  – What resources would you need and why?
    • People
    • Software

Examples

• How would the previous example change if the information were available in a data mart or warehouse?

Examples

• A credit card company has 100,000 customers, with tombstone information and detailed transactional information on its database. 50,000 customers have email addresses. 10% of these 50,000 customers have responded to a survey, in which 5% have indicated that they consider themselves loyal customers. Web activity of these loyal customers indicates that many of them have clicked on travel-related packages.

  – Database information contains:
    • Age, gender, income, where they spend, recency of spend, frequency of spend, and amount of spend

  – Survey information contains information on attitudes and preferences regarding their credit card behaviour

  – Web information contains page view behaviour of the customer

• As a marketer and analyst, how would you use this information to sell travel-related insurance?
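Before answering, it helps to work out how many customers each information source actually covers. The sketch below does that arithmetic, assuming the 5% refers to survey respondents (the scenario does not say so explicitly); the variable names are illustrative only.

```python
# Figures from the scenario.
customers = 100_000
with_email = 50_000
survey_response_pct = 10   # percent of email customers who answered the survey
loyal_pct = 5              # assumption: percent of survey respondents

# Integer arithmetic keeps the counts exact.
respondents = with_email * survey_response_pct // 100
self_declared_loyal = respondents * loyal_pct // 100

# Share of the full customer base covered by each source.
survey_coverage = respondents / customers
loyal_coverage = self_declared_loyal / customers

print(respondents, self_declared_loyal)
print(f"{survey_coverage:.2%} surveyed, {loyal_coverage:.2%} self-declared loyal")
```

The survey and web findings therefore describe only a small slice of the base, while the database fields (age, gender, income, recency, frequency, and amount of spend) are available for all customers.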