Lecture 6
MARK2039
Winter 2006
George Brown College
Wednesday 9-12
Assignment 5
1) You are asked to conduct the following analysis:
Obtain a count of all customers who live in Quebec and who bought shoes in the last month. Indicate which are the dimensions and which are the measures.
Measures: count of customers
Dimensions: Quebec (geography), shoes (product), and last month (time)
What technology would you employ to enable business users to create the above report themselves?
‘Cube’ technology
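To make the measure/dimension split concrete, here is a minimal Python sketch of the idea behind a cube: the measure (a count of distinct customers) is pre-aggregated for every combination of dimension values, so a business user can read off any cell without writing a query. The purchase records, customer IDs and the month label "2006-01" standing in for "last month" are hypothetical; a real OLAP tool would build and store these aggregates itself.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, province, product_category, month)
purchases = [
    ("C1", "Quebec", "shoes", "2006-01"),
    ("C2", "Quebec", "shoes", "2006-01"),
    ("C1", "Quebec", "shoes", "2006-01"),  # repeat purchase by the same customer
    ("C3", "Ontario", "shoes", "2006-01"),
    ("C4", "Quebec", "clothing", "2006-01"),
    ("C5", "Quebec", "shoes", "2005-12"),
]

def build_cube(rows):
    """Map each (province, category, month) cell to the set of customers in that cell."""
    cells = defaultdict(set)
    for cust_id, province, category, month in rows:
        cells[(province, category, month)].add(cust_id)
    return cells

def customer_count(cube, province, category, month):
    """Measure: number of distinct customers in one cell (0 if the cell is empty)."""
    return len(cube.get((province, category, month), set()))

cube = build_cube(purchases)
print(customer_count(cube, "Quebec", "shoes", "2006-01"))
```

Storing sets rather than running totals is what makes the measure a count of customers rather than a count of purchases: customer C1 bought twice but is counted once.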
Assignment 5
2) Produce a summarized analytical file with one record per customer and three fields. (Note: one field will contain the Cust ID.) In your opinion, which customer is the worst and why?

Transaction data:
Cust ID   Amount   Type        Date
123       30       clothing    Feb-06
456       5        clothing    Aug-05
123       200      furniture   Jan-06
456       150      furniture   Dec-05
789       76       clothing    Nov-05
456       550      furniture   Sep-05
123       70       furniture   Sep-05
456       350      clothing    Jan-06
456       225      furniture   Jan-06
12        10       clothing    Sep-05

Summarized analytical file:
Cust ID   Total Amount   Most Recent Purchase Date
123       300            Feb-06
456       1280           Jan-06
789       76             Nov-05
12        10             Sep-05
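The roll-up from transactions to one record per customer can be sketched in Python as below. It uses the ten transactions from the slide; the ranking rule used to pick the "worst" customer (lowest total spend, with the older most recent purchase as tie-breaker) is an assumption for illustration, since the question asks for an opinion.

```python
from collections import defaultdict
from datetime import datetime

# Transactions from the slide: (cust_id, amount, type, date)
transactions = [
    ("123", 30, "clothing", "Feb-06"),
    ("456", 5, "clothing", "Aug-05"),
    ("123", 200, "furniture", "Jan-06"),
    ("456", 150, "furniture", "Dec-05"),
    ("789", 76, "clothing", "Nov-05"),
    ("456", 550, "furniture", "Sep-05"),
    ("123", 70, "furniture", "Sep-05"),
    ("456", 350, "clothing", "Jan-06"),
    ("456", 225, "furniture", "Jan-06"),
    ("12", 10, "clothing", "Sep-05"),
]

def summarize(rows):
    """Roll transactions up to one record per customer: (total amount, most recent date)."""
    totals = defaultdict(int)
    latest = {}
    for cust_id, amount, _type, date_str in rows:
        purchase_date = datetime.strptime(date_str, "%b-%y").date()  # "Feb-06" -> 2006-02-01
        totals[cust_id] += amount
        if cust_id not in latest or purchase_date > latest[cust_id]:
            latest[cust_id] = purchase_date
    return {cust: (totals[cust], latest[cust]) for cust in totals}

summary = summarize(transactions)
for cust_id, (total, recent) in summary.items():
    print(cust_id, total, recent.strftime("%b-%y"))

# One possible rule (an assumption): lowest total first, older most recent purchase breaks ties
worst = min(summary, key=lambda cust: (summary[cust][0], summary[cust][1]))
print("worst customer:", worst)
```

Customer 12 has both the lowest total and the oldest most recent purchase, so it ranks last whether spend or recency is used.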
Assignment 5
3) You have three external files containing the following numbers of unique records:
– File A contains 10,000 unique records and is matched to the customer file by the first 3 characters of the postal code
– File B contains 20,000 unique records and is matched to the customer file by the first 4 characters of the postal code
– File C contains 50,000 unique records and is matched to the customer file by the enumeration area
Based on the above information, and all other things being equal, which file would be most useful in a data mining exercise, and which would be least useful?
Most useful: File C
Least useful: File A
The finer the match level, the fewer customers share each external record, so the appended values describe each customer more precisely.
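The effect of match granularity can be sketched in Python: the shorter the match key, the more customers fall into the same key and therefore inherit the same external record. The postal codes are hypothetical, the code assumes the external file holds one record per key, and the full six-character code is only a stand-in for a finer unit; enumeration areas are census geography, not postal codes.

```python
# Hypothetical customer postal codes; Canadian codes are alphanumeric (e.g. "M5V 2T6")
customer_postal_codes = ["M5V 2T6", "M5V 2A1", "M5V 3L9", "M5A 1K2", "M4B 1B3", "M4B 1C7"]

def match_key(postal_code, length):
    """Truncate a postal code to its first `length` characters, ignoring the space."""
    return postal_code.replace(" ", "")[:length]

def customers_per_key(codes, length):
    """Average number of customers sharing one match key (and so one external record)."""
    keys = {match_key(code, length) for code in codes}
    return len(codes) / len(keys)

# 3 characters ~ File A's key, 4 ~ File B's key; 6 characters is only a finer stand-in
for length in (3, 4, 6):
    print(length, customers_per_key(customer_postal_codes, length))
```

With these six codes, the 3-character key puts 2 customers on each record on average, the 4-character key 1.5, and the full code 1.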
Recap
• What is the difference between Census data and Taxfiler data?
– Advantages vs. disadvantages
• Consider granularity of file records and age of data
• What are the advantages of ‘Cube’ technology?
Sourcing the Data (Extraction)
Typical Data Sources - External
• Business to Business “Firmographics”
– SIC (Standard Industrial Classification) code, number of employees, revenue, etc.
– Sources:
  • D&B
  • CBI / InfoCanada
  • Scott’s
Company   Employee Size   Industry Classification   Sales Size   Yrs in Business
XYZ       1-4             retail                    <1 million   10
…         …               …                         …            …
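Appending firmographics to a B2B customer file amounts to a left join on a company key, sketched below in Python. Both files are hypothetical, and matching on company name is a simplification; in practice the match would rely on a vendor identifier or a cleaned name-and-address match.

```python
# Hypothetical B2B customer file keyed by company name
customers = [{"company": "XYZ", "account_id": 1001}, {"company": "ABC", "account_id": 1002}]

# Hypothetical firmographic file, e.g. as purchased from an external vendor
firmographics = {
    "XYZ": {"employee_size": "1-4", "industry": "retail",
            "sales_size": "<1 million", "yrs_in_business": 10},
}

FIELDS = ("employee_size", "industry", "sales_size", "yrs_in_business")

def append_firmographics(customer_rows, firmo_by_company):
    """Left-join firmographic fields onto each customer; unmatched companies get None."""
    enriched = []
    for row in customer_rows:
        firmo = firmo_by_company.get(row["company"], {})
        enriched.append({**row, **{field: firmo.get(field) for field in FIELDS}})
    return enriched

result = append_firmographics(customers, firmographics)
for row in result:
    print(row)
```

Keeping unmatched customers (with empty fields) rather than dropping them makes the match rate visible, which matters when deciding whether an appended field covers enough of the file to be used.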
Sourcing the Data (Extraction)
Typical Data Sources - Survey
• Attitudinal: needs, preferences, social values, opinions
• Behavioral: buying habits, lifestyle, brand usage
For most data mining projects, we want to assign a value to every customer, so the information used must be available for all customers.
– Survey-based information generally cannot be used, because it typically covers only a small portion of the database
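A quick coverage check shows why a survey field is hard to use for scoring the whole database. The sketch below computes the share of customers with a non-missing value for each field; the records and the 90% cut-off are assumptions chosen for illustration.

```python
# Hypothetical customer records; None means the survey field is missing for that customer
customers = [
    {"id": 1, "spend": 120.0, "survey_loyalty": "high"},
    {"id": 2, "spend": 45.5, "survey_loyalty": None},
    {"id": 3, "spend": 300.0, "survey_loyalty": None},
    {"id": 4, "spend": 80.0, "survey_loyalty": None},
]

MIN_COVERAGE = 0.9  # assumed cut-off; choose one that suits the project

def coverage(rows, field):
    """Share of customers with a non-missing value for `field`."""
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)

for field in ("spend", "survey_loyalty"):
    share = coverage(customers, field)
    print(f"{field}: {share:.0%} coverage, usable for all customers: {share >= MIN_COVERAGE}")
```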
Sourcing the Data (Extraction)
Typical Data Sources - Survey
• ICOM
– Surveys sent to approx. 10MM Canadians
– Fully updated every 2 years
– Contains attitudinal and purchase behaviour data across all industry sectors
• What do you think the value is here?
Examples
• A marketer wants to target customers at high risk of cancelling in a retention campaign for a telco. The information is held in legacy database systems: a customer file, a transaction file, and a call detail file. As a marketer and analyst, address the following requirements:
– Five key data fields from the above files that should be created for the analytical exercise
– Create a diagram or schema showing how these files would be linked into an analytical file
– What resources would you need and why?
  • People
  • Software
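Linking the three files into one analytical record per customer can be sketched in Python as below. The field names (tenure_months, plan, total_billed, num_calls, total_minutes) and all values are hypothetical examples, not the expected answer to the exercise; the point is the structure, with cust_id as the assumed link key.

```python
from collections import defaultdict

# Hypothetical extracts from the three legacy files, all sharing cust_id as the link key
customer_file = [
    {"cust_id": "A1", "tenure_months": 30, "plan": "basic"},
    {"cust_id": "B2", "tenure_months": 4, "plan": "premium"},
]
transaction_file = [  # billing records, several per customer
    {"cust_id": "A1", "amount": 40.0},
    {"cust_id": "A1", "amount": 42.0},
    {"cust_id": "B2", "amount": 95.0},
]
call_detail_file = [  # one row per call
    {"cust_id": "A1", "minutes": 12},
    {"cust_id": "A1", "minutes": 3},
    {"cust_id": "B2", "minutes": 1},
]

def build_analytical_file(customers, transactions, calls):
    """One row per customer: customer-file fields plus rolled-up billing and call fields."""
    billed = defaultdict(float)
    for t in transactions:
        billed[t["cust_id"]] += t["amount"]
    num_calls = defaultdict(int)
    minutes = defaultdict(int)
    for c in calls:
        num_calls[c["cust_id"]] += 1
        minutes[c["cust_id"]] += c["minutes"]
    return [
        {
            **cust,
            "total_billed": billed[cust["cust_id"]],
            "num_calls": num_calls[cust["cust_id"]],
            "total_minutes": minutes[cust["cust_id"]],
        }
        for cust in customers
    ]

analytical_file = build_analytical_file(customer_file, transaction_file, call_detail_file)
for row in analytical_file:
    print(row)
```

Rolling the one-to-many transaction and call files up before linking keeps exactly one row per customer; a row-level join would instead repeat each customer once per transaction or call. Customers with no calls simply get zeros.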
Examples
• How would the previous example change if the information were available in a data mart or data warehouse?
Examples
• A credit card company has 100,000 customers with tombstone information and detailed transactional information in its database. 50,000 customers have email addresses. 10% of these 50,000 customers have responded to a survey, and 5% of the respondents have indicated that they consider themselves loyal customers. The web activity of these loyal customers indicates that many of them have clicked on travel-related packages.
– Database information contains:
  • Age, gender, income, where they spend, recency of spend, frequency of spend, and amount of spend
– Survey information covers attitudes and preferences regarding their credit card behaviour
– Web information contains the customers' page-view behaviour
• As a marketer and analyst, how would you use this information to sell travel-related insurance?
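Working through the numbers shows how small the survey-based group is relative to the whole database. This Python sketch assumes that the 5% figure applies to survey respondents (not to all email holders).

```python
customers = 100_000
with_email = 50_000
respondents = with_email * 10 // 100          # 10% of email holders answered the survey
self_declared_loyal = respondents * 5 // 100  # assumed: 5% of respondents say they are loyal

for label, count in [
    ("customers", customers),
    ("with email", with_email),
    ("survey respondents", respondents),
    ("self-declared loyal", self_declared_loyal),
]:
    print(f"{label:>20}: {count:>7,} ({count / customers:.2%} of base)")
```

Under that reading, only 250 customers (0.25% of the base) carry the survey's loyalty flag, which is the coverage problem raised for survey data earlier in the lecture.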