data mining overview - bus317.ballenger.wlu.edu
Data Mining Overview
What is Data Mining and its applications?
Discussion Topics
• What is Data Mining?
• Who uses Data Mining?
• Why Data Mining?
• Where Data Mining?
• When Data Mining?
• How Data Mining?
• Why study Data Mining?
Data Mining Definition & Goal
• Definition
– Data Mining is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules.
– Knowledge creation
– Business decisions should be based on learning
– Informed decisions are better than uninformed
• Goal
– To allow an “enterprise”* to IMPROVE its ______ through better understanding of its ______ .
– Potential for Competitive Advantage.
* Synonyms include: corporation, firm, non-profit organization, government agency
Foundations of Data Mining
• Data mining is the process of using “raw” data to infer important “business” relationships.
• Data from the past contains information that will be useful in the future (provided customer/business behavior is not completely random)
• Data Mining is a collection of powerful techniques intended for analyzing large amounts of data.
• There is no single data mining approach, but rather a set of techniques that can be used stand-alone or in combination with each other.
Data Mining – Why now?
1. Data are being produced
2. Data are being warehoused
3. Computing power is more affordable
4. Interest in CRM is strong with a focus on service and information as a product
5. Data Mining software is available
Customer Relationship Management (CRM)
In order to form a learning relationship with its customers, an enterprise (firm) must be able to:
1. Notice – what its customers are doing
2. Remember – what it and its customers have done over time
3. Learn – from what it has remembered
4. Act On – what it has learned to make customers more profitable
Analytical Customer Relationship Management
Transaction Processing Systems: notice customer behavior
Data Warehousing: remember behavior over time
Data Mining: learn from behavior
Customer Relationship Management (CRM): act on learning
Based on Transaction Data
Transaction Processing Systems
Operational systems, but sometimes used for Data Mining:
Phone companies’ call records to find residential numbers being used like businesses
Catalog companies’ order histories to identify customers for future mailings
FedEx's change in shipping patterns during the UPS strike
Supermarket POS data to decide what coupons to print
Web retailers' past purchases to determine what to display on return visits
Data Warehousing
Gather operational data together and organize it in a consistent and useful way over time.
Customer Relationship Management
Understand each customer individually
Use that understanding to make it easier for the customer to do business with you rather than competitors.
Transform from a product-focused organization into a customer-centric one.
Organization must be able to change its behavior as a result of what it learns through DM.
Need to know both how DM tools work and also how they will be used.
Group Exercise
• Time = 15 minutes
• Teams of 4 or fewer
• Discuss Data Mining situations among yourselves and pick one to report to the class
• What to report (verbally – 5 minute max):
– Describe the Data Mining situation
– How does it help the enterprise?
Why Study Data Mining?
• Open discussion to identify these
Discussion Topics
• Data Mining History
• Data Warehouse
• Data Mart
Data Mining History
• The approach has roots in practice dating back over 40 years.
• In the early 1960s, data mining was called statistical analysis, and the pioneers were statistical software companies such as SAS and SPSS.
• By the late 1980s, the traditional techniques had been augmented by new methods such as fuzzy logic, heuristics and neural networks.
Definitions of a Data Warehouse
1. W.H. Inmon: “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process”
2. Ralph Kimball: “A copy of transaction data, specifically structured for query and analysis”
Data Warehouse
• For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing
• A Data Warehouse allows an organization (enterprise) to remember what it has noticed about its data
• Data Mining techniques make use of the data in a Data Warehouse
Data Warehouse
[Diagram: transactions in the Enterprise “Database” (Customers, Vendors, Employees, Orders, etc.) are copied, organized, and summarized into the Data Warehouse, which feeds Data Mining.]
Data Miners:
• “Farmers” – they know
• “Explorers” – unpredictable
Data Warehouse
• A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting – hence, data mining.
• Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system.
• Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse.
Data Mart
• A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse.
• A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
Data Warehouse to Data Mart
[Diagram: one Data Warehouse feeds three Data Marts, each of which supplies Decision Support Information.]
Data Warehouse & Mart
• Set of “Tables” – 2 or more dimensions
• Designed for Aggregation
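As a rough illustration of what “designed for aggregation” can mean, the Python sketch below rolls a tiny warehouse-style table up along chosen dimensions to produce a mart-style summary. The rows, the column names (region, product, month, sales), and the helper roll_up are all hypothetical, invented for this example.

```python
from collections import defaultdict

# Hypothetical warehouse rows: three dimensions (region, product, month)
# and one measure (sales). None of this data comes from the slides.
WAREHOUSE = [
    {"region": "East", "product": "A", "month": "Jan", "sales": 100.0},
    {"region": "East", "product": "B", "month": "Jan", "sales": 50.0},
    {"region": "West", "product": "A", "month": "Jan", "sales": 70.0},
    {"region": "East", "product": "A", "month": "Feb", "sales": 30.0},
]


def roll_up(rows, dimensions, measure="sales"):
    """Sum the measure over every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# A business unit that only cares about regional performance by month
# could be served by a mart that keeps just those two dimensions.
regional_mart = roll_up(WAREHOUSE, ["region", "month"])
for key, total in sorted(regional_mart.items()):
    print(key, total)
```

The same warehouse rows can feed a different mart simply by choosing other dimensions, for example roll_up(WAREHOUSE, ["product"]).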
Group Exercise
• Time = 15 minutes
• Teams of 4 or fewer
• Discuss Data Warehouse to Data Mart situations among yourselves and pick one to report to the class
• What to report (verbally – 5 minute max):
– Describe the Data Warehouse to Data Mart situation
– How does it help the enterprise’s “business” unit?
Data Mining Discussion Topics
•Data Mining Flavors
•Data Mining Examples
•Data Mining Tasks
•Data Mining’s Biggest Challenge
•What does all of this mean?
Data Mining Flavors
• Directed (Supervised): attempts to explain or categorize some particular target field such as income or response.
– Build models (algorithms/rules/formulas) to connect inputs to target or outcome
– For example: regression, neural networks, decision trees, nearest neighbors
– Models produce scores (fitted or predicted values) used to rank customers.
• Undirected (Unsupervised): attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.
– For example: affinity grouping (association rules, market basket analysis), clustering, self-organizing maps.
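The difference between the two flavors can be sketched in a few lines of Python. The directed half uses a one-nearest-neighbor rule, which needs a target field. The undirected half uses a bare-bones k-means clustering, which ignores the target entirely. The customer records (age, income in thousands, and a response field) and both function names are hypothetical, and the starting centroids are hand-picked so the run is repeatable.

```python
import math

# Hypothetical records: (age, income in $K) plus a target field "response".
LABELED = [
    ((25, 30), "no"),
    ((30, 40), "no"),
    ((45, 90), "yes"),
    ((50, 100), "yes"),
]


def nearest_neighbor_predict(labeled, query):
    """Directed: copy the target value of the closest labeled record."""
    _features, target = min(labeled, key=lambda rec: math.dist(rec[0], query))
    return target


def k_means(points, centroids, iterations=10):
    """Undirected: group points around centroids; no target field is used."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


prediction = nearest_neighbor_predict(LABELED, (48, 95))
points = [features for features, _target in LABELED]
centroids, clusters = k_means(points, centroids=[(25, 30), (50, 100)])
print("Predicted response:", prediction)
print("Cluster centers:", centroids)
```

Note that the clustering step finds two groups of customers but says nothing about what the groups mean; interpreting them is left to the analyst.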
Data Mining Examples in Enterprises
• US Government
– FBI – track down criminals
– Treasury Dept – suspicious int’l funds transfer
– SEC - insider trading
• Phone companies
• Supermarkets & Superstores (Vons, Albertsons, Wal-Mart, Costco)
• Mail-Order, On-Line Order (L.L. Bean, Victoria’s Secret, Lands End)
• Financial Institutions (BofA, Wells Fargo, Charles Schwab)
• Insurance Companies (USAA, Allstate, State Farm)
• Tons of others…
Data Mining Techniques
• Classification
– example: Fr, So, Jr, Sr
• Estimation
– example: household income
• Prediction
– example: predict credit card balance transfer average amount
• Affinity Grouping
– example: people who buy X, often buy Y also with probability Z%
• Clustering
– similar to classification but no predefined classes
• Description and Profiling
– behavior begets an explanation
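The affinity grouping example (“people who buy X often buy Y with probability Z%”) can be computed directly from transaction data. The Python sketch below estimates Z as the share of baskets containing X that also contain Y, which is usually called the confidence of the rule X → Y. The baskets and the function name rule_confidence are hypothetical.

```python
def rule_confidence(baskets, x, y):
    """Return the share of baskets containing x that also contain y.

    Returns None when no basket contains x, since the rule is undefined.
    """
    with_x = [basket for basket in baskets if x in basket]
    if not with_x:
        return None
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Hypothetical market baskets, one set of items per transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

z = rule_confidence(BASKETS, "bread", "milk")
print(f"People who buy bread also buy milk {z:.0%} of the time")
```

The rule is not symmetric: rule_confidence(BASKETS, "milk", "bread") divides by the number of milk baskets instead, so it can give a different value.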
Data Mining’s Biggest Challenge
• The largest challenge a data miner may face is the sheer volume of data in the data warehouse.
• Summary data must be available to get the analysis started.
• This sheer volume may mask the important relationships the data miner is interested in.
• Must be able to overcome the volume and be able to interpret the data.
What Does All of This Mean?
• On a regular basis, “farmers” and “explorers” utilize their data warehouses to give guidance to and/or answer a limitless variety of questions.
• Nothing is free, however, and the benefits do come with a cost.
• The value of a data warehouse and subsequent data mining comes from the new and changed business processes it enables, which can also yield competitive advantage.
• There are limitations, though: a Data Warehouse cannot correct problems with its data, although it may help to identify them more clearly.
The Virtuous Cycle of Data Mining
Data are at the heart of most companies’ core business processes
Data are generated by transactions regardless of industry
In addition to this internal data, there are tons of external data sources (credit ratings, demographics, etc.)
Data Mining’s promise is to find patterns in the “gazillions” of bytes
But…
Finding patterns is not enough
Business (individuals) must:
Respond to the pattern(s) by taking action
Turning:
Data into Information
Information into Action
Action into Value
Hence, the Virtuous Cycle of Data Mining
Data Mining…Easy?
Marketing literature makes it look easy!!!
Just apply automated algorithms created by great minds, such as:
Neural networks
Decision trees
Genetic algorithms
“Poof”…Magic happens!!!
Not So…
Data Mining is an iterative, learning process
Data Mining takes conscientious, long-term hard work and commitment
Data Mining’s Reward:
Success can transform a company from being reactive to being proactive
Data Mining’s Virtuous Cycle
1. Identify the business opportunity/problem
2. Mining data to transform it into actionable information
3. Acting on the information
4. Measuring the results
Bank of America Case Study
In-Class Exercise
Review Bank of America Case Study found in the textbook on pages 11-14
Identify the Business Opportunity
Many business processes are good candidates:
New product introduction
Direct marketing campaign
Understanding customer attrition/churn
Evaluating the results of a test market
Measurements from past Data Mining efforts:
What types of customers responded to our last campaign?
Where do the best customers live?
Are long waits in check-out lines a cause of customer attrition?
What products should be promoted with our XYZ product?
TIP When talking with business users about data mining opportunities, make sure you focus on the business problems/opportunities and not on technology and algorithms.
Mining data to transform it into actionable information
Success is making business sense of the data
Numerous data “issues”:
Bad data formats (alpha vs numeric, missing, null, bogus data)
Confusing data fields (synonyms and differences)
Lack of functionality (“I wish I could…”)
Legal ramifications (privacy, etc.)
Organizational factors (unwilling to change “our ways”)
Lack of timeliness
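The “bad data formats” issue is often the first one a data miner meets in practice. The Python sketch below shows one way to normalize a raw amount field that may arrive as text, be missing, hold a null marker, or hold a bogus value. The function name clean_amount, the list of null markers, and the rule that negative amounts count as bogus are all assumptions made for illustration; a real project would take these rules from the business.

```python
import math

# Assumed spellings of "no value" in the raw feed (hypothetical).
NULL_MARKERS = {"", "null", "none", "n/a", "na"}


def clean_amount(raw):
    """Return a non-negative float, or None when the value is unusable."""
    if raw is None:
        return None  # missing
    text = str(raw).strip()
    if text.lower() in NULL_MARKERS:
        return None  # null marker stored as text
    try:
        value = float(text.replace(",", ""))
    except ValueError:
        return None  # alpha where a number was expected
    if not math.isfinite(value) or value < 0:
        return None  # bogus: NaN, infinity, or (by assumption) negative
    return value


raw_values = ["1,250.50", "abc", None, " NULL ", "-5", 42]
cleaned = [clean_amount(v) for v in raw_values]
print(cleaned)
```

Returning None rather than guessing a replacement keeps the cleaning step honest: later analysis can count how many values were unusable instead of silently mining invented numbers.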
Acting on the Information
This is the purpose of Data Mining
the hope of adding value
What type of action?
Interactions with customers, prospects, suppliers
Modifying service procedures
Adjusting inventory levels
Consolidating
Expanding
Etc…
Measuring the Results
Assesses the impact of the action taken
Often overlooked, ignored, skipped
Planning for the measurement should begin when analyzing the business opportunity, not after it is “all over”
Assessment questions (examples):
Did this ____ campaign do what we hoped?
Did some offers work better than others?
Did these customers purchase additional products?
Tons of others…
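A question such as “Did some offers work better than others?” can be answered with a simple comparison of response rates, provided the measurement was planned in advance (for example, by holding out a control group). The Python sketch below computes each offer's response rate and its lift over the control group. The group names and every number are invented for illustration and are not results from any real campaign.

```python
def response_rate(responders, contacted):
    """Share of contacted customers who responded (0.0 if nobody was contacted)."""
    return responders / contacted if contacted else 0.0


# Hypothetical campaign results; "control" received no special offer.
RESULTS = {
    "control": {"contacted": 1000, "responders": 20},
    "offer_a": {"contacted": 1000, "responders": 35},
    "offer_b": {"contacted": 500, "responders": 30},
}

baseline = response_rate(**RESULTS["control"])

# Lift = offer response rate divided by the control response rate.
# This assumes the control group had at least one responder.
lift = {
    name: response_rate(**counts) / baseline
    for name, counts in RESULTS.items()
    if name != "control"
}

for name, value in sorted(lift.items()):
    print(f"{name}: {value:.2f}x the control response rate")
```

A lift above 1 suggests the offer outperformed doing nothing, but with groups this small the difference should be checked for statistical significance before acting on it.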