
Post on 22-Dec-2015


Knowledge Discovery Centre: CityU-SAS Partnership

Speakers:

Prof Y V Hui, CityU
Dr H P Lo, CityU
Dr Sammy Yuen, CityU
Dr K W Cheng, SAS Institute
Mr Steven Parker, Standard Chartered

The Art and Science of Data Mining

Y V Hui, City University of Hong Kong

The Driving Forces

• Specialization and focus in business
  - To satisfy the needs of customers
  - To improve and develop specific business strategies and processes
  - Personalization through mass customization

The Driving Forces

• Challenges
  - local and global competition
  - distributed business operations
  - product innovation

• Technology development

• Benefit, cost and risk on a product or customer basis

Data Mining

• Also known as knowledge discovery in databases. Data mining digs out valuable information from large and messy data. (Computer scientist’s definition)

• Data mining is a knowledge discovery process. It’s the integration of business knowledge, people, information, statistics and computing technology.

Data Mining is Hot

• Ten Hottest Jobs, Time, 22 May 2000

• 10 emerging areas of technology, MIT’s Technology Review, Jan/Feb 2001

Data Mining Philosophy

• A powerful enabler of competitive advantage.

• Data mining is driven by business knowledge.

• Data mining is about enabling people to discover actionable information about their business.

• Profitability isn’t about algorithms.

Scope of Data Mining

Management’s Decision World
  - Business outlook
  - Industry conditions
  - Product offering
  - Customer analysis
  - Strategic options
  - Competitive actions, etc.

Interface
  - Problem development and management
  - Reporting and evaluations

Data Miner’s Analytical World
  - Project design
  - Data collection and preparation
  - Model building
  - Validation

Project Management

• Cross-functional team
• System architecture

Successful applications

• Business transaction
  - risks and opportunities

• Customer relationship management
  - personalization, target marketing

• Electronic commerce & web
  - web mining

Successful applications

• Science & engineering
• Health care
• Multi-media
• Others

Data Mining Process

[Diagram: Understanding of business / Problem identification]

Understanding Your Business

• Do we have a problem?
  - What is the current situation? Are there any undesirable situations that need attention?
  - Are there any conditions, processes, etc., that could be improved?
  - Are any problems foreseeable that could affect the business?
  - Are there any potential opportunities that the company may capitalize on?

A problem is a learning opportunity.

Understanding Your Problem

• Operational or analytical
• Conventional rules or knowledge discovery
• Product-based or customer-based
• Market research or data mining
• Ownership of the information
• Privacy
• Added value

Data Mining Process

[Diagram: Understanding of business / Problem identification; Collecting relevant information]

Collecting Relevant Information

• Data Search
• Data Collection
• Data Preparation
• Data Mining Database

Data Search

• Exploring the problem space. Don’t let the data drive the problem.

• Measurement
• Exploring the data sources

Data Collection

• Data retrieval
• Data audit
• Data set assembly and data warehouse
• Survey

Data Preparation

• Data representation
• Data exploration
• Data normalization
• Data transformation
• Imputation of missing data
• Data tuning
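Two of the preparation steps listed above, imputation of missing data and data normalization, can be sketched in a few lines of Python. This is a minimal illustration that assumes mean imputation and z-score scaling; the slides do not say which techniques are meant, and the function names and spending figures are hypothetical.

```python
import math
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def z_score(column: List[float]) -> List[float]:
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n  # constant column: nothing to scale
    return [(v - mean) / std for v in column]


# Hypothetical monthly spending figures with two missing entries.
spending = [120.0, None, 80.0, 200.0, None, 100.0]
filled = impute_mean(spending)
scaled = z_score(filled)
print(filled)
print([round(v, 3) for v in scaled])
```

Mean imputation is the simplest choice; it keeps the column mean unchanged but shrinks its variance, which is one reason model-based imputation is often preferred.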

Data Mining Database

• Variable selection
• Record selection
• Data set partition
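Data set partition is commonly done by shuffling the records and splitting them into training, validation and test sets. The sketch below assumes a 60/20/20 split and a fixed random seed; both are illustrative choices rather than values from the slides, and `partition` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(records: Sequence[T], train: float = 0.6, validation: float = 0.2,
              seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records, then split them into training, validation and test sets."""
    if train <= 0 or validation < 0 or train + validation >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


record_ids = list(range(1, 101))  # hypothetical record identifiers
train_set, valid_set, test_set = partition(record_ids)
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```

The model is fitted on the training set, tuned on the validation set, and judged once on the test set, so the final estimate of performance is not biased by tuning.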

Data Mining Process

[Diagram: Understanding of business / Problem identification; Collecting relevant information; Model building; Learning]

Model Building

• Model based vs non-model based

$(y_1, y_2, \ldots, y_p) = f(x_1, \ldots, x_q)$

Inputs: $x_1, \ldots, x_q$; Outputs: $y_1, \ldots, y_p$

Model Building

• Parametric vs nonparametric

Model Building

• Estimation vs trial and error
• Directed vs undirected
• Multidimensional analysis
• Large data set vs small data set
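To make the directed vs undirected distinction concrete: a directed method learns to predict a chosen target, while an undirected method looks for structure with no target at all. The sketch below shows one common undirected technique, k-means clustering (Lloyd's algorithm), assuming two numeric customer features and a naive initialization from the first k points; the data are made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid updates."""
    centroids = list(points[:k])  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


# Hypothetical customers: (visits per month, average spend in hundreds).
customers = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

No labels are supplied; the two groups emerge from the data alone. In practice the starting centroids matter, so random restarts or k-means++ seeding are usual.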

Data Mining Algorithms

• Online Analytical Processing
  - SQL query tools

• Discovery Driven Methods
  - Prediction: classification, regressions, decision trees, neural networks
  - Description: visualization, clustering, association, sequential analysis

Online Analytical Processing

• Query and reporting

Example of SQL query: How many credit-card customers who made purchases of over $1,000 on sporting goods in December have at least $20,000 of available credit?

• Manual and validation driven
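The SQL query example can be run against a small in-memory SQLite database using Python's standard `sqlite3` module. The table layout, column names and rows are hypothetical, and the sketch assumes that "purchases of over $1,000" means total December spending on sporting goods above $1,000 per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, available_credit REAL);
CREATE TABLE purchases (customer_id INTEGER, category TEXT, amount REAL, purchase_date TEXT);
INSERT INTO customers VALUES (1, 25000), (2, 15000), (3, 30000);
INSERT INTO purchases VALUES
  (1, 'sporting goods', 700, '2000-12-03'),
  (1, 'sporting goods', 450, '2000-12-20'),
  (2, 'sporting goods', 1500, '2000-12-11'),
  (3, 'sporting goods', 1200, '2000-11-28'),
  (3, 'groceries', 1100, '2000-12-05');
""")

query = """
SELECT COUNT(*) FROM customers
WHERE available_credit >= 20000
  AND id IN (
    SELECT customer_id FROM purchases
    WHERE category = 'sporting goods'
      AND strftime('%m', purchase_date) = '12'  -- December purchases only
    GROUP BY customer_id
    HAVING SUM(amount) > 1000                   -- total December spend over $1,000
  )
"""
(count,) = conn.execute(query).fetchone()
print(count)  # 1
```

Only customer 1 qualifies: customer 2 spent enough but has too little credit, and customer 3's sporting-goods purchase fell in November. The analyst has to pose the question explicitly, which is what makes this approach manual and validation driven.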

Estimation and Prediction

• Statistical models
• Neural networks

Example: Housing price valuation model
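A housing price valuation model is a typical estimation task for a statistical model. As a minimal sketch, the example fits a simple linear regression of price on floor area by ordinary least squares. The single predictor and the numbers (constructed to follow price = 500 + 4 * area exactly) are assumptions for illustration; a real valuation model would use many more attributes.

```python
from typing import List, Tuple


def fit_simple_regression(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx          # slope: price change per extra unit of area
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical flats: floor area (sq ft) and price (in thousands).
area = [400.0, 600.0, 800.0, 1000.0]
price = [2100.0, 2900.0, 3700.0, 4500.0]
a, b = fit_simple_regression(area, price)
print(a, b)
print(a + b * 700)  # estimated price for a 700 sq ft flat
```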

Classification Algorithms

• Statistical techniques
• Neural networks
• Genetic algorithms
• Nearest neighbor method
• Rule induction and decision trees

Example: Customer segmentation and buying behavior description
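Of the classification techniques listed, the nearest neighbor method is the easiest to sketch. The example below labels a customer by majority vote among the k most similar known customers. The features, segment names and k = 3 are hypothetical; in real use the features would normally be normalized first so that no single unit dominates the distance.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], str]


def knn_predict(training: Sequence[Example], query: Tuple[float, ...], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (age, annual spend in thousands) -> segment label.
customers: List[Example] = [
    ((25, 5.0), "budget"), ((30, 6.5), "budget"), ((28, 4.0), "budget"),
    ((45, 40.0), "premium"), ((50, 55.0), "premium"), ((38, 45.0), "premium"),
]
print(knn_predict(customers, (27, 5.5)))   # budget
print(knn_predict(customers, (47, 50.0)))  # premium
```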

Association Rules

• Apriori algorithm

Example: Market basket analysis, cross-selling analysis
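Market basket analysis with the Apriori algorithm finds itemsets bought together often enough to matter, then turns them into rules. The sketch below implements the level-wise Apriori search (join, prune, count) on a handful of made-up baskets with an assumed minimum support of 3 transactions, and computes the confidence of one rule.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least `min_support` transactions."""
    # Level 1: count individual items.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join: merge pairs of frequent (k-1)-itemsets that differ by one item.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
itemsets = apriori(baskets, min_support=3)
# Confidence of the rule {diapers} -> {beer}: support(both) / support(diapers).
confidence = itemsets[frozenset({"diapers", "beer"})] / itemsets[frozenset({"diapers"})]
print(confidence)  # 0.75
```

The prune step is what makes Apriori efficient: a candidate such as {bread, diapers, beer} is discarded without a scan because its subset {bread, beer} is already known to be infrequent.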

Sequential Analysis

• Count-all algorithm
• Count-some algorithm

Example: Attached mailing, add-on sales
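Sequential analysis looks for items bought in a particular order across a customer's transactions, which is what attached mailings and add-on sales exploit. Count-all algorithms count every frequent sequence, including non-maximal ones, while count-some algorithms avoid counting some non-maximal sequences. The sketch below is a simplified count-all search, not a faithful implementation of any specific published algorithm: it assumes each pattern element is a single item and extends frequent sequences one frequent item at a time. The purchase histories are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[Set[str]], pattern: Pattern) -> bool:
    """True if the pattern's items occur, in order, in strictly later transactions."""
    pos = 0
    for transaction in sequence:
        # Greedily match the next pattern item in the earliest possible transaction.
        if pos < len(pattern) and pattern[pos] in transaction:
            pos += 1
    return pos == len(pattern)


def count_all(customers: List[List[Set[str]]], min_support: int) -> Dict[Pattern, int]:
    """Level-wise search that counts every candidate at each length (count-all style)."""

    def support(pattern: Pattern) -> int:
        return sum(contains(seq, pattern) for seq in customers)

    items = sorted({item for seq in customers for t in seq for item in t})
    frequent = {(i,): support((i,)) for i in items}
    frequent = {p: n for p, n in frequent.items() if n >= min_support}
    frequent_items = [p[0] for p in frequent]
    result = dict(frequent)
    while frequent:
        # Extend each frequent k-sequence by one frequent item, then count all candidates.
        candidates = [p + (i,) for p in frequent for i in frequent_items]
        counted = {c: support(c) for c in candidates}
        frequent = {c: n for c, n in counted.items() if n >= min_support}
        result.update(frequent)
    return result


# Hypothetical purchase histories: one list of time-ordered transactions per customer.
histories = [
    [{"camera"}, {"memory card"}, {"tripod"}],
    [{"camera", "bag"}, {"memory card"}],
    [{"camera"}, {"tripod"}, {"memory card"}],
    [{"phone"}, {"memory card"}],
]
patterns = count_all(histories, min_support=2)
for pattern, n in sorted(patterns.items()):
    print(" -> ".join(pattern), n)
```

A pattern such as camera followed by memory card, supported by three of the four customers, is the kind of result that would trigger an attached mailing for memory cards after a camera purchase.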

Algorithms Comparison

• No single data mining algorithm outperforms all others. Try different algorithms and draw conclusions from the results. Use your business knowledge.

• Neural networks do no better than statistical models when the underlying structure is known. However, neural networks can detect hidden interactions and nonlinearity. Use the prior information if available.

Algorithms Comparison

• Data mining algorithms cannot handle dependent records. Use the prior information. Statistical models help.

• Data tuning and dimension reduction enhance data mining before and after the analysis. Statistical techniques help.

Data Mining Process

[Diagram: Understanding of business / Problem identification; Collecting relevant data; Model building; Business strategy and evaluation; Action; Learning]

Trends that Affect Data Mining

• Data trends
  - data explosion
  - data types

Trends that Affect Data Mining

• Hardware trends
  - memory
  - processing speed
  - storage

Trends that Affect Data Mining

• Network trends
  - network connectivity
  - distributed databases

• Wireless communication

Trends that Affect Data Mining

• Scientific computing trends
  - theory, experiment and simulation

Trends that Affect Data Mining

• Business trends
  - total quality management
  - customer relationship management
  - business process reengineering
  - enterprise resource planning
  - supply chain management
  - business intelligence and knowledge management
  - e-business and m-business

Trends that Affect Data Mining

• Privacy and security

Pot of Gold

• The benefits of knowing one’s business and customers have become so critical that technologies are coming together to support data mining.

• Data mining is not cybernetic magic that will turn your data into gold. It’s the process and result of knowledge production, knowledge discovery and knowledge management.