Data Mining By Archana Ketkar

Post on 21-Dec-2015


Page 1: Data Mining By Archana Ketkar. What Is Data Mining? Data mining is the principle of sorting through large amounts of data and picking out relevant information

Data Mining

By Archana Ketkar

What Is Data Mining?

Data mining is the principle of sorting through large amounts of data and picking out relevant information.

In other words…

Data mining (knowledge discovery from data): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.

Other names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Some Definitions

Data: Data are any facts, numbers, or text that can be processed by a computer. This includes:

operational or transactional data, such as sales, cost, inventory, payroll, and accounting

nonoperational data, such as industry sales, forecast data, and macroeconomic data

metadata (data about the data itself), such as logical database design or data dictionary definitions

Information: The patterns, associations, or relationships among all this data can provide information.

Definitions (continued)

Knowledge: Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in terms of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses: Data warehousing is defined as a process of centralized data management and retrieval.

Data Warehouse example

Data Rich, Information Poor

Data Mining process

Knowledge discovery from data

KDD process includes

data cleaning (to remove noise and inconsistent data)

data integration (where multiple data sources may be combined)

data selection (where data relevant to the analysis task are retrieved from the database)

data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations)

KDD (continued)

data mining (an essential process where intelligent methods are applied in order to extract data patterns)

pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)

knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)

Data mining is the core of the knowledge discovery process.
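The preprocessing steps listed above (cleaning, integration, selection, transformation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sales sources, the record fields, and the rule that identical records are duplicates are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Two hypothetical source systems with overlapping, slightly messy records.
store_sales = [
    {"customer": "c1", "item": "computer", "amount": 1200.0},
    {"customer": "c2", "item": "software", "amount": None},  # missing value
    {"customer": "c1", "item": "software", "amount": 80.0},
]
web_sales = [
    {"customer": "c3", "item": "computer", "amount": 950.0},
    {"customer": "c1", "item": "software", "amount": 80.0},  # same as a store record
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine sources, keeping one copy of identical records."""
    seen, merged = set(), []
    for source in sources:
        for r in source:
            key = (r["customer"], r["item"], r["amount"])
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged


def select(records, item):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["item"] == item]


def transform(records):
    """Data transformation: aggregate the amount spent per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


warehouse = integrate(clean(store_sales), clean(web_sales))
task_data = select(warehouse, "computer")
summary = transform(task_data)
print(summary)
```

The output of `transform` is the kind of consolidated, task-relevant data that a mining step would then work on.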

Knowledge Discovery (KDD) Process

Data mining: core of the knowledge discovery process

[Figure: KDD process diagram. Labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithm, Visualization, and Other Disciplines]

Functionalities/Techniques:

Concept/Class Description: Characterization and Discrimination

Mining Frequent Patterns, Associations and Correlations

Classification and Prediction

Cluster Analysis

Outlier Analysis

Evolution Analysis

Concept/Class Description: Characterization and Discrimination

Data Characterization: A data mining system should be able to produce a description summarizing the characteristics of customers.

Example: The characteristics of customers who spend more than $1000 a year at a store called AllElectronics. The result can be a general profile of these customers, such as age, employment status, or credit rating.
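A minimal sketch of such a characterization, assuming a small hypothetical customer table (the field names and values below are invented for illustration and do not come from AllElectronics):

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records; all fields are illustrative assumptions.
customers = [
    {"age": 34, "employment": "employed", "credit": "good", "yearly_spend": 1500},
    {"age": 45, "employment": "employed", "credit": "excellent", "yearly_spend": 2300},
    {"age": 23, "employment": "student", "credit": "fair", "yearly_spend": 400},
    {"age": 51, "employment": "self-employed", "credit": "good", "yearly_spend": 1100},
]


def characterize(records, min_spend=1000):
    """Summarize the customers who spend more than min_spend per year."""
    target = [r for r in records if r["yearly_spend"] > min_spend]
    return {
        "count": len(target),
        "mean_age": mean(r["age"] for r in target),
        "employment": Counter(r["employment"] for r in target),
        "credit": Counter(r["credit"] for r in target),
    }


profile = characterize(customers)
print(profile)
```

Note that `mean` raises an error when no customer passes the threshold, so a real system would need to handle an empty target class.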

Characterization and Discrimination (continued)

Data Discrimination: A comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes. The user can specify the target and contrasting classes.

Example: The user may want to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by about 30% over the same period.
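As a rough sketch, discrimination can be seen as computing the same summary for a target class and a contrasting class and placing the two side by side. The product records, the compared feature (price), and the exact thresholds below are hypothetical choices made only for illustration:

```python
from statistics import mean

# Hypothetical software products; sales_change is the fractional change in sales.
products = [
    {"name": "A", "price": 40, "sales_change": 0.12},
    {"name": "B", "price": 55, "sales_change": 0.10},
    {"name": "C", "price": 120, "sales_change": -0.31},
    {"name": "D", "price": 150, "sales_change": -0.35},
    {"name": "E", "price": 80, "sales_change": 0.02},
]


def discriminate(records, is_target, is_contrast, feature):
    """Compare the mean of one feature between a target and a contrasting class."""
    target = [r[feature] for r in records if is_target(r)]
    contrast = [r[feature] for r in records if is_contrast(r)]
    return {"target_mean": mean(target), "contrast_mean": mean(contrast)}


result = discriminate(
    products,
    is_target=lambda r: r["sales_change"] >= 0.10,     # sales grew by at least 10%
    is_contrast=lambda r: r["sales_change"] <= -0.30,  # sales fell by at least 30%
    feature="price",
)
print(result)
```

Product E belongs to neither class and is ignored, which matches the idea that the user specifies both classes explicitly.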

Mining Frequent Patterns, Associations and Correlations

Frequent Patterns: as the name suggests, patterns that occur frequently in data.

Association Analysis: from a marketing perspective, determining which items are frequently purchased together within the same transaction.

Example: A rule mined from the AllElectronics transactional database:

buys(X, “Computers”) => buys(X, “software”) [support = 1%, confidence = 50%]

X represents a customer.

Confidence = 50% means that if a customer buys a computer, there is a 50% chance that he/she will buy software as well.

Support = 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together.
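Support and confidence can be computed directly from a list of transactions. The sketch below uses a tiny made-up transaction list, so its numbers differ from the 1% and 50% quoted on the slide; the function names are illustrative.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"software", "printer"},
]


def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(rule_support, rule_confidence)
```

Here one of four transactions contains both items, and one of the two computer purchases also includes software. `confidence` divides by the antecedent's support, so it fails if the antecedent never occurs in the data.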

Mining Frequent Patterns, Associations and Correlations (continued)

Another example:

age(X, “20…29”) ^ income(X, “20K-29K”) => buys(X, “CD Player”) [support = 2%, confidence = 60%]

For customers between 20 and 29 years of age with an income of $20,000-$29,000, there is a 60% chance that they will purchase a CD player, and 2% of all the transactions under analysis showed that customers in this age group and income range bought a CD player.

Classification and Prediction

Classification is the process of finding a model that describes and distinguishes data classes or concepts for the purpose of being able to use the model to predict the class of objects whose class label is unknown.

A classification model can be represented in various forms, such as:

IF-THEN rules

A decision tree

A neural network
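To make the IF-THEN form concrete, here is a small sketch of a 1R-style rule learner: it picks the single attribute whose values best predict the class and emits one rule per value. This technique is a stand-in chosen for brevity, not one named on the slides, and the training data (whether a customer buys a computer) is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical labeled examples: (features, class label "buys computer?").
training = [
    ({"age": "youth", "income": "high"}, "no"),
    ({"age": "youth", "income": "low"}, "no"),
    ({"age": "middle", "income": "high"}, "yes"),
    ({"age": "senior", "income": "low"}, "yes"),
    ({"age": "senior", "income": "high"}, "yes"),
    ({"age": "middle", "income": "low"}, "yes"),
]


def learn_rules(examples, attribute):
    """For one attribute, map each of its values to the majority class."""
    by_value = defaultdict(Counter)
    for features, label in examples:
        by_value[features[attribute]][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}


def count_errors(examples, attribute, rules):
    """Number of training examples the rules for this attribute misclassify."""
    return sum(1 for features, label in examples if rules[features[attribute]] != label)


def one_r(examples):
    """Choose the attribute whose rules make the fewest training errors."""
    attributes = list(examples[0][0].keys())
    best = min(
        attributes,
        key=lambda a: count_errors(examples, a, learn_rules(examples, a)),
    )
    return best, learn_rules(examples, best)


attribute, rules = one_r(training)


def predict(features, default="no"):
    """Apply the learned rules: IF attribute == value THEN class."""
    return rules.get(features[attribute], default)


for value, label in rules.items():
    print(f"IF {attribute} = {value} THEN buys_computer = {label}")
```

The learned model can then label an object whose class is unknown, for example `predict({"age": "senior", "income": "low"})`.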

Classification Model

Cluster Analysis

Clustering analyses data objects without consulting a known class label.

Example: Cluster analysis can be performed on AllElectronics customer data in order to identify homogeneous subpopulations of customers. These clusters may represent individual target groups for marketing. The figure on the next slide shows a 2-D plot of customers with respect to customer locations in a city.
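One common way to cluster 2-D locations is k-means. The slides do not name a clustering algorithm, so the sketch below is an assumption, as are the coordinates and the simplification of using the first k points as initial centroids (real implementations usually initialize randomly).

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical customer locations as (x, y) coordinates in a city.
locations = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (2.0, 1.0), (9.5, 8.0)]


def k_means(points, k, iterations=10):
    """Group points into k clusters without using any class labels."""
    centroids = list(points[:k])  # simplification: first k points as seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


centroids, clusters = k_means(locations, k=2)
print(clusters)
```

With this data the two groups are well separated, so each cluster ends up with the three customers from one side of the city.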

Cluster Analysis (figure: 2-D plot of customer locations in a city)

Outlier Analysis

Outlier Analysis: A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.

Example: Detecting fraudulent usage of credit cards. Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase or the purchase frequency.
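A very simple version of this check compares each charge with the typical charge on the same account. The sketch below flags any charge more than ten times the account's median; the threshold, the account names, and the amounts are arbitrary assumptions for illustration, not a method given in the slides.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (account, amount) charges.
charges = [
    ("acct-1", 25.0), ("acct-1", 40.0), ("acct-1", 32.0), ("acct-1", 2900.0),
    ("acct-2", 300.0), ("acct-2", 280.0), ("acct-2", 350.0),
]


def find_outliers(charges, factor=10.0):
    """Flag charges that exceed factor times the median charge of their account."""
    by_account = defaultdict(list)
    for account, amount in charges:
        by_account[account].append(amount)

    flagged = []
    for account, amounts in by_account.items():
        typical = median(amounts)
        flagged.extend((account, a) for a in amounts if a > factor * typical)
    return flagged


suspicious = find_outliers(charges)
print(suspicious)
```

The median is used rather than the mean so that one very large charge does not shift the baseline it is compared against. Charges on acct-2 are larger in absolute terms than most on acct-1 but are normal for that account, so they are not flagged.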

Evolution Analysis

Evolution Analysis: Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.

Example: Time-series data. Suppose stock market data (time series) for the last several years is available from the New York Stock Exchange, and one would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to one’s decision making regarding stock investments.
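One basic way to expose a trend in a time series is to smooth it with a moving average and compare the start and end of the smoothed series. The sketch below uses a made-up price series and a window of three; both are assumptions, and real trend analysis would use far richer models.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def trend(values):
    """Crude trend label: compare the last smoothed value with the first."""
    if values[-1] > values[0]:
        return "up"
    if values[-1] < values[0]:
        return "down"
    return "flat"


# Hypothetical closing prices for one stock over seven periods.
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 15.0]
smoothed = moving_average(prices, window=3)
direction = trend(smoothed)
print(smoothed, direction)
```

The dip in the fourth period is damped by the smoothing, so the overall upward movement is easier to see.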

References:

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm

Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2006.

http://www.eco.utexas.edu/~norman/BUS.FOR/course.mat/Alex/#1

http://en.wikipedia.org/wiki/Data_mining

http://www-faculty.cs.uiuc.edu/~hanj/bk2/

Thank you!