Data Mining By Fu-Chun (Tracy) Juang

Upload: warren-nichols

Post on 31-Dec-2015

TRANSCRIPT

Page 1: Data Mining By Fu-Chun (Tracy) Juang. What is Data Mining? ► The process of analyzing LARGE databases to find useful patterns. ► Attempts to discover

Data Mining

By Fu-Chun (Tracy) Juang

Page 2

What is Data Mining?

► The process of analyzing LARGE databases to find useful patterns.

► Attempts to discover rules and patterns from data.

► Similar to knowledge discovery (in artificial intelligence) or statistical analysis.

► => Knowledge discovery in databases.

Page 3

Types of Knowledge Discovered

► Classification

► Association Rules

► Clustering

► Others
-- Sequential patterns
-- Patterns within time series

Page 4

Classification

► Deals with prediction.

► Works from an existing set of events to create a hierarchy of classes.

► Uses this classification hierarchy to predict which "class" a new item belongs to.

Page 5

Classification (cont.)

► Example:

A credit-card company classifies its population into four ranges of creditworthiness (bad, average, good, and excellent) based on the payment history of its existing customers.

The company then finds rules relating creditworthiness to other information about the customers, such as their educational history, age, and salary.

It uses these classification rules to determine (predict) the creditworthiness of a new applicant.

Page 6

Classification: Rules

► Some of the rules look like:

∀ person P, P.degree = masters and P.income > 75,000 => P.credit = excellent

∀ person P, P.degree = bachelors or (P.income ≥ 25,000 and P.income ≤ 75,000) => P.credit = good
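
The two rules above can be read as a small rule-based classifier. The following is a minimal sketch, not the author's implementation: the function name `credit_from_rules`, the string labels, and the choice to return `None` when no rule fires are all assumptions made for illustration.

```python
from typing import Optional


def credit_from_rules(degree: str, income: float) -> Optional[str]:
    """Apply the two slide rules to one person; return None if neither fires."""
    # Rule 1: masters degree and income above 75,000 => excellent
    if degree == "masters" and income > 75_000:
        return "excellent"
    # Rule 2: bachelors degree, or income between 25,000 and 75,000 => good
    if degree == "bachelors" or 25_000 <= income <= 75_000:
        return "good"
    # The slide gives no rule for the remaining cases (assumption: report "unknown" as None).
    return None


applicant_class = credit_from_rules("masters", 80_000)
```

As written, the two rules never match the same person, so the order of the checks does not change the result.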

Page 7

Classification: Decision-Tree

► A popular technique for classification.

► Each leaf node of the tree represents a class (e.g. good credit or bad credit).

► Each internal node has a function associated with it that determines which child to visit for the new item (e.g. marital status or salary range).

► To place a new item in a class, we traverse the decision tree from the root until we reach a leaf node.
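
The traversal described above can be sketched in a few lines. This is an illustrative sketch only: the `Leaf` and `Node` classes, the restriction to two children per node, and the example tree with its salary thresholds are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # the class this leaf represents


@dataclass
class Node:
    test: Callable[[dict], bool]  # function attached to the internal node
    if_true: "Tree"               # child to visit when the test passes
    if_false: "Tree"              # child to visit when the test fails


Tree = Union[Leaf, Node]


def classify(tree: Tree, item: dict) -> str:
    """Walk from the root to a leaf, following each node's test."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(item) else tree.if_false
    return tree.label


# Hypothetical tree: first split on marital status, then on salary.
credit_tree = Node(
    lambda p: p["married"],
    Node(lambda p: p["salary"] >= 50_000, Leaf("good credit"), Leaf("bad credit")),
    Node(lambda p: p["salary"] >= 75_000, Leaf("good credit"), Leaf("bad credit")),
)

result = classify(credit_tree, {"married": True, "salary": 60_000})
```

Real decision trees may have more than two children per node (one per salary range, for example); the binary form here just keeps the sketch short.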

Page 8

Decision-Tree

Page 9

Classification: Regression

► A special application of classification rules.

► Regression deals with the prediction of a value, rather than a class.

► e.g. Given a series of test results for a patient, use a regression rule to predict that patient's probability of survival.
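
The slides do not name a particular regression method. As one possible illustration of predicting a value rather than a class, the sketch below fits a straight line by ordinary least squares on a single input variable; the method choice, the function names, and the sample numbers are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a numeric value for a new input."""
    return slope * x + intercept


# Made-up measurements that happen to lie on the line y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = predict(slope, intercept, 5)
```

Predicting a probability, as in the patient example, would normally call for a model whose output stays between 0 and 1; the straight-line fit above only shows the general idea of value prediction.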

Page 10

Association Rules

► Retail shops are often interested in associations between the different items that people buy.

► X => Y: if a customer buys X, he or she is likely to buy Y.

► e.g. A female retail shopper who buys a handbag is likely to buy shoes.

association rule: Handbag => Shoes

► e.g. A person who bought the book Database System Concepts is likely to buy Operating System Concepts.

association rule: DBS Concepts => OS Concepts

Page 11

Association Rules: Support & Confidence

► Every association rule has an associated degree of support and confidence.

► Data miners use the support and confidence of an association rule to determine whether that rule is significant.

Page 12

Association Rule: Support

► Support is a measure of what fraction of the population satisfies both the LHS and the RHS of the rule.

► That is, how frequently the specific itemset (LHS + RHS) occurs in the database.

► If only 0.001% of all purchases in a store include both milk and screwdrivers, then the support of the rule milk => screwdriver is low.

► If 50% of purchases include both milk and juice, the support of the rule milk => juice is high.
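
Support as defined above is a simple fraction, which the following sketch computes over a list of purchases. The function name `support` and the four sample transactions are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset` (LHS + RHS)."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    matching = sum(1 for t in transactions if itemset <= set(t))
    return matching / len(transactions)


# Hypothetical purchase data: four transactions.
purchases = [
    {"milk", "juice"},
    {"milk", "juice", "bread"},
    {"milk", "screwdriver"},
    {"bread"},
]

milk_juice = support(purchases, {"milk", "juice"})              # 2 of 4 purchases
milk_screwdriver = support(purchases, {"milk", "screwdriver"})  # 1 of 4 purchases
```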

Page 13

Association Rule: Confidence

► Confidence is a measure of how often the RHS (consequent) is true when the LHS (antecedent) is true.

► e.g. the rule bread => milk has a confidence of 80% if 80% of the purchases that include bread also include milk.

► A rule with low confidence is not meaningful.
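
Confidence can be computed by counting only the purchases that contain the antecedent. The sketch below reproduces the bread => milk example with made-up data; the function name `confidence` and the transactions are assumptions for illustration.

```python
def confidence(transactions, lhs, rhs):
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    lhs_count = sum(1 for t in transactions if lhs <= set(t))
    if lhs_count == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both_count = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    return both_count / lhs_count


# Hypothetical data: five purchases include bread, and four of those include milk.
purchases = [{"bread", "milk"}] * 4 + [{"bread"}] + [{"milk"}]

bread_milk = confidence(purchases, {"bread"}, {"milk"})
```

Note that the purchase containing only milk does not affect the result, because confidence looks only at transactions where the LHS holds.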

Page 14

Clustering

► Clustering groups similar points together in a single set.

► In business: groups of customers who have similar buying patterns.

► In medicine: groups of patients who show similar reactions to prescribed drugs.
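
The slides do not say which clustering algorithm is meant. As one common possibility, the sketch below uses k-means (Lloyd's algorithm) to group hypothetical customers by two made-up features; the algorithm choice, the starting centroids, and the data are all assumptions.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Group points around the nearest centroid, then move each centroid to its group's mean."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (store visits per month, average spend per visit).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 82)]
centers, groups = k_means(customers, [(1, 10), (9, 80)])
```

With this data the customers split into an occasional, low-spend group and a frequent, high-spend group, which is the kind of "similar buying patterns" grouping the slide describes.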

Page 15

References

► A. Silberschatz, H.F. Korth, S. Sudarshan: Database System Concepts, 5th ed., McGraw-Hill, 2006

► R. Elmasri, S.B. Navathe: Fundamentals of Database Systems, 4th ed., Addison Wesley, 2003