Data Mining
David Klein & Adam Cogan
Admin Stuff
• Attendance
– You initial the sheet
• Hands On Lab
– You get me to initial the sheet
• Certificate
– Awarded at the end of the 10 sessions
– Only if I say you have completed them successfully
About
• David Klein is a Senior Software Architect at SSW, specialising in .NET, SQL Server and BI solutions
– Current clients: Sally Knox Medical & Pisces
• Adam Cogan is Chief Architect at SSW and one of 2 Microsoft Regional Directors in Australia, specialising in Office, SQL and .NET solutions
Course Overview
The 5 Sessions (Part B)
1. SSIS and Creating a Data Warehouse
2. Creating a Cube and Cube Issues
3. Reporting Services
4. Other Cube Browsers
5. Data Mining
http://www.ssw.com.au/ssw/events/2006SQL/
Session 5: Tonight’s Agenda
1. Why Data Mining?
2. Uses
3. Algorithms
4. Implementation
– SQL Server Management Studio (SSMS)
– Reporting Services
– SQL Server Integration Services (SSIS)
– DMX (Data Mining Extensions)
5. Demo?
6. Hands On Lab
Why Data Mining?
• Marketing
– Who picks the movie? The kids, the wife, me?
– Who are our customers and what sort of films do they hire?
– Is a 30-year-old woman with 2 children going to hire Arnie’s latest film?
• Validation
– Is this data sensible? Terminator 2 and Toy Story?
• Prediction
– Sales next year
Complete Set Of Algorithms
• Decision Trees
• Clustering
• Time Series
• Sequence Clustering
• Association
• Naïve Bayes
• Neural Nets
Naïve Bayes
• Quickly builds mining models that can be used for classification and prediction
• It calculates the probability of each possible state of each input attribute, given each state of the predictable attribute
– These probabilities can later be used to predict the outcome of the predictable attribute from the known input attributes
• Because the model is quick to build, it is a good option for exploring the data
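The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the SQL Server implementation: the rental cases, the attribute names (`age_band`, `has_kids`, `genre`) and the Laplace smoothing constant `alpha` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training cases: two input attributes plus the predictable attribute "genre".
CASES = [
    {"age_band": "30-40", "has_kids": "yes", "genre": "Animation"},
    {"age_band": "30-40", "has_kids": "yes", "genre": "Animation"},
    {"age_band": "30-40", "has_kids": "no", "genre": "Action"},
    {"age_band": "18-29", "has_kids": "no", "genre": "Action"},
    {"age_band": "18-29", "has_kids": "no", "genre": "Action"},
    {"age_band": "18-29", "has_kids": "yes", "genre": "Animation"},
]
INPUTS = ["age_band", "has_kids"]
TARGET = "genre"


def train(cases, inputs, target):
    """Count each predictable state, and each input state given each predictable state."""
    target_counts = Counter(case[target] for case in cases)
    cond_counts = Counter()    # (attribute, input state, predictable state) -> count
    states = defaultdict(set)  # distinct states seen for each input attribute
    for case in cases:
        for attr in inputs:
            cond_counts[(attr, case[attr], case[target])] += 1
            states[attr].add(case[attr])
    return target_counts, cond_counts, states


def predict(model, case, inputs, alpha=1.0):
    """Return P(predictable state | inputs) for every predictable state, normalised to sum to 1.

    alpha is a hypothetical Laplace smoothing constant, so an unseen state does not zero out a class.
    """
    target_counts, cond_counts, states = model
    total = sum(target_counts.values())
    scores = {}
    for t_state, t_count in target_counts.items():
        score = t_count / total  # prior P(predictable state)
        for attr in inputs:
            n = cond_counts[(attr, case[attr], t_state)]
            score *= (n + alpha) / (t_count + alpha * len(states[attr]))
        scores[t_state] = score
    norm = sum(scores.values())
    return {t: s / norm for t, s in scores.items()}


model = train(CASES, INPUTS, TARGET)
probs = predict(model, {"age_band": "30-40", "has_kids": "yes"}, INPUTS)
print(max(probs, key=probs.get), probs)
```

The "naïve" part is the loop that simply multiplies the per-attribute probabilities, which assumes the input attributes are independent of each other given the predictable state.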
Naïve Bayes – Toy Story 2
Decision Trees (1)
• Decision trees assign (classify) each case to one of a few discrete, broad categories of the selected attribute (variable), and explain the classification using a few selected input variables
• The tree is built by recursive partitioning: the data is split into partitions, then each partition is split further
• Initially, all cases are in one big box (the root node)
Decision Trees (2)
• The algorithm tries all possible breaks using every possible value of each input attribute, then selects the split that partitions the data into the purest classes of the predictable attribute
– Several measures of purity can be used (e.g. entropy)
• It then repeats the splitting for each new partition
– Again testing all possible breaks
• Branches that add little value can be pre-pruned (stopped while the tree grows) or post-pruned (removed after the tree is built)
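The split search and recursion described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Microsoft Decision Trees implementation: it uses entropy (information gain) as the purity measure, tests only binary `attribute == value` breaks, and uses hypothetical `max_depth` and `min_gain` limits as a simple form of pre-pruning. The churn cases are made up.

```python
import math
from collections import Counter

# Hypothetical churn cases: did the customer leave ("churned")?
CASES = [
    {"plan": "basic", "rentals_per_month": "low", "churned": "yes"},
    {"plan": "basic", "rentals_per_month": "low", "churned": "yes"},
    {"plan": "basic", "rentals_per_month": "high", "churned": "no"},
    {"plan": "premium", "rentals_per_month": "low", "churned": "no"},
    {"plan": "premium", "rentals_per_month": "high", "churned": "no"},
    {"plan": "premium", "rentals_per_month": "high", "churned": "no"},
]
INPUTS = ["plan", "rentals_per_month"]
TARGET = "churned"


def entropy(labels):
    """Shannon entropy of a list of class labels (0 means the partition is pure)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(cases, inputs, target):
    """Try every (attribute, value) break and return the one with the highest information gain."""
    parent = entropy([c[target] for c in cases])
    best = None  # (gain, attribute, value)
    for attr in inputs:
        for value in sorted({c[attr] for c in cases}):
            left = [c[target] for c in cases if c[attr] == value]
            right = [c[target] for c in cases if c[attr] != value]
            if not left or not right:
                continue  # this break does not actually split the data
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(cases)
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, attr, value)
    return best


def build_tree(cases, inputs, target, depth=0, max_depth=3, min_gain=0.01):
    """Recursive partitioning; max_depth and min_gain act as simple pre-pruning rules."""
    majority = Counter(c[target] for c in cases).most_common(1)[0][0]
    split = best_split(cases, inputs, target)
    if depth >= max_depth or split is None or split[0] < min_gain:
        return majority  # leaf: predict the most common class in this partition
    _, attr, value = split
    yes = [c for c in cases if c[attr] == value]
    no = [c for c in cases if c[attr] != value]
    return {
        "test": (attr, value),
        "yes": build_tree(yes, inputs, target, depth + 1, max_depth, min_gain),
        "no": build_tree(no, inputs, target, depth + 1, max_depth, min_gain),
    }


def classify(tree, case):
    """Walk from the root down to a leaf and return that leaf's class."""
    while isinstance(tree, dict):
        attr, value = tree["test"]
        tree = tree["yes"] if case[attr] == value else tree["no"]
    return tree


tree = build_tree(CASES, INPUTS, TARGET)
print(tree)
print(classify(tree, {"plan": "basic", "rentals_per_month": "low"}))
```

Post-pruning would instead grow the full tree first and then cut back branches, typically judged against held-out data; that step is not shown here.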
Decision Trees (3)
• Decision trees are used for classification and prediction
• Typical questions:
– Predict which customers will leave
– Help in mailing and promotion campaigns
– Explain the reasons for a decision
– Which movies do young female customers like to buy?
Decision Trees – Who Decides
Cluster Analysis (1)
• Grouping data into clusters
– Objects within a cluster have high similarity based on their attribute values
• The class label of each object is not known
• Several techniques
– Partitioning methods
– Hierarchical methods
– Density-based methods
– Model-based methods
– And more…
Cluster Analysis (2)
• Segments a heterogeneous population into a number of more homogeneous subgroups, or clusters
• Some typical questions:
– Discover distinct groups of customers
– Identify groups of houses in a city
– In biology, derive animal and plant taxonomies
– Find outliers
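Partitioning is the easiest of these techniques to show in code. The sketch below is plain k-means in Python, offered as an illustration of one partitioning method rather than as the algorithm SQL Server uses; the customer points (age, rentals per month) and the choice of starting centroids are hypothetical.

```python
import math


def kmeans(points, centroids, max_iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its members, until no centroid moves."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new == centroids:
            break  # converged: no centroid moved
        centroids = new
    return centroids, clusters


# Hypothetical customers as (age, rentals per month).
CUSTOMERS = [(22, 9), (24, 10), (25, 8), (41, 2), (44, 3), (39, 2)]

# Hypothetical choice: start from the first customer of each visible group.
centroids, clusters = kmeans(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Real tools usually scale the attributes first, since here age and rental counts are compared on their raw scales, and they try several starting points because k-means can settle on different clusters depending on where it starts.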
Conclusion: When To Use What
| Analytical problem | Examples | Algorithms |
|---|---|---|
| Classification: assign cases to predefined classes | Credit risk analysis, churn analysis, customer retention | Decision Trees, Naïve Bayes, Neural Nets |
| Segmentation: taxonomy for grouping similar cases | Customer profile analysis, mailing campaign | Clustering, Sequence Clustering |
| Association: advanced counting for correlations | Market basket analysis, advanced data exploration | Decision Trees, Association |
| Time series forecasting: predict the future | Forecast sales, predict stock prices | Time Series |
| Prediction: predict a value for a new case based on values for similar cases | Quote insurance rates, predict customer income | All |
| Deviation analysis: discover how a case or segment differs from others | Credit card fraud detection, network intrusion analysis | All |
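The "advanced counting" behind association analysis comes down to support and confidence. The Python sketch below counts only item pairs, a simplification of Apriori-style rule mining rather than the Microsoft Association algorithm; the rental baskets and the `min_support` and `min_confidence` thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rental "baskets": the set of films hired in one visit.
BASKETS = [
    {"Toy Story", "Toy Story 2"},
    {"Toy Story", "Toy Story 2", "Shrek"},
    {"Terminator 2", "Die Hard"},
    {"Toy Story 2", "Shrek"},
    {"Terminator 2", "Die Hard", "Toy Story"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Count single items and item pairs, then return rules (lhs, rhs, support, confidence)."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, support, confidence in pair_rules(BASKETS):
    print(f"{lhs} -> {rhs}: support {support:.2f}, confidence {confidence:.2f}")
```

With these thresholds, Terminator 2 and Toy Story appear together only once, so no rule links them.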
Summary
• Why Data Mining?
• Uses
• Algorithms
• Implementation
– SQL Server Management Studio (SSMS)
– Reporting Services
– SQL Server Integration Services (SSIS)
– DMX (Data Mining Extensions)
• Demo?
• Hands On Lab
Book
Data Mining with SQL Server 2005
ZhaoHui Tang and Jamie MacLennan
Wiley Press
BI is Cool