building business intelligence and data mining applications with microsoft sql server 2005(1).pdf
TRANSCRIPT
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
1/65
Building Business Intelligence and
Data Mining Applications withMicrosoft SQL Server 2005
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
2/65
Introductions
Presenter –
Javier Loria – Solid Quality Learning – [email protected]
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
3/65
AgendaOverview & BI Challenges
Introducing the UDMThe UDM in Detail
Data Mining Overview
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
4/65
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
5/65
Business Intelligence Platform
Data acquisitionData acquisi tionfrom sourcefrom sourcesystems andsystems andintegrationintegrationData transformationData transformationand synthesisand synthesis
Data enrichment,Data enrichment,with businesswith businesslogic, hierarchicallogic, hierarchicalviewsviewsData discovery viaData discovery viadata miningdata mining
Data presentationData presentationand distributionand distributionData access forData access forthe massesthe masses
IntegrateIntegrate Analyze Analyze ReportReport
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
6/65
OverviewGetting information from enterprise data
Using BI across the enterprise as anintegral part of doing business
Capture and model all of your dataIntegration with business processesRelational reporting and OLAP convergedthrough a single dimensional model
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
7/65
Business Intelligence ChallengesMultiple Data Models
Multiple Data SourcesMultiple APIs
Duplication of Data
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
8/65
What Is a Cube?
Apples
CherriesGrapes
Melons
Q4Q1 Q2 Q3Time Dimension
Dallas
Denver
Chicago i m
M a r k e
t s D
e n s
i o n
Atlanta
P r o d
u c t
D i m
e n s i o n
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
9/65
What Is a Cube?
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
10/65
Enterprise BI TodayEnterprise BI Today
DW
Datamart
Datamart
Data ModelsData Models ToolsTools
ReportingReportingTool (3)Tool (3)
MOLAP
MOLAP
ReportingReporting
Tool (2)Tool (2)
Data SourcesData Sources
OLAPOLAPBrowser Browser
ReportingReportingTool (1)Tool (1)
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
11/65
Relational vs. OLAP ReportsRelational vs. OLAP Reports
Feature Relational OLAPFlexible schema
Real time data access
Single data store
Simple management
Detail reporting
Ease of navigation andexploration
High performance
End-user oriented
Rich analytics
Rich semantics
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
12/65
AgendaOverview & BI ChallengesOverview & BI ChallengesOverview & BI Challenges
Introducing the UDMThe UDM in Detail
Data Mining Overview
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
13/65
Relational Reporting
Multiple fact tablesFull richness thedimensions’ attributes
Transaction level accessStar, snowflake, 3NF…Complex relationships
Recursive self joinsSlowly changingdimensions
The Unified Dimensional Model
The Best of Relational and OLAP
The Unified Dimensional Model
The Best of Relational and OLAPOLAP Cubes
Multidimensional navigationHierarchical presentationFriendly entity namesPowerful MDX calculations
Central KPI frameworkMultiple perspectivesPartitions
AggregationsDistributed sources
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
14/65
UDM’s Role Allows the User Model to be Enriched
Provides High Performance Queries Allows the Capture of Business Rules toSupport AnalysisSupports “Closing the Loop” Where theUser Acts Upon the Data
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
15/65
Enterprise BI with UDM
DW
Datamart
Datamart
MOLAP
MOLAP
OLAPOLAPBrowser
BI ApplicationsBI Applications
Browser
UDMUDM
ReportingReportingToolTool
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
16/65
Scalable, High PerformanceUDM Server
DW
Datamart
Datamart
MOLAP
MOLAP
OLAPOLAPBrowser
BI ApplicationsBI Applications
Browser
UDMUDM
Analysis AnalysisServicesServices
X M L / A o r
O L E D B
/ O L A P
ReportingReportingToolTool
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
17/65
Analysis Server as UDM Server Optimized SQL to all major RDBMSplatformsXML/A client API – SOAP-based Web service
– API supported by all major BI vendorsManaged and native providers
– ADOMD.NET – OLE DB for OLAP
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
18/65
Streamlined BI InfrastructureUnified logical model for both relational andOLAP with superb performance andscalabilityOne data store to manage ensure dataconsistency and low TCORich user experience with many Microsoft
and 3 rd-party tools
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
19/65
BI Development Studio
Complete, integrated tool for the
development of BI applicationsEnterprise software developmentenvironmentIntegrated with Visual StudioTeam development, source control,versioning, developer isolation, resourceindependent coding
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
20/65
PerformanceProactive caching – Automatic MOLAP cache creation and
management
MOLAP becomes transparent – No requirement to manage an OLAP store
Relational reporting enjoys MOLAP-like
performance
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
21/65
MOLAP, ROLAP, and HOLAP
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
22/65
MOLAP Caching
X M L / A o r
O D B O
X M L / A o r
O D B O
DatamartDatamart
DatamartDatamartBI ApplicationsBI Applications
MOLAPMOLAP
MOLAPMOLAP
ToolToolData SourceData Source
OLAPOLAPBrowserBrowser
UDMUDM
Analysis AnalysisServicesServices
NotificationsNotifications
DWDWCacheCache
ReportingReportingToolTool
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
23/65
AgendaOverview & BI ChallengesOverview & BI ChallengesOverview & BI Challenges
Introducing the UDMIntroducing the UDMIntroducing the UDMThe UDM in Detail
Data Mining Overview
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
24/65
UDM and The BI StudioUDM and The BI Studio
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
25/65
UDM Data SourcesMultiple Data Sources
– OLTP – OLAP
– XML
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
26/65
Data Source ViewsTables
ViewsStored Queries
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
27/65
Dimensions and HierarchiesDimensions Attribute-Based
– Consolidates all attributes of an entityHierarchies Organize DataCustom hierarchies can be createdfrom attributes
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
28/65
CubesNo More Limits
– Limited only by addressable objects(2147483647)
Stored as XMLLogical Grouping of Measures and
Dimensions
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
29/65
PerspectivesUDM Provides Subject Area Centric
View of the Data WarehousePerspectives Feature Allows
User/Group Specific View of the SameData
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
30/65
CategorizationSemantically Meaningful Categories
– Measures – Dimensions
– Attributes – Hierarchies
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
31/65
TimeUDM Has Built-In Knowledge of Time
– Natural (Calendar) – Fiscal
– Reporting – Manufacturing
– ISO 8601
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
32/65
TranslationsUDM provides for multiple languagesMetadata in BI Studio and Client ToolDisplayed in Multiple Languages
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
33/65
Attribute SemanticsNames Vs. Keys
OrderingDescretization
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
34/65
Key Performance Indicators Actual Value
Goal ValueStatus
TrendGraphical Representation
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
35/65
Closing the LoopIntegrated Data Mining
Writeback – The UDM is not read-only
Actions
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
36/65
ProClarity Business Intelligence Analytics
Selectorand
KPI Designer(All Professional Clients)
Business Logic Server
Web Client Bundle(includes
DashboardViewer)
Dashboard Server
Desktop Professional(Includes
Business Reporter
for Excel)
OLAPCube
OLAP
Cube
OLAPCube
OLAPCube
OLAPCube
Web Professional(Includes
Business Reporterfor Excel)
Web Standard(zero footprint)
Analytics Server
Live Server
Live Client(Excel based)
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
37/65
ProClarity Key Differentiators
Speed in decisions, real insightOne version of the truth Analysis PlatformProClarity + Microsoft; total BI platformSuper end-user friendly environment
All users own informationSeveral visualizations for quickunderstandingPlatform total customizable
Low Total Cost of Ownership & Flexible to implement
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
38/65
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
39/65
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
40/65
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
41/65
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
42/65
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
43/65
AgendaOverview & BI ChallengesOverview & BI ChallengesOverview & BI Challenges
Introducing the UDMIntroducing the UDMIntroducing the UDMThe UDM in DetailThe UDM in DetailThe UDM in Detail
Data Mining Overview
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
44/65
Data Mining Architecture
Data Transform (SSIS)Data Transform (SSIS)
CubeCubeCube
NewDataset
NewNewDatasetDataset
ReportingReportingReporting
Mining ModelsMining Models
ModelBrowsing
ModelModelBrowsingBrowsing
PredictionPrediction
CubeCubeCube
LOB Application
LOBLOB Application Appl ication
WebWebWeb
.NET..NETNETNativeNativeNative
OperationsOperations(SSIS)(SSIS)
HistoricalDataset
HistoricalHistoricalDatasetDataset
SQLSQLSQL
OLE/DBOLE/DBOLE/DB
Text FileText FileText File
CRoss Industry Standard Process
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
45/65
CRoss Industry Standard Process
for Data Mining (CRISP)
http://www.crisp-dm.org
Microsoft Mining Model Algorithms
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
46/65
Decision Trees Clustering Time SeriesIntroduced in SQL Server 2000
Microsoft Mining Model Algorithms
SequenceClustering
Association Naïve Bayes
Neural Net
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
47/65
Microsoft Mining Models
When To Use What
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
48/65
When To Use What Analytical Problem Examples Algorithms
Classification: Assign cases topredefined classes
Credit risk analysisChurn analysisCustomer retention
Decision TreesNaive BayesNeural Nets
Segmentation: Taxonomy forgrouping similar cases
Customer profi le analysisMailing campaign
ClusteringSequence Clustering
Association: Advanced countingfor correlations
Market basket analysis Advanced data exploration
Decision Trees Association
Time Series Forecasting: Predictthe future
Forecast salesPredict stock prices
Time Series
Prediction: Predict a value for anew case based on values for
similar cases
Quote insurance ratesPredict customer income
All
Deviation analysis: Discover howa case or segment differs fromothers
Credit card fraud detectionNetwork infusion analysis
All
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
49/65
Thank You
Javier Loría
Business Intelligence,Solid Quality Learning [email protected]
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
50/65
Decision Trees
Classify each case to one of a few discrete
broad categories of selected attributesThe process of building is recursivepartitioning – splitting data into partitionsand then splitting it up moreInitially all cases are in one big box
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
51/65
Decision Trees (cont.)
The algorithm tries all possible breaks in classesusing all possible values of each input attribute;it then selects the split that partitions data to thepurest classes of the searched variable
– Several measures of purityThen it repeats splitting for each new class
– Again testing all possible breaks
Unuseful branches of the tree can bepre-pruned or post-pruned
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
52/65
Decision Trees (cont.)Decision trees are used for classification andpredictionTypical questions: – Predict which customers will leave
– Help in mailing and promotion campaigns – Explain reasons for a decision – What are the movies young female customers likely to
buy?
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
53/65
Naïve BayesClassification and Prediction ModelCalculates probabilities for each possiblestate of the input attribute given each stateof the predictable attribute
Naïve Bayes (cont )
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
54/65
Naïve Bayes (cont.)Used for classification – Assign new cases to predefined classes
Some typical questions: – Categorize bank loan applications – Determining which home telephone lines
are used for Internet access – Assigning customers to predefined
segments
– Quickly gathering basic comprehension
Cluster Analysis
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
55/65
Cluster AnalysisGrouping data into clusters
– Objects within a cluster have high similarity
based on the attribute valuesThe class label of each object is notknownSeveral techniques
– Partitioning methods – Hierarchical methods – Density based methods
– Model-based methods, more…
Cluster Analysis (cont )
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
56/65
Cluster Analysis (cont.)
Segments a heterogeneous population
into a number of more homogenoussubgroups or clusters
Some typical questions: – Discover distinct groups of customers – Identify groups of houses in a city – In biology, derive animal and plant
taxonomies
S Cl i
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
57/65
Sequence Clustering Analyzes sequence-oriented data thatcontains discrete-valued series – The sequence attribute in the series holds a
set of events with a specific order that can be
cosnsidered as a modelTypically used for Web customer analysis – Can be used for any other sequential data
S Cl t i ( t )
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
58/65
Sequence Clustering (cont.)Click-Stream Analysis
User Sequence
1 frontpage news travel travel2 news news news news news
3 frontpage news frontpage news frontpage
4 news news5 frontpage news news travel travel travel
6 news weather weather weather weather
7 news health health business business business8 frontpage sports sports sports weather
9 weather
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
59/65
Microsoft Mining Models
Association Rules
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
60/65
For market basket analyses – Identify cross-selling opportunities – Arrange attractive packages
Considers each attribute/value pair as anitem
An item set is a combination of items in asingle transaction
The algorithm scans through the datasettrying to find item sets that tend to appear
in many transactions
Association Rules Support
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
61/65
Association Rules – Support
Support is the percentage of rowscontaining the item combination comparedto the total number of rows:Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizzaTransaction 4: Milk, pretzelsTransaction 5: Cola, pretzels
The support for the rule “If a customerpurchases Cola, then they will purchaseFrozen Pizza” is 40%
Association Rules – Confidence
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
62/65
Association Rules – Confidence
What if 60% of customers buy milk andonly 20% of those buy potato chips?The confidence of an association rule isthe support for the combination divided by
the support for the conditionThis gives a confidence for a rule “If a
customer purchases Milk, they willpurchase Potato Chips” of (20% / 60%) =33%
Time Series
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
63/65
Time SeriesPredict continuous columns, such asproduct sales or stock performance in aforecasting scenarioBuilds a model in two stages – First stage creates a list of optimal candidate
input columns – Second stage investigates each candidate
input column and determines if it improves themodel
Neural Network
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
64/65
Data modeling tool that is able to capture andrepresent complex input/output relationshipsNeural networks resemble the human brain inthe following two ways:
– A neural network acquires knowledge through
learning – A neural network's knowledge is stored within inter-
neuron connection strengths known as synapticweights
It explores all possible data relationships – It can be slow
Back Propagation
-
8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf
65/65
Back-PropagationTraining a neural network is setting the bestweights on the inputs of each of the unitsThe back-propagation process: – Get a training example and calculate outputs
– Calculate the error – the difference betweenthe calculated and the expected (known) result
– Adjust the weights to minimize the error