building business intelligence and data mining applications with microsoft sql server 2005(1).pdf

Upload: alynutzza90

Post on 06-Jul-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    1/65

    Building Business Intelligence and

    Data Mining Applications withMicrosoft SQL Server 2005

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    2/65

    Introductions

    Presenter –

    Javier Loria – Solid Quality Learning – [email protected]

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    3/65

    AgendaOverview & BI Challenges

    Introducing the UDMThe UDM in Detail

    Data Mining Overview

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    4/65

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    5/65

    Business Intelligence Platform

    Data acquisitionData acquisi tionfrom sourcefrom sourcesystems andsystems andintegrationintegrationData transformationData transformationand synthesisand synthesis

    Data enrichment,Data enrichment,with businesswith businesslogic, hierarchicallogic, hierarchicalviewsviewsData discovery viaData discovery viadata miningdata mining

    Data presentationData presentationand distributionand distributionData access forData access forthe massesthe masses

    IntegrateIntegrate Analyze Analyze ReportReport

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    6/65

    OverviewGetting information from enterprise data

    Using BI across the enterprise as anintegral part of doing business

    Capture and model all of your dataIntegration with business processesRelational reporting and OLAP convergedthrough a single dimensional model

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    7/65

    Business Intelligence ChallengesMultiple Data Models

    Multiple Data SourcesMultiple APIs

    Duplication of Data

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    8/65

    What Is a Cube?

    Apples

    CherriesGrapes

    Melons

    Q4Q1 Q2 Q3Time Dimension

    Dallas

    Denver

    Chicago i m

    M a r k e

    t s D

    e n s

    i o n

    Atlanta

    P r o d

    u c t

    D i m

    e n s i o n

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    9/65

    What Is a Cube?

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    10/65

    Enterprise BI TodayEnterprise BI Today

    DW

    Datamart

    Datamart

    Data ModelsData Models ToolsTools

    ReportingReportingTool (3)Tool (3)

    MOLAP

    MOLAP

    ReportingReporting

    Tool (2)Tool (2)

    Data SourcesData Sources

    OLAPOLAPBrowser Browser

    ReportingReportingTool (1)Tool (1)

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    11/65

    Relational vs. OLAP ReportsRelational vs. OLAP Reports

    Feature Relational OLAPFlexible schema

    Real time data access

    Single data store

    Simple management

    Detail reporting

    Ease of navigation andexploration

    High performance

    End-user oriented

    Rich analytics

    Rich semantics

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    12/65

    AgendaOverview & BI ChallengesOverview & BI ChallengesOverview & BI Challenges

    Introducing the UDMThe UDM in Detail

    Data Mining Overview

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    13/65

    Relational Reporting

    Multiple fact tablesFull richness thedimensions’ attributes

    Transaction level accessStar, snowflake, 3NF…Complex relationships

    Recursive self joinsSlowly changingdimensions

    The Unified Dimensional Model

    The Best of Relational and OLAP

    The Unified Dimensional Model

    The Best of Relational and OLAPOLAP Cubes

    Multidimensional navigationHierarchical presentationFriendly entity namesPowerful MDX calculations

    Central KPI frameworkMultiple perspectivesPartitions

    AggregationsDistributed sources

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    14/65

    UDM’s Role Allows the User Model to be Enriched

    Provides High Performance Queries Allows the Capture of Business Rules toSupport AnalysisSupports “Closing the Loop” Where theUser Acts Upon the Data

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    15/65

    Enterprise BI with UDM

    DW

    Datamart

    Datamart

    MOLAP

    MOLAP

    OLAPOLAPBrowser

    BI ApplicationsBI Applications

    Browser

    UDMUDM

    ReportingReportingToolTool

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    16/65

    Scalable, High PerformanceUDM Server

    DW

    Datamart

    Datamart

    MOLAP

    MOLAP

    OLAPOLAPBrowser

    BI ApplicationsBI Applications

    Browser

    UDMUDM

    Analysis AnalysisServicesServices

    X M L / A o r

    O L E D B

    / O L A P

    ReportingReportingToolTool

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    17/65

    Analysis Server as UDM Server Optimized SQL to all major RDBMSplatformsXML/A client API – SOAP-based Web service

    – API supported by all major BI vendorsManaged and native providers

    – ADOMD.NET – OLE DB for OLAP

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    18/65

    Streamlined BI InfrastructureUnified logical model for both relational andOLAP with superb performance andscalabilityOne data store to manage ensure dataconsistency and low TCORich user experience with many Microsoft

    and 3 rd-party tools

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    19/65

    BI Development Studio

    Complete, integrated tool for the

    development of BI applicationsEnterprise software developmentenvironmentIntegrated with Visual StudioTeam development, source control,versioning, developer isolation, resourceindependent coding

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    20/65

    PerformanceProactive caching – Automatic MOLAP cache creation and

    management

    MOLAP becomes transparent – No requirement to manage an OLAP store

    Relational reporting enjoys MOLAP-like

    performance

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    21/65

    MOLAP, ROLAP, and HOLAP

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    22/65

    MOLAP Caching

    X M L / A o r

    O D B O

    X M L / A o r

    O D B O

    DatamartDatamart

    DatamartDatamartBI ApplicationsBI Applications

    MOLAPMOLAP

    MOLAPMOLAP

    ToolToolData SourceData Source

    OLAPOLAPBrowserBrowser

    UDMUDM

    Analysis AnalysisServicesServices

    NotificationsNotifications

    DWDWCacheCache

    ReportingReportingToolTool

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    23/65

    AgendaOverview & BI ChallengesOverview & BI ChallengesOverview & BI Challenges

    Introducing the UDMIntroducing the UDMIntroducing the UDMThe UDM in Detail

    Data Mining Overview

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    24/65

    UDM and The BI StudioUDM and The BI Studio

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    25/65

    UDM Data SourcesMultiple Data Sources

    – OLTP – OLAP

    – XML

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    26/65

    Data Source ViewsTables

    ViewsStored Queries

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    27/65

    Dimensions and HierarchiesDimensions Attribute-Based

    – Consolidates all attributes of an entityHierarchies Organize DataCustom hierarchies can be createdfrom attributes

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    28/65

    CubesNo More Limits

    – Limited only by addressable objects(2147483647)

    Stored as XMLLogical Grouping of Measures and

    Dimensions

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    29/65

    PerspectivesUDM Provides Subject Area Centric

    View of the Data WarehousePerspectives Feature Allows

    User/Group Specific View of the SameData

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    30/65

    CategorizationSemantically Meaningful Categories

    – Measures – Dimensions

    – Attributes – Hierarchies

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    31/65

    TimeUDM Has Built-In Knowledge of Time

    – Natural (Calendar) – Fiscal

    – Reporting – Manufacturing

    – ISO 8601

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    32/65

    TranslationsUDM provides for multiple languagesMetadata in BI Studio and Client ToolDisplayed in Multiple Languages

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    33/65

    Attribute SemanticsNames Vs. Keys

    OrderingDescretization

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    34/65

    Key Performance Indicators Actual Value

    Goal ValueStatus

    TrendGraphical Representation

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    35/65

    Closing the LoopIntegrated Data Mining

    Writeback – The UDM is not read-only

    Actions

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    36/65

    ProClarity Business Intelligence Analytics

    Selectorand

    KPI Designer(All Professional Clients)

    Business Logic Server

    Web Client Bundle(includes

    DashboardViewer)

    Dashboard Server

    Desktop Professional(Includes

    Business Reporter

    for Excel)

    OLAPCube

    OLAP

    Cube

    OLAPCube

    OLAPCube

    OLAPCube

    Web Professional(Includes

    Business Reporterfor Excel)

    Web Standard(zero footprint)

    Analytics Server

    Live Server

    Live Client(Excel based)

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    37/65

    ProClarity Key Differentiators

    Speed in decisions, real insightOne version of the truth Analysis PlatformProClarity + Microsoft; total BI platformSuper end-user friendly environment

    All users own informationSeveral visualizations for quickunderstandingPlatform total customizable

    Low Total Cost of Ownership & Flexible to implement

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    38/65

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    39/65

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    40/65

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    41/65

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    42/65

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    43/65

    AgendaOverview & BI ChallengesOverview & BI ChallengesOverview & BI Challenges

    Introducing the UDMIntroducing the UDMIntroducing the UDMThe UDM in DetailThe UDM in DetailThe UDM in Detail

    Data Mining Overview

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    44/65

    Data Mining Architecture

    Data Transform (SSIS)Data Transform (SSIS)

    CubeCubeCube

    NewDataset

    NewNewDatasetDataset

    ReportingReportingReporting

    Mining ModelsMining Models

    ModelBrowsing

    ModelModelBrowsingBrowsing

    PredictionPrediction

    CubeCubeCube

    LOB Application

    LOBLOB Application Appl ication

    WebWebWeb

    .NET..NETNETNativeNativeNative

    OperationsOperations(SSIS)(SSIS)

    HistoricalDataset

    HistoricalHistoricalDatasetDataset

    SQLSQLSQL

    OLE/DBOLE/DBOLE/DB

    Text FileText FileText File

    CRoss Industry Standard Process

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    45/65

    CRoss Industry Standard Process

    for Data Mining (CRISP)

    http://www.crisp-dm.org

    Microsoft Mining Model Algorithms

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    46/65

    Decision Trees Clustering Time SeriesIntroduced in SQL Server 2000

    Microsoft Mining Model Algorithms

    SequenceClustering

    Association Naïve Bayes

    Neural Net

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    47/65

    Microsoft Mining Models

    When To Use What

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    48/65

    When To Use What Analytical Problem Examples Algorithms

    Classification: Assign cases topredefined classes

    Credit risk analysisChurn analysisCustomer retention

    Decision TreesNaive BayesNeural Nets

    Segmentation: Taxonomy forgrouping similar cases

    Customer profi le analysisMailing campaign

    ClusteringSequence Clustering

    Association: Advanced countingfor correlations

    Market basket analysis Advanced data exploration

    Decision Trees Association

    Time Series Forecasting: Predictthe future

    Forecast salesPredict stock prices

    Time Series

    Prediction: Predict a value for anew case based on values for

    similar cases

    Quote insurance ratesPredict customer income

    All

    Deviation analysis: Discover howa case or segment differs fromothers

    Credit card fraud detectionNetwork infusion analysis

    All

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    49/65

    Thank You

    Javier Loría

    Business Intelligence,Solid Quality Learning [email protected]

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    50/65

    Decision Trees

    Classify each case to one of a few discrete

    broad categories of selected attributesThe process of building is recursivepartitioning – splitting data into partitionsand then splitting it up moreInitially all cases are in one big box

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    51/65

    Decision Trees (cont.)

    The algorithm tries all possible breaks in classesusing all possible values of each input attribute;it then selects the split that partitions data to thepurest classes of the searched variable

    – Several measures of purityThen it repeats splitting for each new class

    – Again testing all possible breaks

    Unuseful branches of the tree can bepre-pruned or post-pruned

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    52/65

    Decision Trees (cont.)Decision trees are used for classification andpredictionTypical questions: – Predict which customers will leave

    – Help in mailing and promotion campaigns – Explain reasons for a decision – What are the movies young female customers likely to

    buy?

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    53/65

    Naïve BayesClassification and Prediction ModelCalculates probabilities for each possiblestate of the input attribute given each stateof the predictable attribute

    Naïve Bayes (cont )

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    54/65

    Naïve Bayes (cont.)Used for classification – Assign new cases to predefined classes

    Some typical questions: – Categorize bank loan applications – Determining which home telephone lines

    are used for Internet access – Assigning customers to predefined

    segments

    – Quickly gathering basic comprehension

    Cluster Analysis

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    55/65

    Cluster AnalysisGrouping data into clusters

    – Objects within a cluster have high similarity

    based on the attribute valuesThe class label of each object is notknownSeveral techniques

    – Partitioning methods – Hierarchical methods – Density based methods

    – Model-based methods, more…

    Cluster Analysis (cont )

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    56/65

    Cluster Analysis (cont.)

    Segments a heterogeneous population

    into a number of more homogenoussubgroups or clusters

    Some typical questions: – Discover distinct groups of customers – Identify groups of houses in a city – In biology, derive animal and plant

    taxonomies

    S Cl i

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    57/65

    Sequence Clustering Analyzes sequence-oriented data thatcontains discrete-valued series – The sequence attribute in the series holds a

    set of events with a specific order that can be

    cosnsidered as a modelTypically used for Web customer analysis – Can be used for any other sequential data

    S Cl t i ( t )

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    58/65

    Sequence Clustering (cont.)Click-Stream Analysis

    User Sequence

    1 frontpage news travel travel2 news news news news news

    3 frontpage news frontpage news frontpage

    4 news news5 frontpage news news travel travel travel

    6 news weather weather weather weather

    7 news health health business business business8 frontpage sports sports sports weather

    9 weather

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    59/65

    Microsoft Mining Models

    Association Rules

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    60/65

    For market basket analyses – Identify cross-selling opportunities – Arrange attractive packages

    Considers each attribute/value pair as anitem

    An item set is a combination of items in asingle transaction

    The algorithm scans through the datasettrying to find item sets that tend to appear

    in many transactions

    Association Rules Support

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    61/65

    Association Rules – Support

    Support is the percentage of rowscontaining the item combination comparedto the total number of rows:Transaction 1: Frozen pizza, cola, milk

    Transaction 2: Milk, potato chips

    Transaction 3: Cola, frozen pizzaTransaction 4: Milk, pretzelsTransaction 5: Cola, pretzels

    The support for the rule “If a customerpurchases Cola, then they will purchaseFrozen Pizza” is 40%

    Association Rules – Confidence

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    62/65

    Association Rules – Confidence

    What if 60% of customers buy milk andonly 20% of those buy potato chips?The confidence of an association rule isthe support for the combination divided by

    the support for the conditionThis gives a confidence for a rule “If a

    customer purchases Milk, they willpurchase Potato Chips” of (20% / 60%) =33%

    Time Series

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    63/65

    Time SeriesPredict continuous columns, such asproduct sales or stock performance in aforecasting scenarioBuilds a model in two stages – First stage creates a list of optimal candidate

    input columns – Second stage investigates each candidate

    input column and determines if it improves themodel

    Neural Network

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    64/65

    Data modeling tool that is able to capture andrepresent complex input/output relationshipsNeural networks resemble the human brain inthe following two ways:

    – A neural network acquires knowledge through

    learning – A neural network's knowledge is stored within inter-

    neuron connection strengths known as synapticweights

    It explores all possible data relationships – It can be slow

    Back Propagation

  • 8/17/2019 Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005(1).pdf

    65/65

    Back-PropagationTraining a neural network is setting the bestweights on the inputs of each of the unitsThe back-propagation process: – Get a training example and calculate outputs

    – Calculate the error – the difference betweenthe calculated and the expected (known) result

    – Adjust the weights to minimize the error