forecasting with the microsoft time series data mining algorithm

52

Upload: hoangdat

Post on 17-Jan-2017

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Forecasting with the Microsoft Time Series Data Mining Algorithm
Page 2: Forecasting with the Microsoft Time Series Data Mining Algorithm

Forecasting with the Microsoft Time Series Data Mining AlgorithmPeter MyersBitwise Solutions Pty Ltd

DBI-B326

Page 3: Forecasting with the Microsoft Time Series Data Mining Algorithm

Presenter IntroductionPeter MyersBI Expert, Bitwise Solutions Pty LtdBBus, SQL Server MCSE, MCT, SQL Server MVP (since 2007)Experienced in designing, developing and maintaining Microsoft database and application solutions, since 1997Focuses on education and mentoringBased in Melbourne, [email protected]://www.linkedin.com/in/peterjsmyers

Page 4: Forecasting with the Microsoft Time Series Data Mining Algorithm

Session OutlineDescribing the Data Mining ProcessIntroducing the Microsoft Time Series AlgorithmIncluding demonstrations to produce forecasts by using:

Table Analysis Tools Excel Add-InClient Tools Excel Add-InSQL Server Data Tools (SSDT) ProjectScript (DMX).NET Programming

Tuning a Time Series Mining ModelDescribing Business Scenarios

Page 5: Forecasting with the Microsoft Time Series Data Mining Algorithm

The History of Forecasting

Page 6: Forecasting with the Microsoft Time Series Data Mining Algorithm

The History of Forecasting

Page 7: Forecasting with the Microsoft Time Series Data Mining Algorithm
Page 8: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series ForecastingThe development of data mining models to predict future values based on previously observed values

Page 9: Forecasting with the Microsoft Time Series Data Mining Algorithm
Page 10: Forecasting with the Microsoft Time Series Data Mining Algorithm

Describing the Data Mining Process

Page 11: Forecasting with the Microsoft Time Series Data Mining Algorithm

Describing the Data Mining ProcessDesign ► Process ► Query

Mining Model

Page 12: Forecasting with the Microsoft Time Series Data Mining Algorithm

Describing the Data Mining ProcessDesign ► Process ► Query

Mining Model

Training Data

Data Mining Engine

Page 13: Forecasting with the Microsoft Time Series Data Mining Algorithm

Describing the Data Mining ProcessDesign ► Process ► Query

Mining Model

Data Mining Engine

Data to Predict

Predicted Data

Page 14: Forecasting with the Microsoft Time Series Data Mining Algorithm

Introducing the Microsoft Time Series Algorithm

Page 15: Forecasting with the Microsoft Time Series Data Mining Algorithm

SQL Server 2012 Analysis Services

Data Modeling (BISM)Delivers intuitive, high-performance access to dataEncapsulates calculations

Data MiningIncludes a suite of nine algorithmsDiscovers patterns in dataEnables knowledge discovery and predictive analytics

Page 16: Forecasting with the Microsoft Time Series Data Mining Algorithm

Microsoft Time Series AlgorithmFirst delivered in Analysis Services 2005Provides the blending of two machine learning algorithms for forecasting continuous values (stock prices, sales, temperatures etc.)ARTxpARIMA (since Analysis Services 2008)

The algorithm detects trends and can use them to predict future steps in a time series

Page 17: Forecasting with the Microsoft Time Series Data Mining Algorithm

Demo

Simple Forecasting by Using the Table Analysis Tools Excel Add-In

Page 18: Forecasting with the Microsoft Time Series Data Mining Algorithm

Microsoft Time Series AlgorithmARTxp Algorithm

AutoRegressive Tree (ART) with cross-series prediction (xp)Cross-series prediction predicts the outcome of one series based on the behaviour of anotherFor example, ice cream sales and temperatureOnly enabled in Business Intelligence and Enterprise editions

Developed by, and unique to, MicrosoftBased on the Microsoft Decision Trees algorithmConsidered the most accurate algorithm for predicting the next likely value(s) in a series

Page 19: Forecasting with the Microsoft Time Series Data Mining Algorithm

Demo

Exploring Different ARTxp Scenarios by Using the Data Mining Client Excel Add-In

Page 20: Forecasting with the Microsoft Time Series Data Mining Algorithm

Microsoft Time Series AlgorithmARIMA Algorithm

AutoRegressive Integrated Moving AveragesIntroduced in Analysis Services 2008An industry-standard forecasting algorithmConsidered a more accurate and stable algorithm for long-term predictionDoes not support cross-series prediction

Page 21: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model DefinitionEach Time Series mining model must contain:One key time column

Specifies the time slicesCan be any data type – typically numeric

Predictable column(s)Must contain continuous numeric values (no missing data points)Define the model columns as INPUT, or PREDICT to allow cross-series predictionIn Standard Edition, all columns must be defined as PREDICT_ONLY

An optional series key column can be included for interleaved formatEnsure training data is clean, representative and ordered chronologicallyTime series data must not have missing data points or values

Page 22: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series DataColumnar Format

The key time and data series are columnsEach row represents a distinct time sliceAdvantages:Easy to produce and interpretAllows specific column types

Disadvantage:Must add a new column to introduce a new series

Key time column

Predictable columns

Page 23: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series DataInterleaved Format

Consists of unique time and series combinationsAdvantages:Series names come directly from dataCan easily add new series

Disadvantages:More difficult to produce and interpretUses same column types

Key time column Data value

Series

Page 24: Forecasting with the Microsoft Time Series Data Mining Algorithm

Characteristics of Time Series ModelsTime Series mining models differ from other types of mining models:Can be trained by using small data setsTraining data must be loaded in chronologic orderModel life time can be ephemeral (short-lived)

Predicted values become obsolete once that time period has occurredNote: Mining models can be created as session models

Models may require reprocessing as new time period data becomes availableModels must be configured with DRILLTHROUGH enabled (certain metadata is sourced by drillthrough)Accuracy testing cannot involve hold out data

Page 25: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model QueryingForecasting is achieved with the PredictTimeSeries function

There are many overloaded formsThe function returns a nested table (hierarchical rowset), which can be flattened to return a table structureEach table row represents a predicted step

PredictTimeSeries(<table column reference>, n)

Page 26: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model QueryingExamples

Using columnar format:

Using interleaved format:

SELECT FLATTENED PredictTimeSeries([MSFT], 3) AS ForecastFROM [StockForecast]

SELECT FLATTENED PredictTimeSeries([Price], 3) AS ForecastFROM [StockForecastInterleaved]WHERE [Series] = 'MSFT'

Page 27: Forecasting with the Microsoft Time Series Data Mining Algorithm

Demo

Querying a Mining Model by Using Excel

Page 28: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model QueryingAdditional Scenarios

The PredictTimeSeries function can start from a point before the last mining model case

A prediction join can be used to pass in additional data, to replace or extend mining model casesIntroduced in Analysis Services 2008Available only in Business Intelligence and Enterprise editions

SELECT FLATTENED PredictTimeSeries([MSFT], -6, 0) AS ForecastFROM [StockForecast]

Page 29: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model QueryingAdditional Scenarios (Continued)

REPLACE_MODEL_CASESSpecifies that the data points in the mining model should be replaced with new dataHowever, predictions are still based on the patterns in the existing mining model

EXTEND_MODEL_CASESSpecifies that the new data should be added to the original training data setFuture predictions are made on the composite data set only after the new data has been usedCan be used to supplement series data to execute what-if scenarios

Page 30: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model QueryingAdditional Scenarios: Replacing Cases

What would future sales be if

applied to a different

data series?

Page 31: Forecasting with the Microsoft Time Series Data Mining Algorithm

Time Series Model QueryingAdditional Scenarios: Extending Cases

If we sold less in the next months, how would that impact on future

sales?

Page 32: Forecasting with the Microsoft Time Series Data Mining Algorithm

Demo

Querying a Mining Model by Using SQL Server Management Studio

Page 33: Forecasting with the Microsoft Time Series Data Mining Algorithm

Tuning a Time Series Mining Model

Page 34: Forecasting with the Microsoft Time Series Data Mining Algorithm

Determining Prediction AccuracyAdditional data mining functions can be used to determine the quality of the prediction:PredictVariancePredictStdev (square root of variance)

SELECT FLATTENED(SELECT *, PredictStDev() AS Stdev, PredictVariance() AS Var FROM PredictTimeSeries([MSFT], 3)) AS Forecast

FROM [StockForecast]

Page 35: Forecasting with the Microsoft Time Series Data Mining Algorithm

Determining Prediction Accuracy(Continued)

The larger the deviations, the less reliable the mining model is for forecastingThere is a 68% chance that the predicted value lies within 1 standard deviation of the predictionAnd, a 95% chance it lies within 2 standard deviations

High deviations may be addressed by tuning the mining model by configuring the algorithm parameters

Page 36: Forecasting with the Microsoft Time Series Data Mining Algorithm

Algorithm ParametersMISSING_VALUE_SUBSTITUTIONOnce a series commences there should be no missing valuesPREVIOUS, MEAN or a constant value can be used to compute any missing values

PERIODICITY_HINTWhere known cycles exist in a series, declare them to the model as a set: {12, 3}Periodicity is expressed as the number of time slices per cycleThe Time Series algorithm is very sensitive to this parameter

Page 37: Forecasting with the Microsoft Time Series Data Mining Algorithm

Algorithm Parameters(Continued)

FORECAST_METHODDetermines which algorithm is used:

MIXED (default), ARTXP or ARIMASetting to ARTXP will simulate Analysis Services 2005 behaviorEnabled only in Business Intelligence and Enterprise editions

PREDICTION_SMOOTHINGControls the balance of the blend between ARTxp and ARIMAOnly comes into effect if FORECAST_METHOD is set to MIXEDCloser to 0 means more ARTxp; closer to 1 means more ARIMAEnabled only in Business Intelligence and Enterprise editions

Page 38: Forecasting with the Microsoft Time Series Data Mining Algorithm

Algorithm ParametersPREDICTION_SMOOTHING Scenarios

PREDICTION_SMOOTHING set to 0.5Equal blending

PREDICTION_SMOOTHING set to 0.2For step 0, the model weights ARIMA as 0.2 and ARTXP as 0.8Thereafter, the weight of ARIMA exponentially increases and the weight of ARTxp exponentially decreases

Page 39: Forecasting with the Microsoft Time Series Data Mining Algorithm

Algorithm Parameters(Continued)

INSTABILITY_SENSITIVITYOnly valid when FORECAST_METHOD is set to ARTXPValues closer to 0 will reduce the impact of instability detectionThe default of 1 will simulate Analysis Services 2005 behaviorEnabled only in Business Intelligence and Enterprise editionsMINIMUM_SERIES_VALUEMAXIMUM_SERIES_VALUEAllows setting limits for the range of valid predictionsEnabled only in Business Intelligence and Enterprise editions

Page 40: Forecasting with the Microsoft Time Series Data Mining Algorithm

Demo

End-to-end Forecasting by Using SSDT and DMX

Page 41: Forecasting with the Microsoft Time Series Data Mining Algorithm

Describing Business Scenarios

Page 42: Forecasting with the Microsoft Time Series Data Mining Algorithm

Business Scenarios

Consider any scenario involving the need to know what is likely to happenExtend trend analysis with “what could be” scenariosSeed planning data based on forecastsEmbed forecasts into applicationsMicrosoft data mining visualizations can be embedded directly into .NET Windows Forms applicationsApplications, or SQL CLR objects, can create and/or query mining models by using ADOMD.NET

Page 43: Forecasting with the Microsoft Time Series Data Mining Algorithm

Business Scenarios(Continued)

For the IT Professional:Mine server performance counters to gauge when hardware upgrades may be requiredMine ETL statistics to better understand the future growth of the data warehouse, and disk space requirements

Page 44: Forecasting with the Microsoft Time Series Data Mining Algorithm

Demo

Forecasting by Using a SQL CLR Function

Page 45: Forecasting with the Microsoft Time Series Data Mining Algorithm

SummaryThe Time Series algorithm is used for forecastingChoose the tool that fits your skills and requirements:Excel add-insSSDTScript with DMX.NET programming

Algorithm parameters can be configured to tune the mining model for improved accuracy and reliabilityDevelopers can embed forecasts into applications

Page 46: Forecasting with the Microsoft Time Series Data Mining Algorithm

ResourcesSQL Server Predictive Analytics web sitehttp://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/predictive-analytics.aspxLinks to related documents, related sites, and community resources

MSDN Analysis Services Data Mining Documentationhttp://msdn.microsoft.com/library/bb510517.aspx

SQL Server Data Mining Community web sitehttp://www.sqlserverdatamining.comSite designed and maintained by the SQL Server Data Mining engineering teamIncludes live samples, tutorials, webcasts, tips and tricks, and FAQ

Page 47: Forecasting with the Microsoft Time Series Data Mining Algorithm

Resources(Continued)

TechNet: Implementing Smart Reports with the Microsoft Business Intelligence Platformhttp://technet.microsoft.com/library/cc966451.aspxDescribes how to develop a report that uses a session data mining model to forecast future sales with a SQL CLR stored procedure

Book: “Data Mining for SQL Server 2008”Publisher: WileyAuthors: , by Jamie MacLennan, ZhaoHui Tang and Bogdan Crivat

Paul the Octopushttp://en.wikipedia.org/wiki/Paul_the_Octopus

Page 48: Forecasting with the Microsoft Time Series Data Mining Algorithm

Track resourcesDownload Microsoft SQL Server 2014

http://www.trySQLSever.com

Try out Power BI for Office 365! http://www.powerbi.com

Sign up for Microsoft HDInsight today! http://microsoft.com/bigdata

Page 49: Forecasting with the Microsoft Time Series Data Mining Algorithm

ResourcesLearning

Microsoft Certification & Training Resourceswww.microsoft.com/learning

msdnResources for Developers

http://microsoft.com/msdn

TechNetResources for IT Professionals

http://microsoft.com/technet

Sessions on Demandhttp://channel9.msdn.com/Events/TechEd

Page 50: Forecasting with the Microsoft Time Series Data Mining Algorithm

Complete an evaluation and enter to win!

Page 51: Forecasting with the Microsoft Time Series Data Mining Algorithm

Evaluate this session

Scan this QR code to evaluate this session

Page 52: Forecasting with the Microsoft Time Series Data Mining Algorithm

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.