TRANSCRIPT
September 2014
Dan Steinberg
Mikhail Golovnya
Salford Systems
Introduction to Modern Regression:
From OLS to GPS® to MARS®
Course Outline

Today's Topics:
• Regression Problem – quick overview
• Classical OLS/LOGIT – the starting point
• RIDGE/LASSO/GPS – regularized regression
• MARS – adaptive non-linear regression

Next Week's Topics:
• Regression Trees
• GBM – stochastic gradient boosting
• ISLE/RULE-LEARNER – model compression
Regression
• Regression analysis is at least 200 years old
• Surely the most used predictive modeling technique (including logistic regression)
• The American Statistical Association reports 18,900 members
• The Bureau of Labor Statistics reported more than 22,000 statisticians in the US work force in 2008
• Many other professionals involved in the sophisticated analysis of data are not included in these counts:
o Statistical specialists in scientific disciplines such as economics, medicine, bioinformatics
o Machine Learning specialists, 'Data Scientists', database experts
o Market researchers studying traditional targeted marketing
o Web analytics, social media analytics, text analytics
• Few of these other researchers will call themselves statisticians, but many make extensive use of variations of regression
Regression Challenges
• Preparation of data – errors, missing values, etc.
• Determination of predictors to include in the model
o Hundreds, thousands, even tens or hundreds of thousands available
• Transformation or coding of predictors
o Conventional approaches consider logarithm, power, inverse, etc.
• Detecting and modeling important interactions
• Possibly huge number of records
o Super large samples render all predictors "significant"
• Complexity of the underlying relationship
• Lack of external knowledge
OLS Regression
• OLS – ordinary least squares regression
o Discovered by Legendre (1805) and Gauss (1809) to solve problems in astronomy using pen and paper
o Solid statistical foundation laid by Fisher in the 1920s
o 1950s – use of electro-mechanical calculators
• The model is always of the form

Response = A + B1X1 + B2X2 + B3X3 + …

• The response surface is a hyper-plane!
• A – the intercept term
• B1, B2, B3, … – parameter estimates
• A unique combination of values usually exists which minimizes the mean squared error of predictions on the learn sample
• Step-wise approaches determine model size
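For illustration (not from the original slides), the OLS fit is a small linear algebra problem; a minimal numpy sketch on synthetic data:

    import numpy as np

    # Synthetic learn sample: 100 records, 3 predictors (illustrative only)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(size=100)

    # Prepend a column of ones for the intercept A, then solve for [A, B1, B2, B3]
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

    # The learn-sample mean squared error that OLS minimizes
    mse = np.mean((y - X1 @ coef) ** 2)
    print(coef, mse)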
Logistic Regression
• Models a Bernoulli response (binary outcome)
• The log-odds of the probability of the event of interest is a linear combination of the predictors:

F(X) = Log[p/(1-p)] = A + B1X1 + B2X2 + B3X3 + …

• The solution minimizes the loss (or equivalently maximizes the logistic likelihood) on the available data, assuming the response is coded as +1 and -1:

Loss = Σ(i=1..N) Log[1 + exp(-Yi·Fi)]

• The solution has no closed form and is usually found by a series of iterations using Newton's algorithm
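The loss is simple to compute directly; a numpy sketch with made-up scores (F would come from the fitted linear model):

    import numpy as np

    def logistic_loss(F, y):
        # Loss = sum over i of Log[1 + exp(-Yi * Fi)], with y coded as +1/-1
        return np.sum(np.log1p(np.exp(-y * F)))

    F = np.array([2.1, -0.3, 0.8])   # linear scores A + B1*X1 + ...
    y = np.array([1, -1, -1])        # observed +1/-1 responses
    print(logistic_loss(F, y))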
Key Problems with OLS and LOGIT
• In addition to the challenges already mentioned above, note the following features:
o OLS/LOGIT optimizes a specific loss function (Mean Squared Error / Log-likelihood) using all available data (the learn sample)
o The solution over-fits to the learn sample
o The solution becomes unstable in the presence of collinearity
o A unique solution does not exist when the data become wide
• Look for alternative strategies to construct useful linear combinations of predictors:
o By jointly optimizing MSE and the L2-norm of the coefficients – Ridge regression
o By jointly optimizing MSE and the L1-norm of the coefficients – Lasso regression
o A hybrid of the two above
o All of the above plus extensions into very sparse solutions – Generalized Path Seeker
Motivation for Regularized Regression
• Unsatisfactory regression results when modeling physical processes (1970) with correlated predictors
o Coefficient estimates could change dramatically with small changes in the data
o Some coefficients judged to be much too large
o Frequent appearance of coefficients with the "wrong" sign
o Problem severe with substantial multicollinearity, but always present
• The solution proposed by Hoerl and Kennard (Technometrics, 1970) was "Ridge regression"
o First proposed by Hoerl in 1962, just for stabilization of coefficients
o Initially poorly received by the academic statistics profession
• Ridge is intentionally biased but yields more satisfactory coefficient estimates and superior generalization
o Better performance (test MSE) on previously unseen data (lower variance)
• "Shrinkage" of regression coefficients towards zero
• The length of the OLS coefficient vector is biased upwards
Lasso Regularized Regression
• Introduced by Tibshirani in 1996 explicitly as an improvement on RIDGE regression
• Least Absolute Shrinkage and Selection Operator
• Desire to gain the stability and lower variance of ridge regression while also performing variable selection
• Especially in the context of many possible predictors, looking for a simple, stable, low-predictive-variance model
• Historical note: the Lasso was inspired by related 1993 work by Leo Breiman (of CART and RandomForests fame), the 'non-negative garrote'
• Breiman's simulation studies showed the potential for improved prediction via selection and shrinkage
Regularized Regression – Theory
• The regularized regression approach balances model performance and model complexity:

OLS Regression: Minimize {Mean Squared Error}
Regularized Regression: Minimize {Mean Squared Error + λ · Model Complexity}

• λ – the regularization parameter
o λ = ∞ gives the zero-coefficient solution
o λ = 0 gives the OLS solution
• The model complexity expression defines the type of model:
o Ridge: sum of squared coefficients
o Lasso: sum of absolute coefficients
o Best Subsets: number of coefficients
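As a quick open-source illustration of this tradeoff (scikit-learn used here as a stand-in for SPM; the data are synthetic, and alpha plays the role of λ):

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge, Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=200)

    # Larger alpha (lambda) = heavier penalty = more shrunken coefficients;
    # the lasso penalty can drive some coefficients exactly to zero.
    for model in (LinearRegression(), Ridge(alpha=10.0), Lasso(alpha=0.5)):
        model.fit(X, y)
        print(type(model).__name__, np.round(model.coef_, 2))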
Penalty Function – Model Complexity
• RIDGE penalty: Σ aj^2 (squared)
• LASSO penalty: Σ |aj| (absolute value)
• COMPACT penalty: Σ |aj|^0 (a count of non-zero coefficients)
• Extended power family: Σ |aj|^γ, 0 ≤ γ ≤ 2
• Elastic net family: Σ {(β - 1)·aj^2/2 + (2 - β)·|aj|}, 1 ≤ β ≤ 2
• The regularization parameter λ multiplies a penalty function of the coefficient vector a
• Each elasticity is based on fitting a model that minimizes the sum (residual sum of squared errors + penalty)
• Intermediate elasticities are mixtures; e.g. a 50/50 mix of RIDGE and LASSO corresponds to β = 1.5
• GPS extends beyond the "elastic net" of Tibshirani/Hastie to include the "other half" of the power family, 0 ≤ β ≤ 1
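A small helper (an illustration, not SPM code) that evaluates the elastic net family penalty for a given coefficient vector:

    import numpy as np

    def elastic_penalty(a, beta):
        # P_beta(a) = sum_j {(beta - 1)*a_j^2/2 + (2 - beta)*|a_j|}, 1 <= beta <= 2
        # beta = 2 is proportional to the RIDGE penalty; beta = 1 is the LASSO penalty
        a = np.asarray(a, dtype=float)
        return np.sum((beta - 1) * a**2 / 2 + (2 - beta) * np.abs(a))

    a = [0.5, -1.2, 0.0, 2.0]
    print(elastic_penalty(a, 2.0), elastic_penalty(a, 1.5), elastic_penalty(a, 1.0))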
Lambda-Centric Approach
• A number of regularized regression algorithms were developed to find the values of the coefficients for a given lambda:
o Ridge regression has a closed-form solution: a(ridge) = (X'X + λI)^(-1) X'y
o The LARS algorithm and its modification obtain the LASSO regression
o The Elastic Net algorithm generates a family of solutions "between" the LASSO and RIDGE regressions that closely approximates the power family:

Pβ(a) = Σ {(β - 1)·aj^2/2 + (2 - β)·|aj|}, 1 ≤ β ≤ 2
Ridge: β = 2.0; Lasso: β = 1.0

• All of these approaches are lambda-centric! Pick any combination of beta and lambda (usually on a user-defined grid) and then solve for the coefficients
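The ridge closed form translates directly into numpy (a sketch; production code would also center and scale the predictors and handle the intercept):

    import numpy as np

    def ridge_closed_form(X, y, lam):
        # a(ridge) = (X'X + lam*I)^(-1) X'y; solve() is preferred to an explicit inverse
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)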
Path-Centric Approach
• Instead of focusing on the lambda, observe that the solutions traverse a path in the parameter space
• The terminating points of the path are always known:
o The OLS solution, corresponding to λ = 0
o The zero-coefficient solution, corresponding to λ = ∞
• Start at either of the terminating points and work out an update algorithm to find the next point along the path
o Starting at the OLS end requires finding the OLS solution first, which may be time consuming
o Starting at the zero end is more convenient and allows early path termination upon reaching an excessive compute burden or acceptable model performance
• Every point on the path maps (implicitly) to a monotone sequence of lambdas
Regularized Regression – Practical Algorithm
• Start with the zero-coefficient solution
• Run a series of iterations:
o Update one of the coefficients by a small amount
• If the selected coefficient was zero, a new variable effectively enters the model
• If the selected coefficient was not zero, the model is simply updated
• The end result is a collection of linear models which can be visualized as a path in the parameter space (a sketch of such an update loop appears below)
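A minimal sketch of such an update loop in Python – incremental forward stagewise, a close relative of the lasso path; this illustrates the idea and is not Friedman's GPS implementation (step size and stopping rule are assumptions):

    import numpy as np

    def stagewise_path(X, y, n_steps=1000, eps=0.01):
        n, p = X.shape
        beta = np.zeros(p)          # start at the zero-coefficient solution
        resid = y - y.mean()        # intercept handled by centering
        path = [beta.copy()]
        for _ in range(n_steps):
            corr = X.T @ resid      # each predictor's correlation with the residual
            j = int(np.argmax(np.abs(corr)))
            delta = eps * np.sign(corr[j])
            beta[j] += delta        # a zero coefficient moving off zero = new variable
            resid -= delta * X[:, j]
            path.append(beta.copy())
        return np.array(path)       # one row per point along the path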
[Illustration: one iteration of the path-building algorithm over coefficients X1…X8 (unlisted coefficients are 0.0).

Introducing a new variable:
Current model: X3 = 0.2, X5 = 0.4, X6 = 0.5
Next model: X3 = 0.2, X4 = 0.1, X5 = 0.4, X6 = 0.5 (X4 enters the model)

Updating an existing model:
Current model: X3 = 0.2, X4 = 0.1, X5 = 0.4, X6 = 0.5
Next model: X3 = 0.3, X4 = 0.1, X5 = 0.4, X6 = 0.5 (the coefficient of X3 grows)]
Path Building Process
• Elasticity parameter – controls the variable selection strategy along the path (using the LEARN sample only); it can be anywhere between 0 and 2, inclusive:
o Elasticity = 2 – fast approximation of Ridge regression; introduces variables as quickly as possible and then jointly varies the magnitudes of the coefficients – lowest degree of compression
o Elasticity = 1 – fast approximation of Lasso regression; introduces variables sparingly, letting the currently active variables develop their coefficients – good degree of compression versus accuracy
o Elasticity = 0 – fast approximation of Stepwise regression; introduces new variables only after the currently active variables are fully developed – excellent degree of compression, but may lose accuracy

[Diagram: variable selection strategy. Starting from the zero-coefficient model (λ = ∞), a variable is added, giving a sequence of 1-variable models; another variable is added, giving a sequence of 2-variable models; then 3-variable models; and so on until the final OLS solution (λ = 0).]
Points Versus Steps
• Each path will have a different number of steps
• To facilitate model comparison among different paths, the Point Selection Strategy extracts a fixed collection of models onto a points grid
o This eliminates some of the original irregularity among individual paths and facilitates model extraction and comparison

[Diagram: Point Selection Strategy. The steps of Paths 1, 2, and 3, each running from the zero solution to the OLS solution, are mapped onto a common grid of points 1 through 10.]
OLS versus GPS
• GPS (Generalized Path Seeker) introduced by Jerome Friedman in 2008 ("Fast Sparse Regression and Classification")
• Dramatically expands the pool of potential linear models by including different sets of variables in addition to varying the magnitudes of coefficients
• The optimal model of any desired size can then be selected based on its performance on the TEST sample

[Diagram: OLS regression optimizes on the learn sample (X1, X2, X3, X4, X5, X6, …) to produce a single sequence of linear models (1-variable, 2-variable, 3-variable, …) and ignores the test sample. GPS regression learns from the learn sample to build a large collection of linear models (paths) – 1-variable, 2-variable, 3-variable models with varying coefficients – and uses the test sample to select the optimal one.]
Boston Housing Data Set
• Concerns housing values in the Boston area
• Harrison, D. and D. Rubinfeld. "Hedonic Prices and the Demand for Clean Air." Journal of Environmental Economics and Management, v5, 81-102, 1978
• Combined information from 10 separate governmental and educational sources to produce this data set
• 506 census tracts in the City of Boston for the year 1970
o Goal: study the relationship between quality-of-life variables and property values
o MV – median value of owner-occupied homes in tract ($1,000s)
o CRIM – per capita crime rate
o NOX – concentration of nitric oxides (parts per 10 million)
o AGE – percent built before 1940
o DIS – weighted distance to centers of employment
o RM – average number of rooms per house
o LSTAT – % lower status of the population
o RAD – accessibility to radial highways
o CHAS – borders Charles River (0/1)
o INDUS – percent non-retail business
o TAX – property tax rate per $10,000
o PT – pupil-teacher ratio
OLS on Boston Data
• 414 records in the learn sample
• 92 records in the test sample
• Good agreement:
o LEARN MSE = 27.455
o TEST MSE = 26.147

[Figure: the 3-variable OLS solution, with coefficients +5.247, -0.597, and -0.858.]
Paths Produced by GPS
• Example of 21 paths with different variable selection strategies
Path Points on Boston Data
• Each path uses a different variable selection strategy and separate coefficient updates

[Figure: path development on the Boston data, with snapshots at Points 30, 100, 150, and 190.]
GPS on Boston Data
• 414 records in the learn sample
• 92 records in the test sample
• 15% performance improvement on the test sample:
o TEST MSE = 22.669 (OLS MSE was 26.147)

[Figure: the 3-variable GPS solution, with coefficients +5.247, -0.858, and -0.597, versus the OLS test MSE of 26.147.]
Key Problems with GPS
• The GPS model is still a linear regression!
• The response surface is still a global hyper-plane
• It is incapable of discovering local structure in the data
• Look for non-linear algorithms that build the response surface locally, based on the data itself:
o By trying all possible data cuts as local boundaries
o By fitting first-order adaptive splines locally
Motivation for MARS (Multivariate Adaptive Regression Splines)
• Developed by Jerome H. Friedman in the late 1980s
• After his work on CART (for classification)
• Adapts many CART ideas for regression:
o Automatic variable selection
o Automatic missing value handling
o Allows for nonlinearity
o Allows for interactions
o Leverages the power of regression where linearity can be exploited
From Linear to Non-linear
• Classical regression and regularized regression build globally linear models
• Further accuracy can be achieved by building locally linear models, connected to each other at boundary points called knots

[Figure: MV versus LSTAT. A single global regression line (left) is localized into piecewise linear segments joined at knots (right).]
Key Concept for Spline is the "Knot"
• A knot marks the end of one region of data and the beginning of another
• A knot is where the behavior of the function changes
• In a classical spline, knot positions are predetermined and often evenly spaced
• In MARS, knots are determined by a search procedure
• Only as many knots as needed end up in the MARS model
• If a straight line is an adequate fit, there will be no interior knots
o In MARS there is always at least one knot
o It could correspond to the smallest observed value of the predictor
Placement of Knots
• With only one predictor and one knot to select, placement is straightforward:
o Test every possible knot location
o Choose the model with the best fit (smallest SSE)
o Perhaps constrain by requiring a minimum amount of data in each interval
• Prevents an interior knot from being placed too close to a boundary
• For computational efficiency, knots are always placed exactly at observed predictor values
o This can cause rare modeling artifacts due to the discrete nature of the data
Finding Knots Automatically
• Stage-wise knot placement process on a flat-top function

[Figure: Y versus X for a flat-top function, showing the true knots and the knots found in sequence (Knot 1 through Knot 6).]
Basis Functions
• The knot selection example works very well to illustrate splines in one dimension
• Thinking in terms of knot locations is unwieldy when working with a large number of variables simultaneously
o Need a concise notation and programming expressions that are easy to manipulate
o It is not clear how to construct or represent interactions using knot locations
• Basis Functions (BF) provide the analytical machinery to express the knot placement strategy
• MARS creates sets of basis functions to decompose the information in each variable individually
The Hockey Stick Basis Function
• The hockey stick basis function is the core MARS building block
o It can be applied to a single variable multiple times
• Hockey stick function:
o Direct: max(0, X - c)
o Mirror: max(0, c - X)
o Maps variable X to a new variable X*
o X* = 0 for all values of X up to some threshold value c
o X* = X - c for all values of X greater than c, i.e. the amount by which X exceeds the threshold c
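In code, the direct and mirror hockey sticks are one-liners (numpy illustration):

    import numpy as np

    def direct_bf(x, c):
        # Direct: max(0, X - c); zero up to the knot c, then X - c beyond it
        return np.maximum(0.0, x - c)

    def mirror_bf(x, c):
        # Mirror: max(0, c - X); zero above the knot c
        return np.maximum(0.0, c - x)

    x = np.array([0.0, 5.0, 10.0, 15.0])
    print(direct_bf(x, 10))   # [0. 0. 0. 5.]
    print(mirror_bf(x, 10))   # [10.  5.  0.  0.]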
Basis Function Example
• X ranges from 0 to 100
• 8 basis functions displayed (c = 10, 20, 30, …, 80)

[Figure: the eight direct basis functions BF10 through BF80 plotted against X.]
Basis Functions: Separate Displays
o Each function is graphed with the same dimensions
o BF10 is offset from the original value by 10
o BF80 is zero for most of its range
o Basis functions can be constructed for any value of c
o MARS considers constructing one for EVERY actual data value
Tabular Display of Basis Functions
• Each new BF results in a different number of zeroes in the transformed variable
• The resulting collection is resistant to multicollinearity issues
• Example: three basis functions with knots at 25, 55, and 65
Spline with 1 Basis Function

MV = 27.395 - 0.659·(INDUS - 4)+

[Figure: MV versus INDUS. Slope = 0 below the knot at 4; slope = -0.659 above it.]
Spline with 2 Basis Functions

MV = 30.290 - 2.439·(INDUS - 4)+ + 2.215·(INDUS - 8)+

[Figure: MV versus INDUS. Slope = 0 below the knot at 4; slope = -2.439 between knots 4 and 8; slope = -2.439 + 2.215 = -0.224 above 8.]
Adding Mirror Image BFs
• A standard basis function (X - knot)+ does not provide for a non-zero slope for values below the knot
• To handle this, MARS uses a "mirror image" basis function, which looks at the interval of variable X lying below the threshold c

MV = 29.433 + 0.925·(4 - INDUS)+ - 2.180·(INDUS - 4)+ + 1.939·(INDUS - 8)+

[Figure: MV versus INDUS. Slope = -0.925 below the knot at 4; slope = -2.180 between knots 4 and 8; slope = -2.180 + 1.939 = -0.241 above 8.]
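To make the basis-function arithmetic concrete, a sketch that evaluates this three-BF model at a few INDUS values using the coefficients printed above:

    import numpy as np

    def direct(x, c):  # (X - c)+
        return np.maximum(0.0, x - c)

    def mirror(x, c):  # (c - X)+
        return np.maximum(0.0, c - x)

    def mv_hat(indus):
        indus = np.asarray(indus, dtype=float)
        return (29.433
                + 0.925 * mirror(indus, 4)   # slope -0.925 below the knot at 4
                - 2.180 * direct(indus, 4)   # slope -2.180 between knots 4 and 8
                + 1.939 * direct(indus, 8))  # net slope -0.241 above 8

    print(mv_hat([2, 6, 10]))   # one prediction from each local region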
MARS Algorithm
• Stands for Multivariate Adaptive Regression Splines
• Forward stage:
o Add pairs of BFs (a direct and mirror pair of basis functions represents a single knot) in a step-wise regression manner
o The process stops once a user-specified upper limit is reached
o Possible linear dependency is handled automatically by discarding redundant BFs
• Backward stage:
o Remove BFs one at a time in a step-wise regression manner
o This creates a sequence of candidate models of varying complexity
• Selection stage:
o Select the optimal model based on TEST performance (modern approach)
o Select the optimal model based on the GCV criterion (legacy approach)
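For experimentation outside SPM, the third-party py-earth package (an open-source MARS implementation in the scikit-learn family, assumed installed; it is not Salford's MARS®) follows the same forward/backward/selection scheme; a minimal sketch:

    import numpy as np
    from pyearth import Earth   # third-party: scikit-learn-contrib/py-earth

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 100, size=(500, 1))
    y = np.minimum(X[:, 0], 60) + rng.normal(scale=2.0, size=500)  # flat-top shape

    model = Earth(max_terms=10)   # user-specified upper limit for the forward stage
    model.fit(X, y)
    print(model.summary())        # lists the selected hinge basis functions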
MARS on Boston Data
• 9-BF (7-variable) solution
• 414 records in the learn sample
• 92 records in the test sample
• 40% performance improvement on the test sample:
o TEST MSE = 15.749 (OLS was 26.147)
Non-linear Response Surface
• MARS automatically determined transition points between various local regions
• This model provides major insights into the nature of the relationship
200 Replications
• All of the above models were repeated on 200 randomly selected 20% test partitions
• GPS shows a marginal performance improvement but a much smaller model
• MARS shows a dramatic performance improvement

[Figure: test performance across the 200 replications for Regression, GPS, and MARS.]
Regression Trees
• Regression trees result in piece-wise constant models (a multi-dimensional staircase) on an orthogonal partition of the data space
o Thus usually not the best possible performer in terms of conventional regression loss functions
• Only a very limited number of controls is available to influence the modeling process
o Priors and costs are no longer applicable
o There are two splitting rules: LS (least squares) and LAD (least absolute deviation)
• Very powerful in capturing high-order interactions but somewhat weak in explaining simple main effects
Split Improvement
• For a parent node split into left and right children, the improvement is the reduction in the sum of squared errors:

Improvement = SSE(parent) - [SSE(left) + SSE(right)]

• One can show this quantity is never negative
• Find the split with the largest improvement by conducting an exhaustive search over all splits in the parent node (a brute-force sketch follows)
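A brute-force sketch of this exhaustive search for a single predictor (numpy illustration):

    import numpy as np

    def best_split(x, y):
        # Improvement = SSE(parent) - [SSE(left) + SSE(right)]
        order = np.argsort(x)
        xs, ys = x[order], y[order]
        sse = lambda v: float(((v - v.mean()) ** 2).sum()) if len(v) else 0.0
        parent = sse(ys)
        best_cut, best_gain = None, -np.inf
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue  # can only cut between distinct values
            gain = parent - sse(ys[:i]) - sse(ys[i:])
            if gain > best_gain:
                best_cut, best_gain = (xs[i - 1] + xs[i]) / 2, gain
        return best_cut, best_gain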
Splitting the Root Node
• Improvement is defined in terms of the greatest reduction in the sum of squared errors when a single constant prediction is replaced by two separate constants, one on each side of the split
Regression Tree Model
• All cases in a given node are assigned the same predicted response – the node average of the original target
• Nodes are color-coded according to the predicted response
• This yields a convenient segmentation of the population according to average response levels
The Best and the Worst Segments

Stochastic Gradient Boosting
• A new approach to machine learning / function approximation developed by Jerome H. Friedman at Stanford University
o Co-author of CART® with Breiman, Olshen and Stone
o Author of MARS®, PRIM, Projection Pursuit
• Also known as TreeNet®
• Good for classification and regression problems
• Built on small trees and thus:
o Fast and efficient
o Data driven
o Immune to outliers
o Invariant to monotone transformations of variables
• Resistant to over-training – generalizes very well
• Can be remarkably accurate with little effort
• BUT the resulting model may be very complex
The Algorithm
• Begin with a very small tree as the initial model
o Could be as small as ONE split generating 2 terminal nodes
o A typical model will have 3-5 splits in a tree, generating 4-6 terminal nodes
o The model is intentionally "weak"
• Compute "residuals" (prediction errors) for this simple model for every record in the data
• Grow a second small tree to predict the residuals from the first tree
• Compute residuals from this new 2-tree model and grow a 3rd tree to predict the revised residuals
• Repeat this process to grow a sequence of trees:

Model = Tree 1 + Tree 2 + Tree 3 + more trees …

(a runnable sketch of this loop appears below)
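A toy version of this residual-fitting loop for squared-error loss, with scikit-learn trees as the weak learners (an illustration of the idea only; TreeNet itself adds sampling and other refinements):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boost(X, y, n_trees=200, lr=0.1, max_leaf_nodes=6):
        f0 = y.mean()                  # initial (intentionally weak) model
        pred = np.full(len(y), f0)
        trees = []
        for _ in range(n_trees):
            resid = y - pred           # residuals of the model so far
            t = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes).fit(X, resid)
            pred += lr * t.predict(X)  # add a small tree that predicts the residuals
            trees.append(t)
        return f0, trees               # model = f0 + lr * sum of tree predictions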
Illustration: Saddle Function
• 500 {X1, X2} points randomly drawn from a [-3, +3] box to produce the XOR response surface Y = X1 · X2
• 3-node trees are used to show the evolution of the TreeNet response surface

[Figure: the response surface after 1, 2, 3, 4, 10, 20, 30, 40, 100, and 195 trees.]
Notes on the TreeNet Solution
• The solution evolves slowly and usually includes hundreds or even thousands of small trees
• The process is myopic – only the next best tree, given the current set of conditions, is added
• There is a high degree of similarity and overlap among the resulting trees
• Very large tree sequences make model scoring time and resource intensive
• Thus, there is an ever-present need to simplify (reduce) model complexity
A Tree is a Variable Transformation
• Any tree in a TreeNet model can be represented by a derived continuous variable as a function of the inputs

[Tree diagram: Node 1 (Avg = -0.105, N = 506) splits on X1 at 1.47. Cases with X1 > 1.47 reach Terminal Node 3 (Avg = -0.695, N = 30); the rest go to Node 2 (Avg = -0.068, N = 476), which splits on X2 at -1.83 into Terminal Node 1 (Avg = -0.250, N = 15) and Terminal Node 2 (Avg = -0.062, N = 461).]

TREE_1 = F(X1, X2)
ISLE Compression of TreeNet
• The original TreeNet model combines all trees with equal coefficients
• ISLE accomplishes model compression by removing redundant trees and adjusting the coefficients to change the relative contributions of the remaining trees
• Regularized regression methodology provides the required machinery to accomplish this task!

TN Model:   1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
ISLE Model: 1.3 0.0 0.4 0.0 0.0 1.8 1.2 0.3 0.0 0.0 0.0 0.1 0.0
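A sketch of ISLE-style post-processing under stated assumptions: the per-tree predictions from the boost() sketch above become columns of a derived design matrix, and a lasso (standing in for GPS here) reweights or zeroes the trees:

    import numpy as np
    from sklearn.linear_model import LassoCV

    def isle_compress(X, y, f0, trees, cv=5):
        # One column per tree: each tree's prediction is a derived variable
        T = np.column_stack([t.predict(X) for t in trees])
        lasso = LassoCV(cv=cv).fit(T, y - f0)
        kept = np.flatnonzero(lasso.coef_)   # trees surviving compression
        return lasso, kept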
A Node is a Variable Transformation
• Any node in a TreeNet model can be represented by a derived dummy variable as a function of the inputs

[Tree diagram: the same tree as above – any single node, e.g. Terminal Node 3 (Avg = -0.695, N = 30, reached when X1 > 1.47), defines a 0/1 indicator.]

NODE_X = F(X1, X2)
RuleLearner Compression
• Create an exhaustive set of dummy variables for every node (internal and terminal) and every tree in a TN model:

TN Model → T1_N1 + T1_T1 + T1_T2 + T1_T3 + T2_N1 + T2_T1 + T2_T2 + T2_T3 + T3_N1 + T3_T1 + …

• Run GPS regularized regression to extract an informative subset of node dummies along with their modeling coefficients – each selected dummy carries a coefficient and corresponds to a rule-set (Coefficient 1 / Rule-set 1, Coefficient 2 / Rule-set 2, …)
• Thus, model compression can be achieved by eliminating redundant nodes
• Each selected node dummy represents a specific rule-set which can be interpreted directly for further insights
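A sketch under the same assumptions as the ISLE example, using scikit-learn's decision_path to build the node dummies (every node a case visits, internal and terminal) with a lasso standing in for GPS:

    import numpy as np
    from scipy.sparse import hstack
    from sklearn.linear_model import LassoCV

    def rulelearner_compress(X, y, f0, trees, cv=5):
        # Node-membership dummies: one 0/1 column per node per tree
        D = hstack([t.decision_path(X) for t in trees]).toarray()
        lasso = LassoCV(cv=cv).fit(D, y - f0)
        kept = np.flatnonzero(lasso.coef_)   # each kept dummy is a rule-set
        return lasso, kept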
Salford Predictive Modeler (SPM)
• Download a current version from our website: http://www.salford-systems.com
• The version will run without a license key for 10 days
• Request a license key at [email protected]
• Request a configuration to meet your needs:
o Data handling capacity
o Data mining engines made available