![Page 1: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/1.jpg)
Sparse Modeling: Theory, Algorithms and Applications

Irina Rish, Computational Biology Center (CBC), IBM T.J. Watson Research Center, NY
Genady Grabarnik, Department of Math and CS, CUNY, NY
![Page 2: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/2.jpg)
Outline

- Introduction
- Sparse Linear Regression: Lasso
- Sparse Signal Recovery and Lasso: Some Theory
- Sparse Modeling: Beyond Lasso
  - Consistency-improving extensions
  - Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
  - Beyond linear model (GLMs, MRFs)
  - Sparse Matrix Factorizations
  - Beyond variable-selection: variable construction
- Summary and Open Issues
![Page 3: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/3.jpg)
Schedule

- 9:00-9:40: Introduction; Lasso
- 9:40-10:20: Sparse signal recovery and Lasso: some theory
- 10:20-10:30: Coffee break
- 10:30-11:45: Sparse modeling beyond Lasso
![Page 4: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/4.jpg)
A Common Problem

An unknown state of the world X = (X1, X2, X3, X4, X5) undergoes a (noisy) encoding Y = f(X), producing low-dimensional observations (Y1, Y2). Decoding (inference, learning) recovers

    x* = arg max_x P(x | y)

Can we recover a high-dimensional X from a low-dimensional Y?

Yes, if:
- X is structured, e.g., sparse (few Xi ≠ 0) or compressible (few large Xi), and
- the encoding preserves information about X.

Examples: sparse signal recovery (compressed sensing, rare-event diagnosis); sparse model learning.
![Page 5: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/5.jpg)
Example 1: Diagnosis in Computer Networks
(Beygelzimer, Kephart and Rish, 2007)

Model: y = Ax + noise, where
- x is the vector of (unknown) delays on the N links (the possible bottlenecks, X1, ..., X6),
- y is the vector of end-to-end delays of the M probes (Y1, ..., Y4),
- A is the M x N routing matrix: entry (i, j) is 1 if probe i traverses link j.

Routing matrix A (M = 4 probes over N = 6 links):

    1 0 1 1 0 0
    0 1 1 0 1 0
    1 1 0 0 0 0
    1 0 0 1 0 1

Task: find bottlenecks (extremely slow links) using probes (M << N).
Problem structure: x is nearly sparse - a small number of large delays.
Goal: recover the sparse state ('signal') x from noisy linear observations.
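A minimal sketch of this recovery (assuming the 4x6 routing matrix above, a single bottleneck on link 3, and noiseless probes), using ISTA (iterative soft-thresholding) to solve the Lasso; the penalty lam and the iteration count are arbitrary illustrative choices:

```python
import numpy as np

def ista_lasso(A, y, lam=0.1, iters=1000):
    """Solve min_x 0.5*||y - Ax||^2 + lam*||x||_1 by iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - A.T @ (A @ x - y) / L              # gradient step on the squared loss
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding step
    return x

# Routing matrix from the slide: M = 4 probes (rows) over N = 6 links (columns)
A = np.array([[1, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 0, 1]], dtype=float)

x_true = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])  # link 3 (index 2) is the bottleneck
y = A @ x_true                                     # end-to-end probe delays
x_hat = ista_lasso(A, y)
bottleneck = int(np.argmax(np.abs(x_hat)))         # link with the largest estimated delay
```

Even though only M = 4 probe delays are observed for N = 6 links, the l1 penalty concentrates the estimated delay on the single slow link.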
![Page 6: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/6.jpg)
Example 2: Sparse Model Learning from fMRI Data

Data: high-dimensional, small-sample
- 10,000-100,000 variables (voxels)
- 100s of samples (time points, or TRs)

Task: given fMRI, predict mental states
- emotional: angry, happy, anxious, etc.
- cognitive: reading a sentence vs viewing an image
- mental disorders (schizophrenia, autism, etc.)

fMRI image courtesy of fMRI Research Center @ Columbia University

Issues:
- Overfitting: can we learn a predictive model that generalizes well?
- Interpretability: can we identify brain areas predictive of mental states?
![Page 7: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/7.jpg)
Sparse Statistical Models: Prediction + Interpretability

Data (x - fMRI voxels, y - mental state, e.g. sad vs happy) → predictive model y = f(x) → small number of predictive variables?

Sparsity → variable selection → model interpretability
Sparsity → regularization → less overfitting / better prediction
![Page 8: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/8.jpg)
Sparse Linear Regression

y = Ax + noise

- y: measurements - mental states, behavior, tasks or stimuli
- A: fMRI data ('encoding'); rows - samples (~500), columns - voxels (~30,000)
- x: unknown parameters ('signal')

Goal: find a small number of most relevant voxels (brain areas).

fMRI activation image and time-course courtesy of Steve Smith, FMRIB
![Page 9: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/9.jpg)
Sparse Recovery in a Nutshell

Can we recover a sparse input efficiently from a small number of measurements?

Noiseless observations: y = Ax, where A is the design (measurement) matrix and x is the sparse input.
![Page 10: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/10.jpg)
Sparse Recovery in a Nutshell

"Compressed Sensing Surprise": given a random A (i.i.d. Gaussian entries), a K-sparse x (K nonzeros out of N dimensions) can be reconstructed exactly (with high probability):

- from just M = O(K log(N/K)) measurements,
- efficiently - by solving a convex problem (a linear program):

    min ||x||_1  subject to  y = Ax
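A numerical sketch of that claim (the dimensions N = 60, M = 30, K = 3 are illustrative choices): the l1 minimization becomes a linear program via the standard split x = xp - xm with xp, xm ≥ 0.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 60, 30, 3                       # ambient dimension, measurements, sparsity

x_true = np.zeros(N)
x_true[rng.choice(N, size=K, replace=False)] = [3.0, -2.0, 1.5]

A = rng.normal(size=(M, N))               # i.i.d. Gaussian measurement matrix
y = A @ x_true                            # noiseless observations

# min sum(xp) + sum(xm)  s.t.  A(xp - xm) = y,  xp, xm >= 0
c = np.ones(2 * N)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]             # with high probability, x_hat equals x_true
```

At the optimum, xp and xm carry the positive and negative parts of x, so the objective equals ||x||_1; with M well above K log(N/K), the recovery is exact up to solver tolerance.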
Sparse Recovery in a Nutshell
![Page 11: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/11.jpg)
Sparse Recovery in a Nutshell

More generally, if A is "good" (e.g., satisfies the Restricted Isometry Property with a proper constant), a K-sparse x (K nonzeros out of N dimensions) can be reconstructed with M << N measurements by solving the linear program:

    min ||x||_1  subject to  y = Ax
![Page 12: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/12.jpg)
Sparse Recovery in a Nutshell

And what if there is noise in the observations? The noiseless observations of the sparse input through the design (measurement) matrix are corrupted: y = Ax + noise.
![Page 13: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/13.jpg)
Sparse Recovery in a Nutshell

Still, the input can be reconstructed accurately (in the l2-sense) for A satisfying RIP; just solve a noisy version of our l1-optimization:

    min (1/2) ||y - Ax||_2^2 + λ ||x||_1

(Basis Pursuit, aka Lasso)
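A sketch of noisy recovery with scikit-learn's Lasso (all sizes and the regularization level alpha are arbitrary illustrative choices; sklearn minimizes (1/(2n))||y - Ax||^2 + alpha*||x||_1):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 100, 5                     # samples, dimensions, true sparsity

support = rng.choice(p, size=k, replace=False)
x_true = np.zeros(p)
x_true[support] = 2.0                     # a few large coefficients

A = rng.normal(size=(n, p)) / np.sqrt(n)  # columns have roughly unit norm
y = A @ x_true + 0.01 * rng.normal(size=n)

model = Lasso(alpha=0.005).fit(A, y)      # l1-penalized least squares
top_k = set(np.argsort(np.abs(model.coef_))[-k:])   # k largest coefficients
```

The k largest estimated coefficients land on the true support; note the characteristic l1 shrinkage, which biases the recovered magnitudes slightly below 2.0.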
![Page 14: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/14.jpg)
Sparse Linear Regression vs Sparse Signal Recovery

- Both solve the same optimization problem
- Both share efficient algorithms and theoretical results
- However, the sparse learning setting is more challenging:
  - We do not design the "design" matrix, but rather deal with the given data
  - Thus, nice matrix properties may not be satisfied (and they are hard to test on a given matrix, anyway)
  - We don't really know the ground truth ("signal") - but rather assume it is sparse (to interpret and to regularize)
  - Sparse learning includes a wide range of problems beyond sparse linear regression (part 2 of this tutorial)
![Page 15: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/15.jpg)
Outline

- Introduction
- Sparse Linear Regression: Lasso
- Sparse Signal Recovery and Lasso: Some Theory
- Sparse Modeling: Beyond Lasso
  - Consistency-improving extensions
  - Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
  - Beyond linear model (GLMs, MRFs)
  - Sparse Matrix Factorizations
  - Beyond variable-selection: variable construction
- Summary and Open Issues
![Page 16: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/16.jpg)
![Page 17: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/17.jpg)
Motivation: Variable Selection
![Page 18: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/18.jpg)
Model Selection as Regularized Optimization
![Page 19: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/19.jpg)
Bayesian Interpretation: MAP Estimation
![Page 20: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/20.jpg)
Log-likelihood Losses: Examples
![Page 21: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/21.jpg)
![Page 22: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/22.jpg)
![Page 23: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/23.jpg)
What is special about the l1-norm? Sparsity + computational efficiency

lq-norm constraints for different values of q
Image courtesy of [Hastie, Tibshirani and Friedman, 2009]
![Page 24: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/24.jpg)
[Figure: three curves for lambda = 2, lambda = 1, and lambda = 0.5, plotted over the range [-10, 10] with values between 0 and 1.]
![Page 25: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/25.jpg)
![Page 26: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/26.jpg)
![Page 27: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/27.jpg)
Lasso vs Ridge and Best-Subset in the Case of Orthonormal Designs

Best-subset: hard thresholding; Ridge: shrinkage; Lasso: soft thresholding.
Image courtesy of [Hastie, Tibshirani and Friedman, 2009]
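For an orthonormal design, all three estimators act elementwise on the OLS coefficients; a small sketch of the three rules (the threshold t and ridge penalty lam are arbitrary illustrative values):

```python
import numpy as np

def hard_threshold(b, t):
    """Best-subset rule: keep a coefficient unchanged only if it exceeds the threshold."""
    return b * (np.abs(b) > t)

def ridge_shrink(b, lam):
    """Ridge rule: shrink every coefficient proportionally, never exactly to zero."""
    return b / (1.0 + lam)

def soft_threshold(b, t):
    """Lasso rule: shrink toward zero by t, setting small coefficients exactly to zero."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

b = np.array([3.0, -0.5, 1.2])            # OLS coefficients in the orthonormal case
lasso = soft_threshold(b, 1.0)            # -> approximately [2.0, 0.0, 0.2]
best_subset = hard_threshold(b, 1.0)      # -> [3.0, 0.0, 1.2]
ridge = ridge_shrink(b, 1.0)              # -> [1.5, -0.25, 0.6]
```

Only the soft-thresholding (Lasso) rule combines shrinkage with exact zeros, which is why it yields sparsity while remaining a convex operation.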
![Page 28: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/28.jpg)
![Page 29: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/29.jpg)
![Page 30: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/30.jpg)
Geometric View of LARS
Image courtesy of [Hastie, 2007]
![Page 31: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/31.jpg)
Piecewise Linear Solution Path: LARS vs LASSO

LARS vs LASSO for pain perception prediction from fMRI data [Rish, Cecchi, Baliki, Apkarian, 2010]: for illustration purposes, we use just n = 9 (out of 120) samples, but p = 4000 variables; LARS selects n - 1 = 8 variables.

[Figure: two coefficient-path plots, coefficients roughly between -0.06 and 0.04 vs normalized step from 0 to 1. Left: Least Angle Regression (LARS). Right: LARS with the LASSO modification - when a coefficient crosses zero, the variable is deleted from the active set.]
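The path itself can be computed with scikit-learn's lars_path; method='lasso' adds the modification that deletes a variable when its coefficient crosses zero. A sketch on synthetic data (not the fMRI data above; sizes are arbitrary):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 9, 40                              # tiny sample, many variables
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=n)

# alphas: breakpoints of the piecewise-linear path (in decreasing order);
# coefs[:, j] is the coefficient vector at the j-th breakpoint
alphas, active, coefs = lars_path(X, y, method="lasso")
n_selected = np.sum(coefs[:, -1] != 0)    # at most min(n, p) variables can enter
```

Between consecutive breakpoints the coefficients move linearly, so the whole regularization path is summarized by the (alphas, coefs) pairs.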
![Page 32: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/32.jpg)
![Page 33: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/33.jpg)
Outline

- Introduction
- Sparse Linear Regression: Lasso
- Sparse Signal Recovery and Lasso: Some Theory
- Sparse Modeling: Beyond Lasso
  - Consistency-improving extensions
  - Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
  - Beyond linear model (GLMs, MRFs)
  - Sparse Matrix Factorizations
  - Beyond variable-selection: variable construction
- Summary and Open Issues
![Page 34: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/34.jpg)
![Page 35: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/35.jpg)
![Page 36: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/36.jpg)
![Page 37: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/37.jpg)
![Page 38: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/38.jpg)
![Page 39: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/39.jpg)
![Page 40: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/40.jpg)
![Page 41: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/41.jpg)
![Page 42: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/42.jpg)
![Page 43: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/43.jpg)
![Page 44: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/44.jpg)
![Page 45: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/45.jpg)
![Page 46: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/46.jpg)
![Page 47: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/47.jpg)
![Page 48: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/48.jpg)
![Page 49: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/49.jpg)
![Page 50: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/50.jpg)
![Page 51: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/51.jpg)
![Page 52: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/52.jpg)
![Page 53: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/53.jpg)
![Page 54: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/54.jpg)
![Page 55: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/55.jpg)
![Page 56: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/56.jpg)
![Page 57: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/57.jpg)
![Page 58: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/58.jpg)
![Page 59: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/59.jpg)
![Page 60: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/60.jpg)
![Page 61: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/61.jpg)
![Page 62: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/62.jpg)
![Page 63: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/63.jpg)
Outline

- Introduction
- Sparse Linear Regression: Lasso
- Sparse Signal Recovery and Lasso: Some Theory
- Sparse Modeling: Beyond Lasso
  - Consistency-improving extensions
  - Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
  - Beyond linear model (GLMs, MRFs)
  - Sparse Matrix Factorizations
  - Beyond variable-selection: variable construction
- Summary and Open Issues
![Page 64: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/64.jpg)
![Page 65: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/65.jpg)
![Page 66: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/66.jpg)
Outline

- Introduction
- Sparse Linear Regression: Lasso
- Sparse Signal Recovery and Lasso: Some Theory
- Sparse Modeling: Beyond Lasso
  - Consistency-improving extensions
  - Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
  - Beyond linear model (GLMs, MRFs)
  - Sparse Matrix Factorizations
  - Beyond variable-selection: variable construction
- Summary and Open Issues
![Page 67: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/67.jpg)
Beyond LASSO

Adding structure beyond sparsity:
- Elastic Net
- Fused Lasso
- Block l1-lq norms: group Lasso, simultaneous Lasso

Other likelihoods (loss functions):
- Generalized Linear Models (exponential-family noise)
- Multivariate Gaussians (Gaussian MRFs)
![Page 68: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/68.jpg)
[Figure: 'Truth' vs 'LASSO' - the truth contains a cluster of correlated relevant predictors, but LASSO selects only one variable from the cluster.]
![Page 69: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/69.jpg)
Elastic Net penalty: λ1 ||x||_1 + λ2 ||x||_2^2

- ridge penalty: λ1 = 0, λ2 > 0
- lasso penalty: λ1 > 0, λ2 = 0
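The grouping effect is easy to see on a toy design whose first two columns are identical (an illustrative sketch; alpha and l1_ratio are arbitrary choices): the ridge term in the Elastic Net forces the duplicated predictors to share the weight, whereas the pure Lasso may put all of it on one of them.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z, z, rng.normal(size=(n, 3))])  # columns 0 and 1 are identical
y = 2.0 * z + 0.1 * rng.normal(size=n)

# l1_ratio < 1 mixes in a ridge term, which spreads weight over the correlated pair
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
w = enet.coef_
```

Because the l2 term is strictly convex, the optimum must split the weight equally between the two identical columns, so w[0] and w[1] come out (nearly) equal and both nonzero.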
![Page 70: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/70.jpg)
![Page 71: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/71.jpg)
Example: Application to fMRI Analysis

Pittsburgh Brain Activity Interpretation Competition (PBAIC-07): subjects playing a videogame in a scanner (17 minutes).

24 continuous response variables, e.g.:
- Annoyance
- Sadness
- Anxiety
- Dog
- Faces
- Instructions
- Correct hits

Goal: predict responses from fMRI data.
![Page 72: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/72.jpg)
Grouping Effect on PBAIC Data
(Carroll, Cecchi, Rish, Garg, Rao 2009)

Predicting 'Instructions' (an auditory stimulus): higher λ2 → selection of more voxels from correlated clusters → larger, more spatially coherent clusters.

[Figure: voxel maps showing a small grouping effect (λ2 = 0.1) vs a larger grouping effect (λ2 = 2.0).]
![Page 73: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/73.jpg)
Among Almost Equally Predictive Models, Grouping Tends to Improve Model Stability
(Carroll, Cecchi, Rish, Garg, Rao 2009)

[Figure: test correlation (0 to 1) for OLS, Ridge, LASSO, EN 0.1 and EN 2.0 on the 'Instructions', 'VR Fixation' and 'Velocity' responses.]

Stability is measured here by the average % overlap between models for 2 runs by the same subject; increasing λ2 can significantly improve model stability.
![Page 74: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/74.jpg)
Another Application: Sparse Models of Pain Perception from fMRI

Predicting pain ratings from fMRI in the presence of a thermal pain stimulus (Rish, Cecchi, Baliki, Apkarian, BI-2010).

[Figure: predictive accuracy (correlation with response, ~0.55 to 0.8) vs number of voxels (sparsity, 0 to 1500) for OLS and Elastic Net with λ2 = 0.1, 1, 5, 10; the best prediction is achieved for higher λ2.]

Including more correlated voxels (increasing λ2) often improves the prediction accuracy as well.
![Page 75: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/75.jpg)
Image courtesy of [Tibshirani et al, 2005]
![Page 76: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/76.jpg)
![Page 77: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/77.jpg)
Group Lasso: Examples
![Page 78: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/78.jpg)
More on Group Lasso
![Page 79: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/79.jpg)
![Page 80: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/80.jpg)
![Page 81: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/81.jpg)
Outline

- Introduction
- Sparse Linear Regression: Lasso
- Sparse Signal Recovery and Lasso: Some Theory
- Sparse Modeling: Beyond Lasso
  - Consistency-improving extensions
  - Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
  - Beyond linear model (GLMs, MRFs)
  - Sparse Matrix Factorizations
  - Beyond variable-selection: variable construction
- Summary and Open Issues
![Page 82: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/82.jpg)
Beyond Lasso: General Log-likelihood Losses (l1-regularized M-estimators)

1. Gaussian → Lasso
2. Bernoulli → logistic regression
3. Exponential family → Generalized Linear Models (includes 1 and 2)
4. Multivariate Gaussian → Gaussian MRFs
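Case 2, the l1-regularized Bernoulli model (sparse logistic regression), is a one-liner in scikit-learn; a sketch on synthetic data where only the first two of p features matter (the sizes and C are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))
# Logistic model: only features 0 and 1 drive the label
y = (3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.logistic(size=n) > 0).astype(int)

# penalty='l1' gives the sparse GLM; smaller C means stronger regularization
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
n_zeroed = int(np.sum(clf.coef_[0] == 0.0))   # many irrelevant features are dropped
```

The l1 penalty zeroes out most of the 18 irrelevant coefficients while keeping the two informative ones, with their correct signs.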
![Page 83: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/83.jpg)
Beyond LASSO

Adding structure beyond sparsity:
- Elastic Net
- Fused Lasso
- Block l1-lq norms: group Lasso, simultaneous Lasso

Other likelihoods (loss functions):
- Generalized Linear Models (exponential-family noise)
- Multivariate Gaussians (Gaussian MRFs)
![Page 84: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/84.jpg)
Sparse Signal Recovery with M-estimators

Can l1-regularization accurately recover sparse signals given general log P(y|x) losses? Yes (under proper conditions):

- risk consistency of generalized linear models (Van de Geer, 2008)
- model-selection consistency of Gaussian MRFs (Ravikumar et al, 2008a)
- generalized linear models: recovery in l2-norm (non-asymptotic regime) for exponential-family noise and standard RIP conditions on the design matrix (Rish and Grabarnik, 2009)
- asymptotic consistency of general losses satisfying restricted strong convexity, with decomposable regularizers (Negahban et al, 2009)
![Page 85: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/85.jpg)
Exponential Family Distributions

p(y | θ) = h(y) exp( θᵀ y − ψ(θ) ), with natural parameter θ, log-partition function ψ(θ), and base measure h(y).

Examples: Gaussian, exponential, Bernoulli, multinomial, gamma, chi-square, beta, Weibull, Dirichlet, Poisson, etc.
![Page 86: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/86.jpg)
Generalized Linear Models (GLMs)
![Page 87: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/87.jpg)
Summary: Exponential Family, GLMs, and Bregman Divergences

Bijection Theorem (Banerjee et al, 2005): there is a bijection between exponential-family distributions and Bregman divergences, established via the Legendre duality of the log-partition function. Consequently, fitting a GLM by maximizing the exp-family likelihood is equivalent to minimizing the corresponding Bregman divergence.
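Concretely, the Bregman divergence generated by a strictly convex function φ is (a standard definition, not taken from the slide):

```latex
d_\phi(y, \mu) \;=\; \phi(y) - \phi(\mu) - \langle \nabla\phi(\mu),\, y - \mu \rangle
```

For the Gaussian, φ(t) = t²/2 gives d_φ(y, μ) = (y − μ)²/2, the squared loss; for the Bernoulli, φ(t) = t log t + (1 − t) log(1 − t) gives the KL (logistic) loss. So maximum likelihood in each family is exactly Bregman-divergence minimization, as the theorem states.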
![Page 88: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/88.jpg)
![Page 89: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/89.jpg)
Sparse Signal Recovery with Exponential-Family Noise

Can we recover a sparse signal from a small number of noisy observations, where the design matrix applied to the sparse signal gives the natural parameters, and the observations are drawn with exponential-family noise?
![Page 90: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/90.jpg)
Sufficient Conditions

1. The noise is small
2. A satisfies the Restricted Isometry Property (RIP)
3. The signal is s-sparse
4*. Boundedness

*otherwise, different proofs exist for some specific cases (e.g., Bernoulli, exponential, etc.)
![Page 91: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/91.jpg)
![Page 92: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/92.jpg)
Beyond LASSO

Adding structure beyond sparsity:
- Elastic Net
- Fused Lasso
- Block l1-lq norms: group Lasso, simultaneous Lasso

Other likelihoods (loss functions):
- Generalized Linear Models (exponential-family noise)
- Multivariate Gaussians (Gaussian MRFs)
![Page 93: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/93.jpg)
Markov Networks (Markov Random Fields)
![Page 94: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/94.jpg)
Sparse Markov Networks in Practical Applications

- Social networks: US senate voting data (Banerjee et al, 2008) - democrats (blue) and republicans (red)
- Genetic networks: Rosetta Inpharmatics compendium of gene expression profiles (Banerjee et al, 2008)
- Brain networks from fMRI: monetary reward task (Honorio et al, 2009) - drug addicts show more connections in the cerebellum (yellow) vs control subjects, who show more connections in the prefrontal cortex (green)

[Figure: (a) drug addicts, (b) controls]
![Page 95: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/95.jpg)
Sparse MRFs Can Predict Well

- Classifying schizophrenia (Cecchi et al, 2009): 86% accuracy
- Mental state prediction (sentence vs picture)* (Scheinberg and Rish, submitted): 90% accuracy

[Figure: classification error (0 to 0.5) vs K top voxels (t-test) for a sparse MRF classifier; 'Sentence vs. Picture: Subject 04820'.]

MRF classifiers can often exploit informative interactions among variables and often outperform state-of-the-art linear classifiers (e.g., SVM).

*Data @ www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-81/www/, from T. Mitchell et al., Learning to Decode Cognitive States from Brain Images, Machine Learning, 2004.
![Page 96: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/96.jpg)
Discriminative Network Models of Schizophrenia (Cecchi et al, 2009)

Network properties as biomarkers (predictive features): FDR-corrected degree maps. A 2-sample t-test was performed for each voxel in the degree maps, followed by FDR correction. Red/yellow: normal subjects have higher values than schizophrenics.

Voxel degrees in functional networks (thresholded covariance matrices) are statistically significantly different in schizophrenic patients, who appear to lack "hubs" in auditory/language areas.

Also, abnormal MRF connectivity has been observed in Alzheimer's patients (Huang 2009), in drug addicts (Honorio 2009), etc.
![Page 97: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/97.jpg)
Sparse Inverse Covariance Selection Problem
![Page 98: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/98.jpg)
Maximum Likelihood Estimation
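The l1-penalized maximum-likelihood problem is implemented in scikit-learn as GraphicalLasso; a sketch on synthetic independent features, where the true precision matrix is diagonal (alpha is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # independent features: true precision is diagonal

# Maximizes log det(Theta) - tr(S Theta) - alpha * ||Theta||_1 over precision matrices
model = GraphicalLasso(alpha=0.4).fit(X)
P = model.precision_

off_diag = P[~np.eye(5, dtype=bool)]
n_zero_offdiag = int(np.sum(np.abs(off_diag) < 1e-6))  # zeros = missing graph edges
```

The zero pattern of the estimated precision matrix is the learned Markov network structure: a zero entry (i, j) means variables i and j are conditionally independent given the rest.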
![Page 99: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/99.jpg)
![Page 100: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/100.jpg)
Block-Coordinate Descent on the Dual Problem
![Page 101: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/101.jpg)
Projected Gradient on the Dual Problem
![Page 102: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/102.jpg)
Alternatives: Solving the Primal Problem Directly
![Page 103: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/103.jpg)
Additional Related Work
![Page 104: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/104.jpg)
Selecting the Proper Regularization Parameter
“…the general issue of selecting a proper amount of regularization for getting a right-sized structure or model has largely remained a problem with unsatisfactory solutions” (Meinshausen and Buehlmann, 2008)
“asymptotic considerations give little advice on how to choose a specific penalty parameter for a given problem” (Meinshausen and Buehlmann, 2006)
Bayesian Approach (N. Bani Asadi, K. Scheinberg and I. Rish, 2009):
Assume a Bayesian prior on the regularization parameter
Find the maximum a posteriori (MAP) solution
Result: a more “balanced” solution (False Positive vs. False Negative error) than
cross-validation (too dense) and
theoretical choices (Meinshausen and Buehlmann 2006, Banerjee et al. 2008) (too sparse)
Unlike the stability selection approach (Meinshausen and Buehlmann, 2008), it does not require solving multiple optimization problems over data subsets.
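The cross-validation baseline discussed above, and the False Positive / False Negative error counts used to compare methods, can be sketched as follows. This uses scikit-learn's `GraphicalLassoCV` as the CV baseline (an assumed stand-in for the tutorial's setup); the zero-detection threshold `tol` is an illustrative choice:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.datasets import make_sparse_spd_matrix

rng = np.random.RandomState(0)
true_prec = make_sparse_spd_matrix(10, alpha=0.95, random_state=1)
X = rng.multivariate_normal(np.zeros(10), np.linalg.inv(true_prec), size=200)

# Cross-validated choice of the regularization parameter
est_prec = GraphicalLassoCV().fit(X).precision_

# Off-diagonal structure recovery: FP = spurious edges, FN = missed edges
tol = 1e-4  # hypothetical threshold for calling an entry "zero"
off = ~np.eye(10, dtype=bool)
true_edges = np.abs(true_prec[off]) > tol
est_edges = np.abs(est_prec[off]) > tol
fp = np.sum(est_edges & ~true_edges)
fn = np.sum(~est_edges & true_edges)
print(fp, fn)
```

A denser estimate drives FN down but FP up; the plots on the next pages trace exactly this trade-off across methods.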
![Page 105: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/105.jpg)
Existing Approaches
![Page 106: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/106.jpg)
![Page 107: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/107.jpg)
Results on Random Networks
[Two plots vs. sample size N (0-1000): False Negative Error and False Positive Error, Random Networks, P=100, density=4%; curves: Flat Prior (Reg. Likelihood), Exp. Prior, Theoretical, CV]
False Negatives: missed links; False Positives: “noisy” links
- Cross-validation (green) overfits drastically, producing an almost complete C matrix
- Theoretical (black) is too conservative: misses too many edges (near-diagonal C)
- Prior-based approaches (red and blue) are much more “balanced”: low FP and FN
![Page 108: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/108.jpg)
![Page 109: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/109.jpg)
Introduction
Sparse Linear Regression: Lasso
Sparse Signal Recovery and Lasso: Some Theory
Sparse Modeling: Beyond Lasso
Consistency-improving extensions
Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
Beyond linear model (GLMs, MRFs)
Sparse Matrix Factorizations
Beyond variable-selection: variable construction
Summary and Open Issues
Outline
![Page 110: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/110.jpg)
Sparse Matrix Factorization
[Diagram: X (n samples × p variables) ≈ U Vᵀ, where V contains m basis vectors (the dictionary) and U is the sparse representation]
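A minimal sketch of the X ≈ U Vᵀ factorization above, using scikit-learn's `DictionaryLearning` (an assumed tool choice, not the method presented in the tutorial); all sizes and the penalty `alpha` are illustrative:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(50, 20)                     # n = 50 samples, p = 20 variables

dl = DictionaryLearning(n_components=8,   # m = 8 basis vectors
                        alpha=1.0,        # sparsity penalty on the codes
                        max_iter=50,
                        random_state=0)
U = dl.fit_transform(X)                   # sparse representation, n x m
V = dl.components_                        # dictionary, m x p
print(U.shape, V.shape)  # (50, 8) (8, 20)
```

Each row of X is approximated by a sparse combination of the m dictionary atoms in V.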
![Page 111: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/111.jpg)
Introduction
Sparse Linear Regression: Lasso
Sparse Signal Recovery and Lasso: Some Theory
Sparse Modeling: Beyond Lasso
Consistency-improving extensions
Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
Beyond linear model (GLMs, MRFs)
Sparse Matrix Factorizations
Beyond variable-selection: variable construction
Summary and Open Issues
Outline
![Page 112: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/112.jpg)
Supervised Dimensionality Reduction (SDR):
[Diagram: observed variables X1…XD map to hidden low-dimensional components U1…UL, which map to targets Y1…YK]
From Variable Selection to Variable Construction
Learn a predictor (a mapping from U to Y) simultaneously with the dimensionality reduction
Assume there is an inherent low-dimensional structure in the data that is predictive of the target Y
Idea: dimensionality reduction (DR) guided by the class label may result in better predictive features than unsupervised DR
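The idea above can be illustrated with a classical pair of methods, not the tutorial's GLM-based SDR: PCA as unsupervised DR vs. LDA, which picks its projection using the labels. All dataset parameters here are arbitrary assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# Unsupervised DR: project to 1 dimension ignoring labels, then classify
udr = make_pipeline(PCA(n_components=1), LogisticRegression())
# Supervised DR: LDA chooses the 1-D projection using the labels
sdr = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                    LogisticRegression())

udr_acc = cross_val_score(udr, X, y, cv=5).mean()
sdr_acc = cross_val_score(sdr, X, y, cv=5).mean()
print(udr_acc, sdr_acc)
```

PCA keeps the direction of largest variance, which need not be the discriminative one; label-guided reduction typically preserves more predictive structure.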
![Page 113: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/113.jpg)
Particular Mappings X → U and U → Y
1. F. Pereira and G. Gordon. The Support Vector Decomposition Machine, ICML-06.
   Real-valued X, discrete Y (linear map from X to U, SVM for Y(U))
2. E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning with application to clustering with side information, NIPS-02.
3. K. Weinberger, J. Blitzer and L. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS-05.
   Real-valued X, discrete Y (linear map from X to U, nearest-neighbor Y(U))
4. K. Weinberger and G. Tesauro. Metric Learning for Kernel Regression, AISTATS-07.
   Real-valued X, real-valued Y (linear map from X to U, kernel regression Y(U))
5. Sajama and A. Orlitsky. Supervised Dimensionality Reduction using Mixture Models, ICML-05.
   Multi-type X (exp. family), discrete Y (modeled as a mixture of exponential-family distributions)
6. M. Collins, S. Dasgupta and R. Schapire. A generalization of PCA to the exponential family, NIPS-01.
7. A. Schein, L. Saul and L. Ungar. A generalized linear model for PCA of binary data, AISTATS-03.
   Unsupervised dimensionality reduction beyond Gaussian data (nonlinear GLM mappings)
8. I. Rish, G. Grabarnik, G. Cecchi, F. Pereira and G. Gordon. Closed-form Supervised Dimensionality Reduction with Generalized Linear Models, ICML-08.
![Page 114: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/114.jpg)
Example: SDR with Generalized Linear Models (Rish et al., 2008)
[Diagram: hidden U maps to observed X1…XD via weights V1…VD, and to targets Y1…YK via weights W1…WK]
E.g., in the linear case, we have:
X ~ U V and Y ~ U W
![Page 115: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/115.jpg)
Supervised DR Outperforms Unsupervised DR on Simulated Data
- Generate a separable 2-D dataset U
- Blow it up into D-dimensional data X by adding exponential-family noise (e.g., Bernoulli)
- Compare SDR with different noise models (Gaussian, Bernoulli) vs. unsupervised DR (UDR) followed by SVM or logistic regression
Results:
- SDR outperforms unsupervised DR by 20-45%
- Using the proper data model (e.g., Bernoulli-SDR for binary data) matters
- SDR “gets” the structure (0% error), while SVM does not (20% error)
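The data-generation steps above can be sketched as follows; dimensions, cluster separation, and the logistic link are illustrative assumptions:

```python
import numpy as np

rng = np.random.RandomState(0)
n, D = 200, 50

# 1. A separable 2-D dataset U: two well-separated Gaussian clusters
U = np.vstack([rng.randn(n // 2, 2) + 3.0,
               rng.randn(n // 2, 2) - 3.0])
y = np.r_[np.ones(n // 2), np.zeros(n // 2)]

# 2. Blow U up into D-dimensional binary data X with Bernoulli noise:
#    apply a logistic link to a random linear map, then sample Bernoulli
V = rng.randn(2, D)
p = 1.0 / (1.0 + np.exp(-U @ V))   # Bernoulli means
X = rng.binomial(1, p)
print(X.shape)  # (200, 50)
```

A Gaussian noise model on such binary X is misspecified, which is why matching the noise model (Bernoulli-SDR here) matters.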
![Page 116: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/116.jpg)
…and on Real-Life Data from fMRI Experiments
Real-valued data, classification task: predict the type of word (tools or buildings) the subject is seeing
84 samples (words presented to a subject), 14043 dimensions (voxels)
Latent dimensionality L = 5, 10, 15, 20, 25
- Gaussian-SDR achieves the overall best performance
- SDR matches SVM's performance using only 5 dimensions, while SVDM needs 15
- SDR greatly outperforms unsupervised DR followed by learning a classifier
![Page 117: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/117.jpg)
Introduction
Sparse Linear Regression: Lasso
Sparse Signal Recovery and Lasso: Some Theory
Sparse Modeling: Beyond Lasso
Consistency-improving extensions
Beyond l1-regularization (l1/lq, Elastic Net, fused Lasso)
Beyond linear model (GLMs, MRFs)
Sparse Matrix Factorizations
Beyond variable-selection: variable construction
Summary and Open Issues
Outline
![Page 118: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/118.jpg)
Summary and Open Issues
Common problem: small-sample, high-dimensional inference
Feasible if the input is structured – e.g. sparse in some basis
Efficient recovery of sparse input via l1-relaxation
Sparse modeling with l1-regularization: interpretability + prediction
Beyond l1-regularization: adding more structure
Beyond Lasso: M-estimators, dictionary learning, variable construction
Open issues, still:
choice of regularization parameter?
choice of proper dictionary?
Is interpretability just sparsity? (NO!)
![Page 119: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/119.jpg)
Interpretability: Much More than Sparsity?
[Diagram: data points labeled + and - feeding a predictive model y = f(x), yielding interpretable predictive patterns]
x: fMRI voxels, y: mental state (sad vs. happy)
![Page 120: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/120.jpg)
References
![Page 121: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/121.jpg)
![Page 121: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/121.jpg)
![Page 122: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/122.jpg)
![Page 123: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/123.jpg)
![Page 124: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/124.jpg)
![Page 125: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/125.jpg)
![Page 126: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/126.jpg)
![Page 128: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/128.jpg)
Appendix A
![Page 129: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/129.jpg)
Why Exponential Family Loss?
Variety of data types: real-valued, binary, nominal, non-negative, etc. Noise model: exponential family.
- Network management / problem diagnosis: binary failures (Bernoulli), non-negative delays (exponential)
  [Diagram: network with Web Server, Hub, DB Server, Router, and a probing station]
- Collaborative prediction: discrete rankings (multinomial)
- DNA microarray data analysis: real-valued expression levels (Gaussian)
- fMRI data analysis: real-valued voxel intensities; binary, nominal and continuous responses
![Page 130: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/130.jpg)
![Page 131: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/131.jpg)
Legendre duality: exponential-family distribution ↔ Bregman divergence
(Image courtesy of Arindam Banerjee)
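The duality the slide refers to (Banerjee et al.) can be written out explicitly. With ψ the log-partition function, φ = ψ* its Legendre conjugate, and μ(θ) = ∇ψ(θ) the mean parameter, an exponential-family log-likelihood is a Bregman divergence up to terms constant in θ:

```latex
p_\psi(x \mid \theta) = \exp\big(\langle x, \theta\rangle - \psi(\theta)\big)\, p_0(x),
\qquad
\log p_\psi(x \mid \theta) = -\,d_\phi\big(x, \mu(\theta)\big) + \mathrm{const}(x),
```

where $d_\phi(x,\mu) = \phi(x) - \phi(\mu) - \langle x-\mu,\ \nabla\phi(\mu)\rangle$. Hence maximizing exponential-family likelihood is equivalent to minimizing the matching Bregman divergence (Gaussian ↔ squared loss, Bernoulli ↔ logistic loss, etc.).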
![Page 132: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/132.jpg)
Appendix A
![Page 133: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/133.jpg)
Appendix B
![Page 134: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/134.jpg)
![Page 135: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/135.jpg)
Appendix B
![Page 136: 2012 10 04_machine_learning_lecture05_icml10_tutorial](https://reader033.vdocuments.us/reader033/viewer/2022052410/54b41b574a7959f1128b456e/html5/thumbnails/136.jpg)
Beyond LASSO
Other penalties (structured sparsity):
- Elastic Net penalty
- Fused Lasso penalty
- Block l1-lq norms: group and multi-task penalties
Improving consistency and stability w.r.t. the sparsity parameter choice:
- Adaptive Lasso
- Relaxed Lasso
- Bootstrap Lasso
- Randomized Lasso with stability selection
Other losses (other data likelihoods):
- Generalized Linear Models (exponential-family noise)
- Multivariate Gaussians (Gaussian MRFs)
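Several of the penalties listed above are available off-the-shelf; as a minimal sketch (the data, `alpha`, and `l1_ratio` are illustrative assumptions), the Elastic Net combines the l1 and l2 penalties on a sparse regression problem:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
w = np.zeros(20)
w[:3] = [1.5, -2.0, 1.0]            # sparse ground-truth coefficients
y = X @ w + 0.1 * rng.randn(100)

# l1_ratio interpolates between ridge (0) and lasso (1)
enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X, y)
print(np.sum(np.abs(enet.coef_) > 1e-8))  # number of selected variables
```

The l1 component zeroes out irrelevant coefficients, while the l2 component stabilizes selection among correlated variables, which is the motivation for the Elastic Net over the plain Lasso.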