When Can Machine Learning Be Useful for Communication Systems?
Osvaldo Simeone
King’s College London
Osvaldo Simeone ML for Comm 1 / 184
Goals and Learning Outcomes
Goals:
- Provide an introduction to main areas in machine learning
- Offer pointers to specific applications for telecom

Learning outcomes:
- Recognize scenarios in which machine learning can and cannot be useful
- Identify specific classes of machine learning methods that apply to a given problem
Osvaldo Simeone ML for Comm 2 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 3 / 184
What is Machine Learning?

Traditional engineering (model-based) approach:
- acquisition of domain knowledge...
- ... mathematical (physics-based) modelling...
- ... and optimized algorithm design with performance guarantees (under the given model)

Osvaldo Simeone ML for Comm 7 / 184
What is Machine Learning?

Engineering design flow

[Figure: (a) data-driven flow: acquisition of data → training set → learning (over a hypothesis class) → black-box machine; (b) engineering design flow: acquisition of domain knowledge → physics-based mathematical model → algorithm development → algorithm with performance guarantees]

Osvaldo Simeone ML for Comm 8 / 184
What is Machine Learning?

Machine learning approach:
- selection of a general-purpose model (hypothesis class), objective function, and a learning algorithm...
- ... learning based on data (examples)...
- ... and use of the trained (black-box) machine for inference

Osvaldo Simeone ML for Comm 12 / 184
What is Machine Learning?

Machine learning or data-driven approach

[Figure: data-driven flow: acquisition of data → training set → learning (over a hypothesis class) → black-box machine]

Osvaldo Simeone ML for Comm 13 / 184
What is Machine Learning?

Integrating domain knowledge into a machine learning approach:
- choose the hypothesis class and learning algorithm based on knowledge of the problem

[Figure: data-driven flow: acquisition of data → training set → learning → black-box machine, with acquisition of domain knowledge informing the choice of hypothesis class]

Osvaldo Simeone ML for Comm 14 / 184
Taxonomy of Machine Learning Methods
Supervised learning
Unsupervised learning
Reinforcement learning
Osvaldo Simeone ML for Comm 15 / 184
Taxonomy of Machine Learning Methods
Supervised vs unsupervised learning
Osvaldo Simeone ML for Comm 16 / 184
Taxonomy of Machine Learning Methods
Reinforcement learning: feedback-based sequential decision making
[Figure: agent-environment loop with state st, action at, and reward rt; © D. Silver]
Osvaldo Simeone ML for Comm 17 / 184
When to Use Machine Learning?
Potential advantages:
- lower cost and faster development, if collecting data is less expensive/quicker than acquiring a physics-based model or an optimal algorithm
- reduced implementation complexity, if the hypothesis class contains low-complexity machines

Potential disadvantages:
- suboptimal performance and limited performance guarantees
- limited interpretability
Osvaldo Simeone ML for Comm 18 / 184
When to Use Machine Learning?
Criteria inspired by [Brynjolfsson and Mitchell '17]:
- traditional engineering flow not applicable or inadequate:
  - model deficit: a physics-based model is not available
  - algorithm deficit: a model is available, but optimal or near-optimal algorithms are not known or are too complex to implement
Osvaldo Simeone ML for Comm 19 / 184
When to Use Machine Learning?
Criteria inspired by [Brynjolfsson and Mitchell '17]:
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit
- large data sets exist or can be created
- the task does not require detailed explanations for how the decision was made
- the task does not have stringent optimality requirements
  - Monte Carlo for algorithm deficit and generalization bounds for model deficit
- the phenomenon or function being learned should not change rapidly
Osvaldo Simeone ML for Comm 20 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 21 / 184
Data in Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 22 / 184
Data in Communication Networks
Fog network architecture [5GPPP]
Data collection and processing can take place at the edge and/or at the cloud.
Osvaldo Simeone ML for Comm 23 / 184
Data in Communication Networks
Data at the edge:
- PHY: baseband signals, (multi-RAT) channel quality
- MAC/Link: throughput, FER, random access load and latency
- Network: location, traffic loads across services, users’ device types, battery levels
- Application: users’ preferences, content demands, computing loads, QoS metrics
Osvaldo Simeone ML for Comm 24 / 184
Data in Communication Networks
Data at the cloud:
- Network: mobility patterns, network-wide traffic statistics, outage rates
- Application: users’ behavior patterns, subscription information, service usage statistics, TCP/IP traffic statistics
Osvaldo Simeone ML for Comm 25 / 184
Machine Learning in Communication Networks
Which tasks?
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
- large data sets exist or can be created ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task does not have stringent optimality requirements (depends)
- the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 26 / 184
Machine Learning in Communication Networks
Osvaldo Simeone ML for Comm 27 / 184
Machine Learning in Communication Networks

3GPP has introduced Network Data Analytics to enable access to data from the network functions and services in the core network.
ETSI has proposed use cases for Experiential Network Intelligence (ENI) based on such data.

[China Mobile] [ETSI]
Osvaldo Simeone ML for Comm 28 / 184
Machine Learning for Communication vs Communication for Machine Learning

This presentation will discuss machine learning for communication.
Communication plays a key role in distributed learning systems, including edge and federated learning.
[Park et al '19]
Osvaldo Simeone ML for Comm 29 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 30 / 184
Supervised Learning
Supervised learning:
- regression: continuous labels
- classification: discrete labels
Osvaldo Simeone ML for Comm 31 / 184
Supervised Learning: Regression
[Figure: training points (xn, tn) on the interval [0, 1], with an unobserved domain point marked “?”]

Training set D: N training points (xn, tn), n = 1, ..., N
- xn = covariates, domain points, or explanatory variables
- tn = dependent variables, labels, or responses (continuous)

Goal: predict the label t for a new, that is, as of yet unobserved, domain point x
Osvaldo Simeone ML for Comm 32 / 184
Supervised Learning: Classification
[Figure: labelled training points in the plane, with an unobserved domain point marked “?”]

Training set D: N training points (xn, tn), n = 1, ..., N
- xn = covariates, domain points, or explanatory variables
- tn = dependent variables, labels, or responses (discrete)

Goal: predict the label (class) t for a new, that is, as of yet unobserved, domain point x
Osvaldo Simeone ML for Comm 33 / 184
Supervised Learning
Impossible task without assuming a model (inductive bias), by the no free lunch theorem

Memorizing vs. learning: retrieval of the value tn corresponding to an already observed pair (xn, tn) ∈ D vs. prediction of the value t for an unseen x
Osvaldo Simeone ML for Comm 34 / 184
Defining Supervised Learning
Training set D:
(xn, tn) ~ i.i.d. p(x, t), n = 1, ..., N

Based on the training set D, we derive a predictor t̂(x).

Test pair:
(x, t) ~ p(x, t), independent of D

Quality of the prediction t̂(x) for a pair (x, t):
ℓ(t, t̂(x))
for some loss function ℓ(t, t̂), e.g., ℓ(t, t̂) = (t − t̂)² (quadratic) or ℓ(t, t̂) = 1(t ≠ t̂) (probability of error)
Osvaldo Simeone ML for Comm 35 / 184
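The two loss functions above are simple to state in code; this is a minimal illustrative sketch (the function names are mine, not from the slides):

```python
def quadratic_loss(t, t_hat):
    """Quadratic loss l(t, t_hat) = (t - t_hat)^2, typical for regression."""
    return (t - t_hat) ** 2

def detection_error_loss(t, t_hat):
    """0-1 loss l(t, t_hat) = 1(t != t_hat), typical for classification."""
    return 1 if t != t_hat else 0

# Regression: the penalty grows quadratically with the prediction error
print(quadratic_loss(1.0, 0.8))

# Classification: any wrong label costs 1, regardless of "how wrong" it is
print(detection_error_loss(1, 0))
```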
Defining Supervised Learning
Goal: minimize the average loss on the test pair (generalization loss)

Lp(t̂) = E(x,t)∼p(x,t)[ℓ(t, t̂(x))]

Alternative viewpoints to the frequentist framework: Bayesian and Minimum Description Length (MDL)
Osvaldo Simeone ML for Comm 36 / 184
When the True Distribution p(x , t) is Known...
... we don’t need data D

... and we have a standard inference problem, i.e., estimation (regression) or detection (classification).

The solution can be directly computed from the posterior, or predictive, distribution

p(t|x) = p(x, t) / p(x)

as

t̂∗(x) = argmin_t̂ E_t∼p(t|x)[ℓ(t̂, t) | x]
Osvaldo Simeone ML for Comm 37 / 184
When the Model p(x , t) is Known...
With the quadratic loss, conditional mean: t̂∗(x) = E_t∼p(t|x)[t|x]

With the probability of error, maximum a posteriori (MAP): t̂∗(x) = argmax_t p(t|x)

Example: with joint distribution

x\t     t = 0   t = 1
x = 0   0.05    0.45
x = 1   0.4     0.1

we have p(t = 1|x = 0) = 0.45/(0.05 + 0.45) = 0.9, and

t̂∗(x = 0) = 0.9 × 1 + 0.1 × 0 = 0.9 for the quadratic loss,
t̂∗(x = 0) = 1 for the probability of error (MAP).
Osvaldo Simeone ML for Comm 38 / 184
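The worked example above can be verified numerically. A small sketch (variable names are mine):

```python
# Joint distribution p(x, t) from the example, stored as {(x, t): probability}
p_joint = {(0, 0): 0.05, (0, 1): 0.45, (1, 0): 0.4, (1, 1): 0.1}

def posterior(t, x):
    """p(t|x) = p(x, t) / p(x), with p(x) obtained by marginalizing over t."""
    p_x = sum(p for (xx, _), p in p_joint.items() if xx == x)
    return p_joint[(x, t)] / p_x

# Optimal predictors for x = 0
cond_mean = sum(t * posterior(t, 0) for t in (0, 1))   # quadratic loss
map_est = max((0, 1), key=lambda t: posterior(t, 0))   # probability of error

print(posterior(1, 0))  # 0.9
print(cond_mean)        # 0.9
print(map_est)          # 1
```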
When the True Distribution p(x , t) is Not Known...
... we need data D

... and we have a learning problem:

1. Model selection (inductive bias): define a parametric model, p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)

2. Learning: given data D, optimize a learning criterion to obtain the parameter vector θ

3. Inference: use the model to obtain the predictor t̂(x) (to be tested on new data)
Osvaldo Simeone ML for Comm 39 / 184
Logistic Regression
Example: binary classification (t ∈ {0, 1})

1. Model selection (inductive bias): logistic regression (discriminative model)

φ(x) = [φ1(x) · · · φD′(x)]T is a vector of features (e.g., bag-of-words model for a text).
Osvaldo Simeone ML for Comm 40 / 184
Logistic Regression
Parametric probabilistic model:
p(t = 1|x, w) = σ(wTφ(x))

where σ(a) = 1/(1 + exp(−a)) is the sigmoid function.
Osvaldo Simeone ML for Comm 41 / 184
Logistic Regression

2. Learning: to be discussed

3. Inference: with the probability of error loss, MAP classification: decide t̂ = 1 if the logit (or LLR) wTφ(x) ≥ 0, and t̂ = 0 otherwise
Osvaldo Simeone ML for Comm 42 / 184
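Steps 1 and 3 can be sketched together: compute the logit wTφ(x) and threshold it at 0. The weights and feature vector below are invented for illustration:

```python
import math

def sigmoid(a):
    """sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def predict(w, phi_x):
    """MAP classification: decide t_hat = 1 iff the logit w^T phi(x) >= 0."""
    logit = sum(wi * fi for wi, fi in zip(w, phi_x))
    return (1 if logit >= 0 else 0), sigmoid(logit)

# Hypothetical weight vector and feature vector
w = [1.0, -2.0, 0.5]
phi_x = [0.2, 0.1, 1.0]        # logit = 0.2 - 0.2 + 0.5 = 0.5
t_hat, p1 = predict(w, phi_x)
print(t_hat, round(p1, 3))     # decides t_hat = 1, since the logit is >= 0
```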
Multi-Layer Neural Networks
1. Model selection (inductive bias): multi-layer neural network (discriminative model)
Multiple layers of learnable weights enable feature learning.
Osvaldo Simeone ML for Comm 43 / 184
Supervised Learning
1. Model selection (inductive bias): define a parametric model, p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)

2. Learning: given data D, optimize a learning criterion to obtain the parameter vector θ

3. Inference: use the model to obtain the predictor t̂(x) (to be tested on new data)
Osvaldo Simeone ML for Comm 44 / 184
Learning: Maximum Likelihood
ML selects a value of θ that is the most likely to have generated the observed training set D:

maximize p(D|θ)
⇐⇒ maximize ln p(D|θ) (log-likelihood, or LL)
⇐⇒ minimize − ln p(D|θ) (negative log-likelihood, or NLL)

Also known as the log-loss or, for discriminative models, the cross-entropy.

Requires a sum over the training points, e.g., for discriminative models:

minimize − ln p(tD|xD, θ) = − Σ_{n=1}^{N} ln p(tn|xn, θ)
Osvaldo Simeone ML for Comm 46 / 184
Learning: Maximum Likelihood

The problem rarely has analytical solutions and is typically addressed by Stochastic Gradient Descent (SGD). For discriminative models, we have

θnew ← θold + γ ∇θ ln p(tn|xn, θ)|θ=θold

where γ is the learning rate. With multi-layer neural networks, this approach yields the backpropagation algorithm.
Osvaldo Simeone ML for Comm 47 / 184
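For logistic regression, the gradient in the update above is ∇w ln p(tn|xn, w) = (tn − σ(wTxn)) xn, which gives a compact SGD sketch (toy synthetic data and parameters are my own invention):

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def nll(w, data):
    """Negative log-likelihood: -sum_n ln p(t_n | x_n, w)."""
    total = 0.0
    for x, t in data:
        p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= math.log(p1 if t == 1 else 1.0 - p1)
    return total

random.seed(0)
# Toy data: the label tends to be 1 when the first feature is large;
# the constant 1.0 feature acts as a bias term
data = [([random.gauss(0, 1), 1.0], 0) for _ in range(50)]
data += [([random.gauss(2, 1), 1.0], 1) for _ in range(50)]

w, gamma = [0.0, 0.0], 0.1
before = nll(w, data)
for _ in range(20):                 # 20 passes of SGD over the data
    for x, t in data:
        p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # SGD step in the direction of grad_w ln p(t|x, w) = (t - p1) * x
        w = [wi + gamma * (t - p1) * xi for wi, xi in zip(w, x)]
after = nll(w, data)
print(before, after)  # the training NLL decreases
```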
Learning: Maximum Likelihood
Maximum likelihood is only concerned with the performance on the training set.

To improve generalization (performance outside the training set), regularization is typically used.

Regularization can generally be interpreted as imposing some prior knowledge on the model parameters θ.
Osvaldo Simeone ML for Comm 48 / 184
Supervised Learning
1. Model selection (inductive bias): define a parametric model, p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)

2. Learning: given data D, optimize a learning criterion to obtain the parameter vector θ

3. Inference: use the model to obtain the predictor t̂(x) (to be tested on new data)
Osvaldo Simeone ML for Comm 49 / 184
Model Selection
How to select a model (inductive bias)?
Use domain knowledge if available, e.g., to select features
Focus here on model order selection, i.e., selection of the capacity of the model

Ex.: for logistic regression, the model order M is the number of features
Osvaldo Simeone ML for Comm 50 / 184
Model Selection
Example: regression using a discriminative model p(t|x, w), with

t = Σ_{m=0}^{M} wm x^m + N(0, 1)

where t̂(x) = Σ_{m=0}^{M} wm x^m is a polynomial of order M

[Figure: training points on [0, 1], with an unobserved domain point marked “?”]
Osvaldo Simeone ML for Comm 51 / 184
Model Selection
With M = 1, using ML learning of the coefficients:

[Figure: the M = 1 (linear) fit to the training points]
Osvaldo Simeone ML for Comm 52 / 184
Model Selection: Underfitting...
With M = 1, the ML predictor t̂(x) underfits the data:
- the model is not rich enough to capture the variations present in the data;
- large training loss

LD(θ) = (1/N) Σ_{n=1}^{N} (tn − t̂(xn))²
Osvaldo Simeone ML for Comm 53 / 184
Model Selection
With M = 9, using ML learning of the coefficients:

[Figure: the M = 9 and M = 1 fits to the training points]
Osvaldo Simeone ML for Comm 54 / 184
Model Selection: ... vs Overfitting
With M = 9, the ML predictor overfits the data:
- the model is too rich and, in order to account for the observations in the training set, it appears to yield inaccurate predictions outside it;
- presumably we have a large generalization loss

Lp(t̂) = E(x,t)∼p(x,t)[(t − t̂(x))²]
Osvaldo Simeone ML for Comm 55 / 184
Model Selection
M = 3 seems to be a reasonable choice...

... but how do we know, given that we have no data outside of the training set?

[Figure: the M = 1, M = 3, and M = 9 fits to the training points]
Osvaldo Simeone ML for Comm 56 / 184
Model Selection: Validation

Keep some data (validation set) to estimate the generalization error for different values of M.
(See cross-validation for a more efficient way to use the data.)
Osvaldo Simeone ML for Comm 57 / 184
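A minimal sketch of validation-based model order selection, on synthetic sinusoidal data in the spirit of the running example (the noise level and set sizes are my choices; numpy is assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sinusoid on [0, 1]."""
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, t

x_tr, t_tr = make_data(30)     # training set
x_val, t_val = make_data(30)   # held-out validation set

val_loss = {}
for M in range(1, 10):
    # ML learning with Gaussian noise reduces to least squares here
    w = np.polyfit(x_tr, t_tr, deg=M)
    pred = np.polyval(w, x_val)
    # Validation loss as an estimate of the generalization loss
    val_loss[M] = np.mean((t_val - pred) ** 2)

best_M = min(val_loss, key=val_loss.get)
print(best_M, val_loss[best_M])
```

The selected order sits between the underfitting (low M) and overfitting (high M) regimes on this data.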
Model Selection: Validation

Validation allows model order selection.

[Figure: root average squared loss vs. model order M, showing the training loss and the generalization loss (estimated via validation); low M underfits, high M overfits]
Validation can also be used more generally to select other hyperparameters (e.g., learning rate).
Osvaldo Simeone ML for Comm 58 / 184
Model Selection: Validation

Model order selection should depend on the amount of data... It is a problem of bias (asymptotic error) versus generalization gap.

[Figure: root average quadratic loss vs. number of training examples N, with training and generalization (via validation) curves for M = 1 and M = 7]
Osvaldo Simeone ML for Comm 59 / 184
Application to Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 60 / 184
At the Edge: PHY

Channel detection and decoding – classification
[Cammerer et al '17]
Osvaldo Simeone ML for Comm 61 / 184
Machine Learning in Communication Networks
Which tasks?
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
- large data sets exist or can be created ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task does not have stringent optimality requirements (depends)
- the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 62 / 184
At the Edge: PHY

Channel detection and decoding – classification
Model deficit
[Farsad and Goldsmith '18]
Osvaldo Simeone ML for Comm 63 / 184
At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for optical links – regression
Algorithm deficit

[Wang et al.]
Osvaldo Simeone ML for Comm 64 / 184
At the Edge: PHY
Channel equalization in the presence of non-linearities, e.g., for satellite links with non-linear amplifiers – regression
Algorithm deficit
[Bouchired et al.]
Osvaldo Simeone ML for Comm 65 / 184
At the Edge: PHY
Channel decoding for modulation schemes with complex optimal decoders, e.g., continuous phase modulation – classification
Algorithm deficit
[De Veciana and Zakhor '92]
Osvaldo Simeone ML for Comm 66 / 184
Machine Learning in Communication Networks
Which tasks?
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
- large data sets exist or can be created ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task does not have stringent optimality requirements (depends)
- the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 67 / 184
At the Edge: PHY
Channel decoding – classification
Leverage domain knowledge to choose the hypothesis class
[Nachmani et al.]
Osvaldo Simeone ML for Comm 68 / 184
At the Edge: PHY

Channel equalization to compensate for hardware impairments – regression
Leverage domain knowledge to select blocks to be designed using a model-based or a data-based approach

[Schibisch et al.]
Osvaldo Simeone ML for Comm 69 / 184
At the Edge: PHY

Modulation recognition – classification
Algorithm deficit
[Agirman-Tosun et al '11]
Osvaldo Simeone ML for Comm 70 / 184
At the Edge: PHY

Localization – regression (coordinates)
Model deficit

[Fang and Lin]
Osvaldo Simeone ML for Comm 71 / 184
At the Edge: PHY

Precoding and power allocation: use a known optimization algorithm to generate the data set – regression
Algorithm deficit

[Sun et al.]
Osvaldo Simeone ML for Comm 72 / 184
At the Edge: PHY
An approximate model and a known optimization algorithm can also be used to generate a data set for an initial training phase
Algorithm deficit
[Zappone et al.]
Osvaldo Simeone ML for Comm 73 / 184
At the Edge: PHY

Interference cancellation – regression
Model deficit

[Balatsoukas-Stimming]
Osvaldo Simeone ML for Comm 74 / 184
At the Edge: PHY

FDD massive MIMO via channel prediction from uplink training (see also [Huang et al ’19])
Model deficit
[Arnold et al ’19]
Osvaldo Simeone ML for Comm 75 / 184
At the Edge: MAC/Link

Spectrum sensing – classification
Model deficit
[Tumuluru et al '10]
Osvaldo Simeone ML for Comm 76 / 184
At the Edge: Network and Application

Content prediction for proactive caching – classification
Model deficit
[Chen et al '17]
Osvaldo Simeone ML for Comm 77 / 184
At the Cloud: Network
Link prediction for wireless routing – classification/ regression
Model deficit
[Wang et al '06]
Osvaldo Simeone ML for Comm 78 / 184
At the Cloud: Network

Link prediction for optical routing – classification/regression
Model deficit

[Musumeci et al.]
Osvaldo Simeone ML for Comm 79 / 184
At the Cloud: Network
Congestion prediction for smart routing – classification
Model deficit
[Tang et al '17]
Osvaldo Simeone ML for Comm 80 / 184
At the Cloud: Network and Application

Traffic classification – classification
Model deficit
[Nguyen et al '08]
Osvaldo Simeone ML for Comm 81 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 82 / 184
Unsupervised Learning
Unsupervised learning tasks operate over unlabelled data sets.
General goal: discover properties of the data, e.g., for compressed representation

“Some of us see unsupervised learning as the key towards machines with common sense.” (Y. LeCun)
Osvaldo Simeone ML for Comm 83 / 184
“Defining” Unsupervised Learning
Training set D:
xn ~ i.i.d. p(x), n = 1, ..., N

Goal: learn some useful properties of the distribution p(x)

Alternative viewpoints to the frequentist framework: Bayesian and MDL
Osvaldo Simeone ML for Comm 84 / 184
Unsupervised Learning Tasks
Density estimation: estimate p(x), e.g., for use in plug-in estimators or compression algorithms, or to detect outliers

Clustering: partition all points in D into groups of similar objects (e.g., document clustering)

Dimensionality reduction, representation, and feature extraction: represent each data point xn in a space of lower dimensionality, e.g., to highlight independent explanatory factors, and/or to ease visualization, interpretation, or successive tasks

Generation of new samples: learn a machine that produces samples approximately distributed according to p(x), e.g., to produce artificial scenes for games or films
Osvaldo Simeone ML for Comm 85 / 184
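Among the tasks above, clustering is the easiest to sketch; below is a minimal k-means-style example on invented one-dimensional data (K = 2 clusters):

```python
# Minimal k-means (K = 2) on 1-D points: alternate between assigning
# each point to its nearest centroid and updating each centroid to the
# mean of its assigned points.
points = [1.0, 1.2, 0.8, 0.9, 5.0, 5.2, 4.8, 5.1]
centroids = [points[0], points[1]]   # naive initialization

for _ in range(10):
    clusters = [[], []]
    for p in points:
        # Assignment step: nearest centroid
        k = min((0, 1), key=lambda i: abs(p - centroids[i]))
        clusters[k].append(p)
    # Update step: centroid = mean of its cluster (kept if cluster is empty)
    centroids = [sum(c) / len(c) if c else centroids[i]
                 for i, c in enumerate(clusters)]

print(sorted(centroids))  # one centroid per group of similar points
```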
Unsupervised Learning
1. Model selection (inductive bias): define a parametric model p(x|θ)
2. Learning: Given data D, optimize a learning criterion to obtainthe parameter vector θ
3. Clustering, feature extraction, sample generation...
Osvaldo Simeone ML for Comm 86 / 184
Models
Unsupervised learning models typically involve hidden, or latent, variables.
zn = hidden, or latent, variables for each data point xn
Ex.: zn = cluster index of xn
Osvaldo Simeone ML for Comm 88 / 184
(a) Directed Generative Models
Model data x as being caused by z :
p(x|θ) = ∑z p(z|θ) p(x|z, θ)
Osvaldo Simeone ML for Comm 90 / 184
(a) Directed Generative Models
Ex.: Document clustering
I x is a document, and z is (interpreted as) its topic
I p(z|θ) = distribution of topics
I p(x|z, θ) = distribution of words in the document given the topic
Basic representatives:
I Mixture of Gaussians
I Likelihood-free models
Osvaldo Simeone ML for Comm 91 / 184
(d) Autoencoders
Model the encoding from data to hidden variables, as well as the decoding from hidden variables back to data:
p(z |x , θ) and p(x |z , θ)
Osvaldo Simeone ML for Comm 92 / 184
(d) Autoencoders
Ex.: Compression
I x is an image and z is (interpreted as) a compressed (e.g., sparse) representation
I p(z|x, θ) = compression of the image into the representation
I p(x|z, θ) = decompression of the representation into an image
Basic representatives: Principal Component Analysis (PCA), dictionary learning, neural network-based autoencoders
Osvaldo Simeone ML for Comm 93 / 184
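To make the PCA-as-autoencoder view concrete, the numpy sketch below (toy data; the dimensions and noise level are illustrative, not from the slides) uses the top principal directions as a linear encoder W and its transpose as the decoder:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points in R^3 lying close to a 2-D subspace
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) \
    + 0.01 * rng.normal(size=(200, 3))

mu = X.mean(axis=0)
Xc = X - mu
# Principal directions = top right-singular vectors of the centered data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T                     # 3x2 matrix of the top-2 directions

Z = Xc @ W                       # encoder: z = W^T (x - mu)
X_hat = Z @ W.T + mu             # decoder: x_hat = W z + mu

mse = np.mean((X - X_hat) ** 2)  # small, since the data is nearly planar
```

Replacing W and W.T with nonlinear neural networks yields the neural network-based autoencoders mentioned on the slide.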
Unsupervised Learning
1. Model selection (inductive bias): Define a parametric model p(x|θ)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Clustering, feature extraction, sample generation...
Osvaldo Simeone ML for Comm 94 / 184
Learning: Maximum Likelihood
Focus on directed generative models (a)
To simplify the notation, consider a single data point x (sum over the data set D to generalize).
ML problem:
max_θ ln p(x|θ) = ln ( ∑z p(x, z|θ) ) = ln ( E_{z∼p(z|θ)}[p(x|z, θ)] )
Key issue: evaluating the LL requires marginalizing over the latent variables, whose distribution is itself to be learned.
Osvaldo Simeone ML for Comm 95 / 184
ELBO
To tackle this issue, a standard approach is to introduce a variational distribution (or encoder) q(z|x) that provides a probabilistic estimate of z given x.
For any distribution q(z|x), the Evidence Lower BOund (ELBO), or (negative) free energy, L(q, θ) is defined as
L(q, θ) = E_{z∼q(z|x)}[ ln p(x, z|θ) − ln q(z|x) ],
where ln p(x, z|θ) plays the role of the learning signal.
Osvaldo Simeone ML for Comm 96 / 184
ELBO
The ELBO is a global lower bound on the LL function:
ln p(x|θ) ≥ L(q, θ).
Equality holds at a value θ0 if and only if the distribution q(z|x) equals the posterior of z given x (i.e., the optimal Bayesian estimate):
q(z|x) = p(z|x, θ0).
[Figure: LL function and ELBO, touching at θ0]
Osvaldo Simeone ML for Comm 97 / 184
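The bound and its tightness at the posterior are easy to verify numerically. A minimal sketch with a binary latent variable (the probabilities below are made-up toy values):

```python
import numpy as np

p_z = np.array([0.3, 0.7])           # p(z|theta) for z in {0, 1}
p_x1_given_z = np.array([0.9, 0.2])  # p(x=1|z, theta); we observe x = 1

p_xz = p_z * p_x1_given_z            # joint p(x=1, z|theta)
ll = np.log(p_xz.sum())              # LL: ln p(x=1|theta)

def elbo(q):
    # L(q, theta) = E_{z~q}[ln p(x, z|theta) - ln q(z|x)]
    return np.sum(q * (np.log(p_xz) - np.log(q)))

q_arbitrary = np.array([0.5, 0.5])
posterior = p_xz / p_xz.sum()        # p(z|x=1, theta)

gap_arbitrary = ll - elbo(q_arbitrary)  # strictly positive
gap_posterior = ll - elbo(posterior)    # zero: the bound is tight
```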
Expectation-Maximization (EM) Algorithm
[Figure: EM iterations, with each ELBO tangent to the LL at the current θold and maximized at θnew]
Osvaldo Simeone ML for Comm 98 / 184
Expectation-Maximization (EM) Algorithm
Initialize parameter vector θold.
For each iteration:
I E step: For fixed parameter vector θold,
  max_q L(q, θold) → qnew(z|x) = p(z|x, θold)
  (estimate of the latent variables)
I M step: For fixed variational distribution qnew(z|x),
  max_θ L(qnew, θ) → max_θ E_{z∼qnew(z|x)}[ln p(x, z|θ)]
  (solve a supervised learning problem)
Osvaldo Simeone ML for Comm 99 / 184
Expectation-Maximization (EM) Algorithm
EM guarantees non-decreasing objective values, which ensures convergence to a local optimum of the original problem.
Osvaldo Simeone ML for Comm 100 / 184
Expectation-Maximization (EM) Algorithm
Osvaldo Simeone ML for Comm 101 / 184
Example: Mixture of Gaussians
Directed generative model:
z ∼ Bern(π)
x | z = k ∼ N(µk, Σk)
Osvaldo Simeone ML for Comm 102 / 184
Example: Mixture of Gaussians
[Figure-only slides 103–108: the mixture-of-Gaussians example, shown over successive slides]
Osvaldo Simeone ML for Comm 108 / 184
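The EM recursion for this example can be sketched in a few lines of numpy (the data, initialization, and iteration count below are illustrative): the E step computes the posterior responsibilities p(z|xn, θold), and the M step re-fits (π, µ, σ²) in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy 1-D data from two well-separated Gaussians
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])

# Initialize theta = (pi, mu, var) for K = 2 components
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E step: responsibilities q(z=k|x_n) = p(z=k|x_n, theta_old)
    lik = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
          / np.sqrt(2 * np.pi * var)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M step: maximize E_q[ln p(x, z|theta)] in closed form
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
```

On this toy data the estimated means approach the true component centers at −3 and +3.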
Scaling EM
The EM algorithm may be impractical for large-scale problems: it needs to compute the posterior in the E step and to average over z in the M step.
Solutions:
I E step: Parametrize the variational distribution q(z|x, ϕ) and maximize the ELBO over ϕ (variational autoencoder)
I M step: Approximate E_{z∼qnew(z|x)}[ln p(x, z|θ)] via Monte Carlo
I Use gradient descent for the E and/or M steps
Osvaldo Simeone ML for Comm 109 / 184
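The Monte Carlo idea for the M step simply replaces the exact expectation over z ∼ qnew(z|x) with a sample average. A toy discrete sketch (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

q = np.array([0.25, 0.75])                # q_new(z|x) over z in {0, 1}
ln_p_xz = np.log(np.array([0.27, 0.14]))  # ln p(x, z|theta) at the observed x

exact = np.sum(q * ln_p_xz)               # exact E_{z~q}[ln p(x, z|theta)]

z = rng.choice(2, size=20000, p=q)        # samples z ~ q_new(z|x)
mc = ln_p_xz[z].mean()                    # Monte Carlo estimate
```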
Learning: Beyond Maximum Likelihood
ML tends to provide inclusive and “blurry” estimates of the data distribution.
[Figure: ML fit spreading probability mass to cover the data distribution]
This can be a problem for tasks such as data generation.
Osvaldo Simeone ML for Comm 110 / 184
Learning: Beyond Maximum Likelihood
ML can be proven to minimize the KL divergence
KL(pD(x)||p(x|θ)) = E_{x∼pD(x)}[ ln( pD(x) / p(x|θ) ) ]
between the empirical distribution
pD(x) = N[x]/N (with counts N[x] = |{n : xn = x}|)
and the model.
Osvaldo Simeone ML for Comm 111 / 184
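Both distributions in this criterion are easy to evaluate on a discrete alphabet. A small numpy sketch (the data set and model values are made up):

```python
import numpy as np

data = np.array([0, 0, 1, 2, 1, 0, 0, 1])  # data set over alphabet {0, 1, 2}
counts = np.bincount(data, minlength=3)    # N[x]
p_emp = counts / counts.sum()              # empirical p_D(x) = N[x]/N

p_model = np.array([0.5, 0.3, 0.2])        # model p(x|theta)

mask = p_emp > 0                           # convention: 0 * ln 0 = 0
kl = np.sum(p_emp[mask] * np.log(p_emp[mask] / p_model[mask]))
# kl >= 0, with equality iff the model matches the empirical distribution
```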
Learning: Beyond Maximum Likelihood
The KL divergence is part of the larger class of f-divergences between two distributions p(x) and q(x):
Df(p||q) = max_{T(x)} E_{x∼p}[T(x)] − E_{x∼q}[g(T(x))],
for some convex increasing function g(·).
[Figure: a discriminator T(x) fed with samples x ∼ p(x) and x ∼ q(x)]
Osvaldo Simeone ML for Comm 112 / 184
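For the KL divergence, the variational form holds with g(t) = exp(t − 1), and the maximizing discriminator is T(x) = 1 + ln(p(x)/q(x)). A discrete numerical check of both facts (the two distributions are toy values):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

kl = np.sum(p * np.log(p / q))

def bound(T):
    # E_{x~p}[T(x)] - E_{x~q}[exp(T(x) - 1)]
    return np.sum(p * T) - np.sum(q * np.exp(T - 1))

lb = bound(np.zeros(3))           # an arbitrary discriminator: a lower bound
tight = bound(1 + np.log(p / q))  # the optimal discriminator: attains the KL
```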
Learning: Generative Adversarial Networks (GANs)
Generalizing the ML problem, GANs attempt to solve the problem
min_θ max_ϕ E_{x∼pD}[Tϕ(x)] − E_{x∼p(x|θ)}[g(Tϕ(x))]
for some differentiable function Tϕ(x) of the parameter vector ϕ.
Choice of the divergence (via the discriminator) is tailored to data.
Can be applied to likelihood-free models.
Osvaldo Simeone ML for Comm 113 / 184
Learning: Generative Adversarial Networks (GANs)
[NVIDIA]
Osvaldo Simeone ML for Comm 114 / 184
Applications to Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 115 / 184
At the Edge: PHY
End-to-end encoding/decoding for wireless channels – autoencoders
[O’Shea and Hoydis ’17]
Osvaldo Simeone ML for Comm 116 / 184
Machine Learning in Communication Networks
Which tasks?
I traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
I large data sets exist or can be created ✓
I the task does not require detailed explanations for how the decision was made ✓
I the task does not have stringent optimality requirements (depends)
I the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 117 / 184
At the Edge: PHY
End-to-end encoding/decoding for optical channels – autoencoders
Algorithm deficit
[Karanov et al ’18]
Osvaldo Simeone ML for Comm 118 / 184
At the Edge: PHY
End-to-end encoding/decoding for Gaussian channels with feedback – autoencoders based on Recurrent Neural Networks (RNNs)
Algorithm deficit
[Kim et al ’18]
Osvaldo Simeone ML for Comm 119 / 184
At the Edge: PHY
Joint source-channel coding for image transmission
Algorithm deficit
[Bourtsoulatze et al ’18]
Osvaldo Simeone ML for Comm 120 / 184
At the Edge: PHY
Channel State Information (CSI) compression and feedback – autoencoders
Model deficit
Related work: quantization of log-likelihoods [Arvinte et al ’19]
[Wen et al ’17]
Osvaldo Simeone ML for Comm 121 / 184
At the Edge: PHY
Fingerprinting for localization – autoencoders
Model deficit
[Xiao et al ’17]
Osvaldo Simeone ML for Comm 122 / 184
At the Edge: PHY
Mimicking a propagation channel – GAN (see also [Ye et al ’18])
Model deficit
[O’Shea et al ’18]
Osvaldo Simeone ML for Comm 123 / 184
At the Edge: PHY
Mimicking and identifying a propagation channel (e.g., satellite) – generative models
Leveraging domain knowledge improves the learned model.
[Ibnkahla ’00]
Osvaldo Simeone ML for Comm 124 / 184
At the Edge: MAC/Link
Resource allocation – clustering
Algorithm deficit
[Abdelnasser et al '14]
Osvaldo Simeone ML for Comm 125 / 184
At the Cloud: Network
Self-organizing multi-hop networks – clustering
Algorithm deficit
[Abbassi and Younis '07]
Osvaldo Simeone ML for Comm 126 / 184
At the Cloud: Network
Anomaly detection – density estimation
Model deficit
[Musumeci et al ’18]
Osvaldo Simeone ML for Comm 127 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 128 / 184
Reinforcement Learning
Feedback-based sequential decision making
[Figure, after D. Silver: agent-environment loop with state st, action at, and reward rt]
Osvaldo Simeone ML for Comm 129 / 184
Reinforcement Learning
Osvaldo Simeone ML for Comm 130 / 184
Reinforcement Learning
Data collected while acting
Given an observation of the world (input) st at time t, the agent takes an action (output) at ...
Ex.: st=current image; at ∈ {up,down,stay}
[Karpathy ’16]
Osvaldo Simeone ML for Comm 131 / 184
Reinforcement Learning
... and receives a reward signal (feedback) rt
Ex.: rt = 1 if the ball goes past the opponent; rt = −1 if the agent misses the ball; rt = 0 otherwise
Current actions generally affect future states and rewards
[Karpathy ’16]
Osvaldo Simeone ML for Comm 132 / 184
Defining Reinforcement Learning
[Diagram: at each step, the agent in state st draws action at ∼ π(a|st); the environment returns reward rt+1 ∼ p(r|s, a) and next state st+1 ∼ p(s′|s, a), and the process repeats]
Osvaldo Simeone ML for Comm 133 / 184
Defining Reinforcement Learning
Goal: optimize the policy π(a|s) = Pr[a = a|s = s] by maximizing the expected return Eπ[Gt|st = s] with
Gt = ∑_{τ=0}^∞ γ^τ r_{t+τ} = rt + γ r_{t+1} + γ² r_{t+2} + ...
Discount factor 0 < γ ≤ 1: preference for immediate rewards if γ < 1
Osvaldo Simeone ML for Comm 134 / 184
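The return is conveniently computed backwards over an episode, since Gt = rt + γ G_{t+1}. A small sketch (the reward sequence is made up):

```python
def returns(rewards, gamma):
    """Discounted returns G_t for every step of one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G   # G_t = r_t + gamma * G_{t+1}
        out.append(G)
    return out[::-1]

G = returns([1.0, 0.0, 0.0, 2.0], gamma=0.5)
# G[0] = 1 + 0.5**3 * 2 = 1.25
```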
Defining Reinforcement Learning
Problem defined by two “true” distributions:
I Reward distribution given state and action: p(r|s, a)
I State transition probability: p(s′|s, a)
Osvaldo Simeone ML for Comm 135 / 184
Defining Reinforcement Learning
When the true distributions are known, we have a Markov Decision Process (MDP)
When they are not, we have a reinforcement learning problem
Reinforcement learning:
I Model-based: Learn a model (p(r|s, a, θ), p(s′|s, a, θ)) and then solve as an MDP
I Model-free: Learn directly the policy, i.e., how to act
Osvaldo Simeone ML for Comm 137 / 184
Methodology: Generalized Policy Iteration
Solution methods for both MDPs and model-free reinforcement learning generally alternate between two steps:
I Policy evaluation: Estimate the performance of either the current policy (on-policy learning) or the current optimal policy (off-policy learning)
I Control: Act and update the policy to increase the expected return
Osvaldo Simeone ML for Comm 138 / 184
Methodology: Policy Evaluation
Policy evaluation: Action-value function Qπ(s, a) = Eπ[Gt|st = s, at = a] ...
[Mnih et al ’17]
Osvaldo Simeone ML for Comm 141 / 184
Methodology: Policy Evaluation
Policy evaluation: ... and/or value function Vπ(s) = Eπ[Gt|st = s] = Eπ[Q(s, a)|st = s] ...
[Figure: value function V over states]
Osvaldo Simeone ML for Comm 142 / 184
Methodology: Policy Evaluation
Spectrum of solutions for policy evaluation:
I breadth and depth of the exploration of the state-action tree
TD(λ)
Osvaldo Simeone ML for Comm 143 / 184
Methodology: Policy Evaluation
Exploration in breadth generally requires the availability of a model
Model-free methods are based on Temporal-Difference (TD) and/or Monte Carlo
[Figure labels: TD(λ); model-based; model-free (sample-based)]
Osvaldo Simeone ML for Comm 144 / 184
Methodology: Policy Evaluation
Monte Carlo (offline): Q(st, at) ≈ rt + γ r_{t+1} + γ² r_{t+2} + ...
TD (online): Q(st, at) ≈ rt + γ Q(s_{t+1}, a_{t+1})
These techniques are on-policy; there are also off-policy variants
TD(λ)
Osvaldo Simeone ML for Comm 145 / 184
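Both estimators reduce to simple incremental updates on Q(st, at); the sketch below contrasts them on made-up numbers (the step size α, discount γ, rewards and targets are illustrative):

```python
alpha, gamma = 0.1, 0.9

def mc_update(Q, G):
    # Monte Carlo: move Q(s_t, a_t) toward the full observed return G_t
    return Q + alpha * (G - Q)

def td_update(Q, r, Q_next):
    # TD: bootstrap with the estimate Q(s_{t+1}, a_{t+1}) at the next step
    return Q + alpha * (r + gamma * Q_next - Q)

Q = 0.0
Q = mc_update(Q, 1.0)            # 0 + 0.1 * (1 - 0) = 0.1
# here the current estimate stands in for Q(s_{t+1}, a_{t+1})
Q = td_update(Q, 0.5, Q_next=Q)  # 0.1 + 0.1 * (0.5 + 0.9*0.1 - 0.1)
```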
Methodology: Generalized Policy Iteration
Solution methods for both MDPs and model-free reinforcement learning generally alternate between two steps:
I Policy evaluation: Estimate the performance of either the current policy (on-policy learning) or the current optimal policy (off-policy learning)
I Control: Act and update the policy to increase the expected return
Osvaldo Simeone ML for Comm 146 / 184
Methodology: Control
How to choose actions? Exploration vs exploitation
I Ex.: Multi-armed bandit
I Exploration: choose a random action with probability ε; Gibbs random policies
r1 ∼ p(r|a = 1)   r2 ∼ p(r|a = 2)   r3 ∼ p(r|a = 3)
Osvaldo Simeone ML for Comm 147 / 184
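For the bandit case (a single state), ε-greedy control with incremental sample-mean estimates of Q(a) can be sketched as follows (the arm means, ε and horizon are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])  # p(r|a): unknown to the agent
eps, T = 0.1, 10000

Q = np.zeros(3)                         # action-value estimates
N = np.zeros(3)                         # pull counts

for _ in range(T):
    # explore with probability eps, otherwise exploit argmax Q
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q))
    r = rng.normal(true_means[a], 1.0)  # reward r ~ p(r|a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]           # incremental sample mean
```

After enough rounds the agent identifies arm 3 (index 2) as the best action.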
Methodology: Control
How to improve a policy?
I greedy: max_a Qπ(s, a)
I gradient-based: policy gradient
Osvaldo Simeone ML for Comm 148 / 184
Methodology
Value vs policy-based methods:
I value-based: policy evaluation and control via the action-value function Q(s, a)
I policy-based: update an explicit policy π(a|s)
I actor-critic: policy evaluation via the action-value function Q(s, a) and control via the policy π(a|s)
Osvaldo Simeone ML for Comm 149 / 184
Example: Deep Q-Learning
Value-based – action-value function Q(s, a) parameterized via a neural network
Policy evaluation: Off-policy TD
Control: ε-greedy
[Mnih et al ’17]
Osvaldo Simeone ML for Comm 150 / 184
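Deep Q-learning replaces the Q table with a neural network, but the underlying update is ordinary tabular Q-learning: off-policy TD with an ε-greedy behavior policy. A tabular sketch on a made-up four-state chain (reward 1 on reaching the terminal state):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha, eps = 0.9, 0.5, 0.3
Q = np.zeros((4, 2))  # states 0..3, actions 0 = left, 1 = right

def step(s, a):
    # deterministic chain; state 3 is terminal with reward 1
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, 3)
    return s2, float(s2 == 3)

for _ in range(500):  # episodes
    s = 0
    while s != 3:
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # off-policy TD target: r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
```

With enough episodes the values approach the optimal ones (Q(2, right) = 1, Q(1, right) = 0.9, Q(0, right) = 0.81), and the greedy policy moves right in every state.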
Example: Advantage Actor-Critic
Actor-critic – action-value function Q(s, a) and policy π(a|s) parameterized via neural networks
Policy evaluation: On/off-policy TD
Control: policy gradient
[Karpathy ’16]
Osvaldo Simeone ML for Comm 151 / 184
Applications to Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 152 / 184
At the Edge: PHY
End-to-end encoding/decoding training in the absence of a channel model
Model deficit
[Aoudia and Hoydis ’18]
Osvaldo Simeone ML for Comm 153 / 184
At the Edge: PHY
Iteratively trains the decoder using supervised learning and the encoder using reinforcement learning (policy gradient)
Requires receiver-to-transmitter feedback – impact of noise [Goutay et al ’18]
[Aoudia and Hoydis ’18]
Osvaldo Simeone ML for Comm 154 / 184
At the Edge: PHY
Learning to act and to communicate (binary signals on binary symmetric channels) to carry out collaborative tasks
Algorithm deficit
Learned communication carries out compression and unequal error protection
[Mostaani and Simeone ’18]
Osvaldo Simeone ML for Comm 155 / 184
At the Edge: MAC/Link
Beamforming-user assignment in mmWave communications
Model deficit
Requires feedback
[Klautau et al ’16]
Osvaldo Simeone ML for Comm 156 / 184
At the Edge: MAC/Link
Anti-jamming channel and base station selection
Model deficit
Requires feedback
Similar applications in cognitive radio networks [Li et al ’17]
[Han et al ’17]
Osvaldo Simeone ML for Comm 157 / 184
At the Edge: MAC/Link
Learning how to allocate random access opportunities
Model and algorithm deficit
[Jiang et al '18]
Osvaldo Simeone ML for Comm 158 / 184
At the Edge: MAC/Link
Learning to act and to schedule to carry out collaborative tasks
Algorithm deficit
[Kim et al '19]
Osvaldo Simeone ML for Comm 159 / 184
At the Edge: Network
Handover and base station selection for access
Model deficit
[Hasan et al ‘13]
Osvaldo Simeone ML for Comm 160 / 184
At the Edge: Network
On-off scheduling of base stations
Model deficit
[Xu et al '17]
[Figure labels: on/off state & users’ demands; tx, circuitry and transition power; on/off scheduling & beamforming]
Osvaldo Simeone ML for Comm 161 / 184
At the Edge: Application
Caching on wireless devices based on channel conditions and contentlifetime
Model deficit
[Somuyiwa et al '18]
Osvaldo Simeone ML for Comm 162 / 184
At the Cloud: Network
Packet routing in a dynamically changing network
Model deficit
[Boyan and Littman '94]
Osvaldo Simeone ML for Comm 163 / 184
Concluding Remarks
Machine learning tools can leverage the availability of data and computing resources in modern communication systems.
Supervised, unsupervised and reinforcement learning paradigms lend themselves to different key communication (sub)tasks.
Not a universal solution – a case-by-case analysis of advantages and disadvantages is needed.
Osvaldo Simeone ML for Comm 164 / 184
Concluding Remarks
Engineering the integration of traditional model-based techniques and data-driven machine learning methods
[Reich]
Osvaldo Simeone ML for Comm 165 / 184
Concluding Remarks
Additional topics, e.g., adversarial (white-box) attacks
[Sadeghi and Larsson ’19]
Osvaldo Simeone ML for Comm 166 / 184
Concluding Remarks
Additional topics, e.g., meta-learning (learning the inductive bias from other tasks)
[Park et al ’19]
Osvaldo Simeone ML for Comm 167 / 184
Concluding Remarks
Additional topics, e.g., neuromorphic computing [Jang et al ’19]
[Rajendran '18]
Osvaldo Simeone ML for Comm 168 / 184
Concluding Remarks
How to integrate machine learning analytics and control with the protocol stack?
[China Mobile]
Osvaldo Simeone ML for Comm 169 / 184
For More...
O. Simeone, “A Brief Introduction to Machine Learning forEngineers,” Foundations and Trends in Signal Processing, 2018.
O. Simeone, “A Very Brief Introduction to Machine Learning withApplications to Communication Systems,” IEEE Transactions onCognitive Communications and Networking, 2019.
Osvaldo Simeone ML for Comm 170 / 184
Acknowledgements
This work has received funding from the European Research Council(ERC) under the European Union’s Horizon 2020 research and innovation
programme (grant agreement No. 725731).
Osvaldo Simeone ML for Comm 171 / 184
References
[O’Shea and Hoydis ’17] T. J. O’Shea, J. Hoydis, An Introduction toMachine Learning Communications Systems, 2017[Cammerer et al ’17] Cammerer S, Gruber T, Hoydis J, Brink ST. ScalingDeep Learning-based Decoding of Polar Codes via Partitioning. arXivpreprint arXiv:1702.06901. 2017 Feb 22.[Balatsoukas-Stimming ’17] Balatsoukas-Stimming A. Non-Linear DigitalSelf-Interference Cancellation for In-Band Full-Duplex Radios Using NeuralNetworks. arXiv preprint arXiv:1711.00379. 2017 Nov 1.[Sun et al ’17] Sun H, Chen X, Shi Q, Hong M, Fu X, Sidiropoulos ND.Learning to Optimize: Training Deep Neural Networks for WirelessResource Management. arXiv preprint arXiv:1705.09412. 2017 May 26.[de Kerret et al ’17] Paul de Kerret, David Gesbert, Maurizio Filippone,Decentralized Deep Scheduling for Interfrence Channels, arXiv, 2017.[Wen et al ’17] Wen, Chao-Kai; Shih, Wan-Ting; Jin, Shi, Deep Learningfor Massive MIMO CSI Feedback arXiv.
Osvaldo Simeone ML for Comm 172 / 184
References[Agirman-Tosun et al ’11] Agirman-Tosun H, et al. “Modulationclassification of MIMO-OFDM signals by independent component analysisand support vector machines.” in Proc. Asilomar 2011.[Fang and Lin ’08] Fang SH, Lin TN. Indoor location system based ondiscriminant-adaptive neural network in IEEE 802.11 environments. IEEETransactions on Neural networks. 2008 Nov;19(11):1973-8.[Tumuluru et al ’10] Tumuluru VK, Wang P, Niyato D. A neural networkbased spectrum prediction scheme for cognitive radio. in Proc. ICCC 2010.[Chen et al ’17] Chen M, et a;. Echo state networks for proactive cachingin cloud-based radio access networks with mobile users. IEEE Transactionson Wireless Communications. 2017.[Xiao et al ’17] Xiao C, Yang D, Chen Z, Tan G. 3-D BLE IndoorLocalization Based on Denoising Autoencoder. IEEE Access.2017;5:12751-60.[Abdelnasser et al ’14] Abdelnasser A, et al. Clustering and resourceallocation for dense femtocells in a two-tier cellular OFDMA network.IEEE Transactions on Wireless Communications, 2014.
Osvaldo Simeone ML for Comm 173 / 184
References
[Nguyen et al ’08] Nguyen TT, Armitage G. A survey of techniques forinternet traffic classification using machine learning. IEEECommunications Surveys & Tutorials. 2008 Oct 1;10(4):56-76.[Wang et al 06] Wang, Yong, Margaret Martonosi, and Li-Shiuan Peh. "Asupervised learning approach for routing optimizations in wireless sensornetworks." Proc. ACM Workshop on Multi-hop ad hoc Networks, 2006.[Abbassi and Younis ’07] Abbasi AA, Younis M. A survey on clusteringalgorithms for wireless sensor networks. Computer communications, 2007.[Abbe et al ’16] Abbe E, Bandeira AS, Hall G. Exact recovery in thestochastic block model. IEEE Transactions on Information Theory, 2016.[Wang et al ’18] Wang, Zhi, et al. Handover Control in Wireless Systemsvia Asynchronous Multi-User Deep Reinforcement Learning, arxiv.[Wang et al ’16] Wang D, et al. Nonlinearity Mitigation Using a MachineLearning Detector Based on k-Nearest Neighbors. IEEE PhotonicsTechnology Letters. 2016.
Osvaldo Simeone ML for Comm 174 / 184
References[Venkatraman et al ’10] Venkatraman P, et al. Opportunistic bandwidthsharing through reinforcement learning. IEEE Trans. Veh. Techn., 2010.[Iannello et al ’12] Iannello F, Simeone O, Spagnolini U. Optimality ofmyopic scheduling and whittle indexability for energy harvesting sensors. inProc. CISS 2012.[Xu et al ’17] Xu Z, et al. A deep reinforcement learning based frameworkfor power-efficient resource allocation in cloud RANs, in Proc. IEEE ICC2017.[Mnih et al ’15] Mnih V, et al. Human-level control through deepreinforcement learning. Nature, 2015.[Bogale et al ’18] T. Bogale, et al., Machine Intelligence Techniques forNext-Generation Context-Aware Wireless Networks, arxiv.[Tang et al ’17] F. Tang et al., "On Removing Routing Protocol fromFuture Wireless Networks: A Real-time Deep Learning Approach forIntelligent Traffic Control," IEEE Wireless Communications, 2018.[Sallent et al ’15] O. Sallent, et al., "Learning-based coexistence for LTEoperation in unlicensed bands," in Proc. IEEE ICC 2015.
Osvaldo Simeone ML for Comm 175 / 184
References[Kato et al ’17] N. Kato et al., "The Deep Learning Vision forHeterogeneous Network Traffic Control: Proposal, Challenges, and FuturePerspective," IEEE Wireless Communications, June 2017.[Siracusano and Bifulco ’18] Siracusano, Giuseppe; Bifulco, Roberto,In-network Neural Networks, arXiv:1801.05731.[He et al ’17] He Y, et al. Deep Reinforcement Learning (DRL)-basedResource Management in Software-Defined and Virtualized Vehicular AdHoc Networks, in Proc. ACM SDAIVN 2017.[Farsad and Goldsmith ’18] Farsad, Nariman; Goldsmith, Andrea, NeuralNetwork Detection of Data Sequences in Communication Systems,arXiv:1802.02046[Emigh et al ’15] Emigh, Matthew, et al. "A model based approach toexploration of continuous-state MDPs using Divergence-to-Go." in Proc.IEEE Machine Learning for Signal Processing (MLSP), 2015.[Caciularu and Burshtein ’18] Avi Caciularu, David Burshtein, BlindChannel Equalization using Variational Autoencoders, in Proc. IEEE ICCworkshop, 2018.
Osvaldo Simeone ML for Comm 176 / 184
References
[Musumeci et al ’18] Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore, A Survey on Application of Machine Learning Techniques in Optical Networks, arXiv:1803.07976.
[Okamoto et al ’18] Hironao Okamoto, et al., Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications, arXiv:1804.00709.
[Davaslioglu and Sagduyu ’18] Kemal Davaslioglu, Yalin E. Sagduyu, Generative Adversarial Learning for Spectrum Sensing, in Proc. IEEE ICC 2018.
[Aoudia and Hoydis ’18] Fayçal Ait Aoudia, Jakob Hoydis, End-to-End Learning of Communications Systems Without a Channel Model, arXiv:1804.02276.
[Karanov et al ’18] Boris Karanov, et al., End-to-end Deep Learning of Optical Fiber Communications, arXiv:1804.04097.
Osvaldo Simeone ML for Comm 177 / 184
References
[O’Shea et al ’18] Timothy J. O’Shea, Tamoghna Roy, Nathan West, “Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks”, arXiv:1805.06350.
[Zhao et al ’18] Zhifeng Zhao et al., “Deep Reinforcement Learning for Network Slicing”, arXiv:1805.06591.
[Ibnkahla ’00] Ibnkahla M. Applications of neural networks to digital communications – a survey. Signal Processing. 2000 Jul 1;80(7):1185-215.
[Bouchired et al ’98] Bouchired S, Roviras D, Castanié F. Equalisation of satellite mobile channels with neural network techniques. Space Communications. 1998.
[De Veciana and Zakhor ’92] De Veciana G, Zakhor A. Neural net-based continuous phase modulation receivers. IEEE Transactions on Communications. 1992.
Osvaldo Simeone ML for Comm 178 / 184
References
[Schibisch et al ‘18] Stefan Schibisch, et al,“Online Label Recovery forDeep Learning-based Communication through Error Correcting Codes,”.arXiv:1807.00747.[Ye et al ’18] Hao Ye, et al, “Channel Agnostic End-to-End Learning basedCommunication Systems with Conditional GAN,” arXiv:1807.00447.[Kim et al ’18] Hyeji Kim, et al “Deepcode: Feedback Codes via DeepLearning,” arXiv:1807.00801.[Shu et al ’18] Shu J., Xu, Z., and Meng, D, “Small Sample Learning inBig Data Era”, arXiv:1808.04572.[Zappone et al ’18] A. Zappone, M. Di Renzo, M. Debbah, T. T. Lam,and X. Qian, “Model-Aided Wireless Artificial Intelligence: EmbeddingExpert Knowledge in Deep Neural Networks Towards Wireless SystemsOptimization" arXiv:1808.01672.
Osvaldo Simeone ML for Comm 179 / 184
References
[Farsad et al ’18] Farsad N, Rao M, Goldsmith A. Deep Learning for JointSource-Channel Coding of Text. arXiv preprint arXiv:1802.06832. 2018Feb 19.[Bourtsoulatze et al ’18] Bourtsoulatze E, Kurka DB, Gunduz D. DeepJoint Source-Channel Coding for Wireless Image Transmission. arXivpreprint arXiv:1809.01733. 2018 Sep 4.[Luong et al ’18] Luong NC, et a;, Applications of Deep ReinforcementLearning in Communications and Networking: A Survey. arXiv preprintarXiv:1810.07862. 2018 Oct 18.[Aoudia and Hoydis ‘18] FA Aoudia and J. Hoydis, End-to-End Learning ofCommunications Systems Without a Channel Model, arXiv:1804.02276v2,2018.[Goutay et al ’18] Goutay M, Aoudia FA, Hoydis J. Deep ReinforcementLearning Autoencoder with Noisy Feedback. arXiv preprintarXiv:1810.05419. 2018 Oct 12.
Osvaldo Simeone ML for Comm 180 / 184
References
[Klautau et al ’16] Klautau A, et al. 5G MIMO Data for Machine Learning: Application to Beam-Selection using Deep Learning. In Proc. Inf. Theory and Appl. Workshop (ITA) 2016 Jan.
[Han et al ’17] G. Han, L. Xiao, and H. V. Poor, “Two-dimensional anti-jamming communication based on deep reinforcement learning,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.
[Li et al ’17] X. Li, J. Fang, W. Cheng, H. Duan, Z. Chen, and H. Li, “Intelligent power control for spectrum sharing in cognitive radios: A deep reinforcement learning approach,” arXiv preprint arXiv:1712.07365, 2017.
[Nachmani et al ’16] Nachmani, et al. “Learning to decode linear codes using deep learning.” Allerton 2016.
Osvaldo Simeone ML for Comm 181 / 184
References
[Choi et al ’18] Choi K, Tatwawadi K, Weissman T, Ermon S. NECST:Neural Joint Source-Channel Coding. arXiv preprint arXiv:1811.07557.2018 Nov 19.[Jiang et al ’18] Yihan Jiang, Hyeji Kim, Himanshu Asnani, SreeramKannan, Sewoong Oh, Pramod Viswanath, ”LEARN Codes: InventingLow-latency Codes via Recurrent Neural Networks,” arXiv:1811.12707.[Mostaani and Simeone ’18] Arsham Mostaani, Osvaldo Simeone, ”OnLearning How to Communicate Over Noisy Channels for CollaborativeTasks,” arXiv:1810.01155.[Park et al ’19] Jihong Park, Sumudu Samarakoon, Mehdi Bennis,Merouane Debbah, ”Wireless Network Intelligence at the Edge”,arXiv:1812.02858.[Kim et al ’19] Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, YiY. Learning to Schedule Communication in Multi-agent ReinforcementLearning, ICLR 2019.
Osvaldo Simeone ML for Comm 182 / 184
References
[Boyan and Littman ’94] Boyan JA, Littman ML. Packet routing indynamically changing networks: A reinforcement learning approach.InAdvances in neural information processing systems 1994 (pp. 671-678).[Somuyiwa et al ’18] Somuyiwa SO, Gyorgy A, Gunduz D. AReinforcement-Learning Approach to Proactive Caching in WirelessNetworks. IEEE Journal on Selected Areas in Communications. 2018 Jun7.[Sadeghi and Larsson ’19] Meysam Sadeghi and Erik G. Larsson, PhysicalAdversarial Attacks Against End-to-End Autoencoder CommunicationSystems, arxiv, 2019.[Huang et al ’19] Chongwen Huang, et al, “Deep Learning for UL/DLChannel Calibration in Generic Massive MIMO Systems”, ICC WCCSymposium 2019.[Arvinte et al ’19] Marius Arvinte, et al, Deep Log-Likelihood RatioQuantization, arXiv:1903.04656
Osvaldo Simeone ML for Comm 183 / 184
References
[Park et al ’19] Sangwoo Park, Hyeryung Jang, Osvaldo Simeone,Joonhyuk Kang, Learning How to Demodulate from Few Pilots viaMeta-Learning, arXiv:1903.02184.[Jang et al ’19] H Jang, O Simeone, B Gardner, A Gruning, Spiking NeuralNetworks: A Stochastic Signal Processing Perspective, arXiv:1812.03929.[Jiang et al ’18] Jiang N, Deng Y, Simeone O, Nallanathan A. CooperativeDeep Reinforcement Learning for Multiple Groups NB-IoT NetworksOptimization. arXiv preprint arXiv:1810.11729. 2018 Oct 27.
Osvaldo Simeone ML for Comm 184 / 184