When Can Machine Learning Be Useful for Communication Systems?
Osvaldo Simeone
King’s College London
Osvaldo Simeone ML for Comm 1 / 184
Goals and Learning Outcomes
Goals:
- Provide an introduction to main areas in machine learning
- Offer pointers to specific applications for telecom

Learning outcomes:
- Recognize scenarios in which machine learning can and cannot be useful
- Identify specific classes of machine learning methods that apply to a given problem
Osvaldo Simeone ML for Comm 2 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 3 / 184
What is Machine Learning?

Traditional engineering (model-based) approach:
- acquisition of domain knowledge...
- ... mathematical (physics-based) modelling...
- ... and optimized algorithm design with performance guarantees (under the given model)

Osvaldo Simeone ML for Comm 7 / 184
What is Machine Learning?

Engineering design flow

[Figure: (a) data-driven flow: acquisition of data → training set → learning (over a hypothesis class) → black-box machine; (b) engineering design flow: acquisition of domain knowledge → physics-based mathematical model → algorithm development → algorithm with performance guarantees]

Osvaldo Simeone ML for Comm 8 / 184
What is Machine Learning?

Machine learning approach:
- selection of a general-purpose model (hypothesis class), objective function, and a learning algorithm...
- ... learning based on data (examples)...
- ... and use of the trained (black-box) machine for inference

Osvaldo Simeone ML for Comm 12 / 184
What is Machine Learning?

Machine learning or data-driven approach

[Figure: data-driven flow: acquisition of data → training set → learning (over a hypothesis class) → black-box machine]

Osvaldo Simeone ML for Comm 13 / 184
What is Machine Learning?

Integrating domain knowledge into a machine learning approach:
- choose the hypothesis class and learning algorithm based on knowledge of the problem

[Figure: data-driven flow: acquisition of data → training set → learning → black-box machine, with acquisition of domain knowledge informing the choice of hypothesis class]

Osvaldo Simeone ML for Comm 14 / 184
Taxonomy of Machine Learning Methods
Supervised learning
Unsupervised learning
Reinforcement learning
Osvaldo Simeone ML for Comm 15 / 184
Taxonomy of Machine Learning Methods
Supervised vs unsupervised learning
Osvaldo Simeone ML for Comm 16 / 184
Taxonomy of Machine Learning Methods
Reinforcement learning: feedback-based sequential decision making
[Figure: agent-environment loop with state st, action at, and reward rt; © D. Silver]
Osvaldo Simeone ML for Comm 17 / 184
When to Use Machine Learning?
Potential advantages:
- lower cost and faster development, if collecting data is less expensive/quicker than acquiring a physics-based model or an optimal algorithm
- reduced implementation complexity, if the hypothesis class contains low-complexity machines

Potential disadvantages:
- suboptimal performance and limited performance guarantees
- limited interpretability
Osvaldo Simeone ML for Comm 18 / 184
When to Use Machine Learning?
Criteria inspired by [Brynjolfsson and Mitchell '17]:
- traditional engineering flow not applicable or inadequate:
  - model deficit: a physics-based model is not available
  - algorithm deficit: a model is available, but optimal or near-optimal algorithms are not known or are too complex to implement
Osvaldo Simeone ML for Comm 19 / 184
When to Use Machine Learning?
Criteria inspired by [Brynjolfsson and Mitchell '17]:
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit
- large data sets exist or can be created
- the task does not require detailed explanations for how the decision was made
- the task does not have stringent optimality requirements
  - Monte Carlo for algorithm deficit and generalization bounds for model deficit
- the phenomenon or function being learned should not change rapidly
Osvaldo Simeone ML for Comm 20 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 21 / 184
Data in Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 22 / 184
Data in Communication Networks
Fog network architecture [5GPPP]
Data collection and processing can take place at the edge and/or at the cloud.
Osvaldo Simeone ML for Comm 23 / 184
Data in Communication Networks
Data at the edge:
- PHY: baseband signals, (multi-RAT) channel quality
- MAC/Link: throughput, FER, random access load and latency
- Network: location, traffic loads across services, users’ device types, battery levels
- Application: users’ preferences, content demands, computing loads, QoS metrics
Osvaldo Simeone ML for Comm 24 / 184
Data in Communication Networks
Data at the cloud:
- Network: mobility patterns, network-wide traffic statistics, outage rates
- Application: users’ behavior patterns, subscription information, service usage statistics, TCP/IP traffic statistics
Osvaldo Simeone ML for Comm 25 / 184
Machine Learning in Communication Networks
Which tasks?
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
- large data sets exist or can be created ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task does not have stringent optimality requirements (depends)
- the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 26 / 184
Machine Learning in Communication Networks
Osvaldo Simeone ML for Comm 27 / 184
Machine Learning in Communication Networks

3GPP has introduced Network Data Analytics to enable access to data from the network functions and services in the core network.
ETSI has proposed use cases for Experiential Network Intelligence (ENI) based on such data.

[China Mobile] [ETSI]
Osvaldo Simeone ML for Comm 28 / 184
Machine Learning for Communication vs Communication for Machine Learning

This presentation will discuss machine learning for communication.
Communication plays a key role in distributed learning systems, including edge and federated learning.
[Park et al '19]
Osvaldo Simeone ML for Comm 29 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 30 / 184
Supervised Learning
Supervised learning:
- regression: continuous labels
- classification: discrete labels
Osvaldo Simeone ML for Comm 31 / 184
Supervised Learning: Regression
[Figure: training points (xn, tn) on the interval [0, 1], with an unobserved domain point marked “?”]

Training set D: N training points (xn, tn), n = 1, ..., N
- xn = covariates, domain points, or explanatory variables
- tn = dependent variables, labels, or responses (continuous)

Goal: predict the label t for a new, that is, as of yet unobserved, domain point x
Osvaldo Simeone ML for Comm 32 / 184
Supervised Learning: Classification
[Figure: labelled training points in the plane, with an unobserved domain point marked “?”]

Training set D: N training points (xn, tn), n = 1, ..., N
- xn = covariates, domain points, or explanatory variables
- tn = dependent variables, labels, or responses (discrete)

Goal: predict the label (class) t for a new, that is, as of yet unobserved, domain point x
Osvaldo Simeone ML for Comm 33 / 184
Supervised Learning
Impossible task without assuming a model (inductive bias), by the no free lunch theorem

Memorizing vs. learning: retrieval of the value tn corresponding to an already observed pair (xn, tn) ∈ D vs. prediction of the value t for an unseen x
Osvaldo Simeone ML for Comm 34 / 184
Defining Supervised Learning
Training set D:
(xn, tn) ~ i.i.d. p(x, t), n = 1, ..., N

Based on the training set D, we derive a predictor t̂(x).

Test pair:
(x, t) ~ p(x, t), independent of D

Quality of the prediction t̂(x) for a pair (x, t):
ℓ(t, t̂(x))
for some loss function ℓ(t, t̂), e.g., ℓ(t, t̂) = (t − t̂)² (quadratic) or ℓ(t, t̂) = 1(t ≠ t̂) (probability of error)
Osvaldo Simeone ML for Comm 35 / 184
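The two loss functions above are simple to state in code; this is a minimal illustrative sketch (the function names are mine, not from the slides):

```python
def quadratic_loss(t, t_hat):
    """Quadratic loss l(t, t_hat) = (t - t_hat)^2, typical for regression."""
    return (t - t_hat) ** 2

def detection_error_loss(t, t_hat):
    """0-1 loss l(t, t_hat) = 1(t != t_hat), typical for classification."""
    return 1 if t != t_hat else 0

# Regression: the penalty grows quadratically with the prediction error
print(quadratic_loss(1.0, 0.8))

# Classification: any wrong label costs 1, regardless of "how wrong" it is
print(detection_error_loss(1, 0))
```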
Defining Supervised Learning
Goal: minimize the average loss on the test pair (generalization loss)

Lp(t̂) = E(x,t)∼p(x,t)[ℓ(t, t̂(x))]

Alternative viewpoints to the frequentist framework: Bayesian and Minimum Description Length (MDL)
Osvaldo Simeone ML for Comm 36 / 184
When the True Distribution p(x , t) is Known...
... we don’t need data D

... and we have a standard inference problem, i.e., estimation (regression) or detection (classification).

The solution can be directly computed from the posterior, or predictive, distribution

p(t|x) = p(x, t) / p(x)

as

t̂∗(x) = argmin_t̂ E_t∼p(t|x)[ℓ(t̂, t) | x]
Osvaldo Simeone ML for Comm 37 / 184
When the Model p(x , t) is Known...
With the quadratic loss, conditional mean: t̂∗(x) = E_t∼p(t|x)[t|x]

With the probability of error, maximum a posteriori (MAP): t̂∗(x) = argmax_t p(t|x)

Example: with joint distribution

x\t     t = 0   t = 1
x = 0   0.05    0.45
x = 1   0.4     0.1

we have p(t = 1|x = 0) = 0.45/(0.05 + 0.45) = 0.9, and

t̂∗(x = 0) = 0.9 × 1 + 0.1 × 0 = 0.9 for the quadratic loss,
t̂∗(x = 0) = 1 for the probability of error (MAP).
Osvaldo Simeone ML for Comm 38 / 184
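The worked example above can be verified numerically. A small sketch (variable names are mine):

```python
# Joint distribution p(x, t) from the example, stored as {(x, t): probability}
p_joint = {(0, 0): 0.05, (0, 1): 0.45, (1, 0): 0.4, (1, 1): 0.1}

def posterior(t, x):
    """p(t|x) = p(x, t) / p(x), with p(x) obtained by marginalizing over t."""
    p_x = sum(p for (xx, _), p in p_joint.items() if xx == x)
    return p_joint[(x, t)] / p_x

# Optimal predictors for x = 0
cond_mean = sum(t * posterior(t, 0) for t in (0, 1))   # quadratic loss
map_est = max((0, 1), key=lambda t: posterior(t, 0))   # probability of error

print(posterior(1, 0))  # 0.9
print(cond_mean)        # 0.9
print(map_est)          # 1
```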
When the True Distribution p(x , t) is Not Known...
... we need data D

... and we have a learning problem:

1. Model selection (inductive bias): define a parametric model, p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)

2. Learning: given data D, optimize a learning criterion to obtain the parameter vector θ

3. Inference: use the model to obtain the predictor t̂(x) (to be tested on new data)
Osvaldo Simeone ML for Comm 39 / 184
Logistic Regression
Example: binary classification (t ∈ {0, 1})

1. Model selection (inductive bias): logistic regression (discriminative model)

φ(x) = [φ1(x) · · · φD′(x)]T is a vector of features (e.g., bag-of-words model for a text).
Osvaldo Simeone ML for Comm 40 / 184
Logistic Regression
Parametric probabilistic model:
p(t = 1|x, w) = σ(wTφ(x))

where σ(a) = 1/(1 + exp(−a)) is the sigmoid function.
Osvaldo Simeone ML for Comm 41 / 184
Logistic Regression

2. Learning: to be discussed

3. Inference: with the probability of error loss, MAP classification: decide t̂ = 1 if the logit (or LLR) wTφ(x) ≥ 0, and t̂ = 0 otherwise
Osvaldo Simeone ML for Comm 42 / 184
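Steps 1 and 3 can be sketched together: compute the logit wTφ(x) and threshold it at 0. The weights and feature vector below are invented for illustration:

```python
import math

def sigmoid(a):
    """sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def predict(w, phi_x):
    """MAP classification: decide t_hat = 1 iff the logit w^T phi(x) >= 0."""
    logit = sum(wi * fi for wi, fi in zip(w, phi_x))
    return (1 if logit >= 0 else 0), sigmoid(logit)

# Hypothetical weight vector and feature vector
w = [1.0, -2.0, 0.5]
phi_x = [0.2, 0.1, 1.0]        # logit = 0.2 - 0.2 + 0.5 = 0.5
t_hat, p1 = predict(w, phi_x)
print(t_hat, round(p1, 3))     # decides t_hat = 1, since the logit is >= 0
```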
Multi-Layer Neural Networks
1. Model selection (inductive bias): multi-layer neural network (discriminative model)
Multiple layers of learnable weights enable feature learning.
Osvaldo Simeone ML for Comm 43 / 184
Supervised Learning
1. Model selection (inductive bias): define a parametric model, p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)

2. Learning: given data D, optimize a learning criterion to obtain the parameter vector θ

3. Inference: use the model to obtain the predictor t̂(x) (to be tested on new data)
Osvaldo Simeone ML for Comm 44 / 184
Learning: Maximum Likelihood
ML selects a value of θ that is the most likely to have generated the observed training set D:

maximize p(D|θ)
⇐⇒ maximize ln p(D|θ) (log-likelihood, or LL)
⇐⇒ minimize − ln p(D|θ) (negative log-likelihood, or NLL)

Also known as the log-loss or, for discriminative models, the cross-entropy.

Requires a sum over the training points, e.g., for discriminative models:

minimize − ln p(tD|xD, θ) = − Σ_{n=1}^{N} ln p(tn|xn, θ)
Osvaldo Simeone ML for Comm 46 / 184
Learning: Maximum Likelihood

The problem rarely has analytical solutions and is typically addressed by Stochastic Gradient Descent (SGD). For discriminative models, we have

θnew ← θold + γ ∇θ ln p(tn|xn, θ)|θ=θold

where γ is the learning rate. With multi-layer neural networks, this approach yields the backpropagation algorithm.
Osvaldo Simeone ML for Comm 47 / 184
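For logistic regression, the gradient in the update above is ∇w ln p(tn|xn, w) = (tn − σ(wTxn)) xn, which gives a compact SGD sketch (toy synthetic data and parameters are my own invention):

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def nll(w, data):
    """Negative log-likelihood: -sum_n ln p(t_n | x_n, w)."""
    total = 0.0
    for x, t in data:
        p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= math.log(p1 if t == 1 else 1.0 - p1)
    return total

random.seed(0)
# Toy data: the label tends to be 1 when the first feature is large;
# the constant 1.0 feature acts as a bias term
data = [([random.gauss(0, 1), 1.0], 0) for _ in range(50)]
data += [([random.gauss(2, 1), 1.0], 1) for _ in range(50)]

w, gamma = [0.0, 0.0], 0.1
before = nll(w, data)
for _ in range(20):                 # 20 passes of SGD over the data
    for x, t in data:
        p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # SGD step in the direction of grad_w ln p(t|x, w) = (t - p1) * x
        w = [wi + gamma * (t - p1) * xi for wi, xi in zip(w, x)]
after = nll(w, data)
print(before, after)  # the training NLL decreases
```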
Learning: Maximum Likelihood
Maximum likelihood is only concerned with the performance on the training set.

To improve generalization (performance outside the training set), regularization is typically used.

Regularization can generally be interpreted as imposing some prior knowledge on the model parameters θ.
Osvaldo Simeone ML for Comm 48 / 184
Supervised Learning
1. Model selection (inductive bias): define a parametric model, p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)

2. Learning: given data D, optimize a learning criterion to obtain the parameter vector θ

3. Inference: use the model to obtain the predictor t̂(x) (to be tested on new data)
Osvaldo Simeone ML for Comm 49 / 184
Model Selection
How to select a model (inductive bias)?
Use domain knowledge if available, e.g., to select features
Focus here on model order selection, i.e., selection of the capacity of the model

Ex.: for logistic regression, the model order M is the number of features
Osvaldo Simeone ML for Comm 50 / 184
Model Selection
Example: regression using a discriminative model p(t|x, w), with

t = Σ_{m=0}^{M} wm x^m + N(0, 1)

where t̂(x) = Σ_{m=0}^{M} wm x^m is a polynomial of order M

[Figure: training points on [0, 1], with an unobserved domain point marked “?”]
Osvaldo Simeone ML for Comm 51 / 184
Model Selection
With M = 1, using ML learning of the coefficients:

[Figure: the M = 1 (linear) fit to the training points]
Osvaldo Simeone ML for Comm 52 / 184
Model Selection: Underfitting...
With M = 1, the ML predictor t̂(x) underfits the data:
- the model is not rich enough to capture the variations present in the data;
- large training loss

LD(θ) = (1/N) Σ_{n=1}^{N} (tn − t̂(xn))²
Osvaldo Simeone ML for Comm 53 / 184
Model Selection
With M = 9, using ML learning of the coefficients:

[Figure: the M = 9 and M = 1 fits to the training points]
Osvaldo Simeone ML for Comm 54 / 184
Model Selection: ... vs Overfitting
With M = 9, the ML predictor overfits the data:
- the model is too rich and, in order to account for the observations in the training set, it appears to yield inaccurate predictions outside it;
- presumably we have a large generalization loss

Lp(t̂) = E(x,t)∼p(x,t)[(t − t̂(x))²]
Osvaldo Simeone ML for Comm 55 / 184
Model Selection
M = 3 seems to be a reasonable choice...

... but how do we know, given that we have no data outside of the training set?

[Figure: the M = 1, M = 3, and M = 9 fits to the training points]
Osvaldo Simeone ML for Comm 56 / 184
Model Selection: Validation

Keep some data (validation set) to estimate the generalization error for different values of M.
(See cross-validation for a more efficient way to use the data.)
Osvaldo Simeone ML for Comm 57 / 184
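A minimal sketch of validation-based model order selection, on synthetic sinusoidal data in the spirit of the running example (the noise level and set sizes are my choices; numpy is assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sinusoid on [0, 1]."""
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, t

x_tr, t_tr = make_data(30)     # training set
x_val, t_val = make_data(30)   # held-out validation set

val_loss = {}
for M in range(1, 10):
    # ML learning with Gaussian noise reduces to least squares here
    w = np.polyfit(x_tr, t_tr, deg=M)
    pred = np.polyval(w, x_val)
    # Validation loss as an estimate of the generalization loss
    val_loss[M] = np.mean((t_val - pred) ** 2)

best_M = min(val_loss, key=val_loss.get)
print(best_M, val_loss[best_M])
```

The selected order sits between the underfitting (low M) and overfitting (high M) regimes on this data.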
Model Selection: Validation

Validation allows model order selection.

[Figure: root average squared loss vs. model order M, showing the training loss and the generalization loss (estimated via validation); low M underfits, high M overfits]
Validation can also be used more generally to select other hyperparameters (e.g., learning rate).
Osvaldo Simeone ML for Comm 58 / 184
Model Selection: Validation

Model order selection should depend on the amount of data... It is a problem of bias (asymptotic error) versus generalization gap.

[Figure: root average quadratic loss vs. number of training examples N, with training and generalization (via validation) curves for M = 1 and M = 7]
Osvaldo Simeone ML for Comm 59 / 184
Application to Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 60 / 184
At the Edge: PHY

Channel detection and decoding – classification
[Cammerer et al '17]
Osvaldo Simeone ML for Comm 61 / 184
Machine Learning in Communication Networks
Which tasks?
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
- large data sets exist or can be created ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task does not have stringent optimality requirements (depends)
- the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 62 / 184
At the Edge: PHY

Channel detection and decoding – classification
Model deficit
[Farsad and Goldsmith '18]
Osvaldo Simeone ML for Comm 63 / 184
At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for optical links – regression
Algorithm deficit

[Wang et al.]
Osvaldo Simeone ML for Comm 64 / 184
At the Edge: PHY
Channel equalization in the presence of non-linearities, e.g., for satellite links with non-linear amplifiers – regression
Algorithm deficit
[Bouchired et al.]
Osvaldo Simeone ML for Comm 65 / 184
At the Edge: PHY
Channel decoding for modulation schemes with complex optimal decoders, e.g., continuous phase modulation – classification
Algorithm deficit
[De Veciana and Zakhor '92]
Osvaldo Simeone ML for Comm 66 / 184
Machine Learning in Communication Networks
Which tasks?
- traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
- large data sets exist or can be created ✓
- the task does not require detailed explanations for how the decision was made ✓
- the task does not have stringent optimality requirements (depends)
- the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 67 / 184
At the Edge: PHY
Channel decoding – classification
Leverage domain knowledge to choose the hypothesis class
[Nachmani et al.]
Osvaldo Simeone ML for Comm 68 / 184
At the Edge: PHY

Channel equalization to compensate for hardware impairments – regression
Leverage domain knowledge to select blocks to be designed using a model-based or a data-based approach

[Schibisch et al.]
Osvaldo Simeone ML for Comm 69 / 184
At the Edge: PHY

Modulation recognition – classification
Algorithm deficit
[Agirman-Tosun et al '11]
Osvaldo Simeone ML for Comm 70 / 184
At the Edge: PHY

Localization – regression (coordinates)
Model deficit

[Fang and Lin]
Osvaldo Simeone ML for Comm 71 / 184
At the Edge: PHY

Precoding and power allocation: use a known optimization algorithm to generate the data set – regression
Algorithm deficit

[Sun et al.]
Osvaldo Simeone ML for Comm 72 / 184
At the Edge: PHY
An approximate model and a known optimization algorithm can also be used to generate a data set for an initial training phase
Algorithm deficit
[Zappone et al.]
Osvaldo Simeone ML for Comm 73 / 184
At the Edge: PHY

Interference cancellation – regression
Model deficit

[Balatsoukas-Stimming]
Osvaldo Simeone ML for Comm 74 / 184
At the Edge: PHY

FDD massive MIMO via channel prediction from uplink training (see also [Huang et al ’19])
Model deficit
[Arnold et al ’19]
Osvaldo Simeone ML for Comm 75 / 184
At the Edge: MAC/Link

Spectrum sensing – classification
Model deficit
[Tumuluru et al '10]
Osvaldo Simeone ML for Comm 76 / 184
At the Edge: Network and Application

Content prediction for proactive caching – classification
Model deficit
[Chen et al '17]
Osvaldo Simeone ML for Comm 77 / 184
At the Cloud: Network
Link prediction for wireless routing – classification/ regression
Model deficit
[Wang et al '06]
Osvaldo Simeone ML for Comm 78 / 184
At the Cloud: Network

Link prediction for optical routing – classification/regression
Model deficit

[Musumeci et al.]
Osvaldo Simeone ML for Comm 79 / 184
At the Cloud: Network
Congestion prediction for smart routing – classification
Model deficit
[Tang et al '17]
Osvaldo Simeone ML for Comm 80 / 184
At the Cloud: Network and Application

Traffic classification – classification
Model deficit
[Nguyen et al '08]
Osvaldo Simeone ML for Comm 81 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 82 / 184
Unsupervised Learning
Unsupervised learning tasks operate over unlabelled data sets.
General goal: discover properties of the data, e.g., for compressed representation

“Some of us see unsupervised learning as the key towards machines with common sense.” (Y. LeCun)
Osvaldo Simeone ML for Comm 83 / 184
“Defining” Unsupervised Learning
Training set D:
xn ~ i.i.d. p(x), n = 1, ..., N

Goal: learn some useful properties of the distribution p(x)

Alternative viewpoints to the frequentist framework: Bayesian and MDL
Osvaldo Simeone ML for Comm 84 / 184
Unsupervised Learning Tasks
Density estimation: estimate p(x), e.g., for use in plug-in estimators or compression algorithms, or to detect outliers

Clustering: partition all points in D into groups of similar objects (e.g., document clustering)

Dimensionality reduction, representation, and feature extraction: represent each data point xn in a space of lower dimensionality, e.g., to highlight independent explanatory factors, and/or to ease visualization, interpretation, or successive tasks

Generation of new samples: learn a machine that produces samples approximately distributed according to p(x), e.g., to produce artificial scenes for games or films
Osvaldo Simeone ML for Comm 85 / 184
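Among the tasks above, clustering is the easiest to sketch; below is a minimal k-means-style example on invented one-dimensional data (K = 2 clusters):

```python
# Minimal k-means (K = 2) on 1-D points: alternate between assigning
# each point to its nearest centroid and updating each centroid to the
# mean of its assigned points.
points = [1.0, 1.2, 0.8, 0.9, 5.0, 5.2, 4.8, 5.1]
centroids = [points[0], points[1]]   # naive initialization

for _ in range(10):
    clusters = [[], []]
    for p in points:
        # Assignment step: nearest centroid
        k = min((0, 1), key=lambda i: abs(p - centroids[i]))
        clusters[k].append(p)
    # Update step: centroid = mean of its cluster (kept if cluster is empty)
    centroids = [sum(c) / len(c) if c else centroids[i]
                 for i, c in enumerate(clusters)]

print(sorted(centroids))  # one centroid per group of similar points
```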
Unsupervised Learning
1. Model selection (inductive bias): define a parametric model p(x|θ)
2. Learning: Given data D, optimize a learning criterion to obtainthe parameter vector θ
3. Clustering, feature extraction, sample generation...
Osvaldo Simeone ML for Comm 86 / 184
Models
Unsupervised learning models typically involve hidden, or latent, variables.
zn = hidden, or latent, variables for each data point xn
Ex.: zn = cluster index of xn
Osvaldo Simeone ML for Comm 88 / 184
(a) Directed Generative Models
Model data x as being caused by z :
p(x|θ) = ∑z p(z|θ) p(x|z, θ)
Osvaldo Simeone ML for Comm 90 / 184
(a) Directed Generative Models
Ex.: Document clustering
I x is a document, and z is (interpreted as) its topic
I p(z|θ) = distribution of topics
I p(x|z, θ) = distribution of words in the document given the topic
Basic representatives:
I Mixture of Gaussians
I Likelihood-free models
Osvaldo Simeone ML for Comm 91 / 184
(d) Autoencoders
Model the encoding from data to hidden variables, as well as the decoding from hidden variables back to data:
p(z |x , θ) and p(x |z , θ)
Osvaldo Simeone ML for Comm 92 / 184
(d) Autoencoders
Ex.: Compression
I x is an image and z is (interpreted as) a compressed (e.g., sparse) representation
I p(z|x, θ) = compression of the image into the representation
I p(x|z, θ) = decompression of the representation into an image
Basic representatives: Principal Component Analysis (PCA), dictionary learning, neural network-based autoencoders
Osvaldo Simeone ML for Comm 93 / 184
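To make the PCA-as-autoencoder view concrete, the numpy sketch below (toy data; the dimensions and noise level are illustrative, not from the slides) uses the top principal directions as a linear encoder W and its transpose as the decoder:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points in R^3 lying close to a 2-D subspace
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) \
    + 0.01 * rng.normal(size=(200, 3))

mu = X.mean(axis=0)
Xc = X - mu
# Principal directions = top right-singular vectors of the centered data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T                     # 3x2 matrix of the top-2 directions

Z = Xc @ W                       # encoder: z = W^T (x - mu)
X_hat = Z @ W.T + mu             # decoder: x_hat = W z + mu

mse = np.mean((X - X_hat) ** 2)  # small, since the data is nearly planar
```

Replacing W and W.T with nonlinear neural networks yields the neural network-based autoencoders mentioned on the slide.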
Unsupervised Learning
1. Model selection (inductive bias): Define a parametric model p(x|θ)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Clustering, feature extraction, sample generation...
Osvaldo Simeone ML for Comm 94 / 184
Learning: Maximum Likelihood
Focus on directed generative models (a)
To simplify the notation, consider a single data point x (sum over the data set D to generalize).
ML problem:
max_θ ln p(x|θ) = ln ( ∑z p(x, z|θ) ) = ln ( E_{z∼p(z|θ)}[p(x|z, θ)] )
Key issue: evaluating the LL requires marginalizing over the latent variables, whose distribution is itself to be learned.
Osvaldo Simeone ML for Comm 95 / 184
ELBO
To tackle this issue, a standard approach is to introduce a variational distribution (or encoder) q(z|x) that provides a probabilistic estimate of z given x.
For any distribution q(z|x), the Evidence Lower BOund (ELBO), or (negative) free energy, L(q, θ) is defined as
L(q, θ) = E_{z∼q(z|x)}[ ln p(x, z|θ) − ln q(z|x) ],
where ln p(x, z|θ) plays the role of the learning signal.
Osvaldo Simeone ML for Comm 96 / 184
ELBO
The ELBO is a global lower bound on the LL function:
ln p(x|θ) ≥ L(q, θ).
Equality holds at a value θ0 if and only if the distribution q(z|x) equals the posterior of z given x (i.e., the optimal Bayesian estimate):
q(z|x) = p(z|x, θ0).
[Figure: LL function and ELBO, touching at θ0]
Osvaldo Simeone ML for Comm 97 / 184
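The bound and its tightness at the posterior are easy to verify numerically. A minimal sketch with a binary latent variable (the probabilities below are made-up toy values):

```python
import numpy as np

p_z = np.array([0.3, 0.7])           # p(z|theta) for z in {0, 1}
p_x1_given_z = np.array([0.9, 0.2])  # p(x=1|z, theta); we observe x = 1

p_xz = p_z * p_x1_given_z            # joint p(x=1, z|theta)
ll = np.log(p_xz.sum())              # LL: ln p(x=1|theta)

def elbo(q):
    # L(q, theta) = E_{z~q}[ln p(x, z|theta) - ln q(z|x)]
    return np.sum(q * (np.log(p_xz) - np.log(q)))

q_arbitrary = np.array([0.5, 0.5])
posterior = p_xz / p_xz.sum()        # p(z|x=1, theta)

gap_arbitrary = ll - elbo(q_arbitrary)  # strictly positive
gap_posterior = ll - elbo(posterior)    # zero: the bound is tight
```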
Expectation-Maximization (EM) Algorithm
[Figure: EM iterations, with each ELBO tangent to the LL at the current θold and maximized at θnew]
Osvaldo Simeone ML for Comm 98 / 184
Expectation-Maximization (EM) Algorithm
Initialize parameter vector θold.
For each iteration:
I E step: For fixed parameter vector θold,
  max_q L(q, θold) → qnew(z|x) = p(z|x, θold)
  (estimate of the latent variables)
I M step: For fixed variational distribution qnew(z|x),
  max_θ L(qnew, θ) → max_θ E_{z∼qnew(z|x)}[ln p(x, z|θ)]
  (solve a supervised learning problem)
Osvaldo Simeone ML for Comm 99 / 184
Expectation-Maximization (EM) Algorithm
EM guarantees non-decreasing objective values, which ensures convergence to a local optimum of the original problem.
Osvaldo Simeone ML for Comm 100 / 184
Expectation-Maximization (EM) Algorithm
Osvaldo Simeone ML for Comm 101 / 184
Example: Mixture of Gaussians
Directed generative model:
z ∼ Bern(π)
x | z = k ∼ N(µk, Σk)
Osvaldo Simeone ML for Comm 102 / 184
Example: Mixture of Gaussians
[Figure-only slides 103–108: the mixture-of-Gaussians example, shown over successive slides]
Osvaldo Simeone ML for Comm 108 / 184
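The EM recursion for this example can be sketched in a few lines of numpy (the data, initialization, and iteration count below are illustrative): the E step computes the posterior responsibilities p(z|xn, θold), and the M step re-fits (π, µ, σ²) in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy 1-D data from two well-separated Gaussians
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])

# Initialize theta = (pi, mu, var) for K = 2 components
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E step: responsibilities q(z=k|x_n) = p(z=k|x_n, theta_old)
    lik = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
          / np.sqrt(2 * np.pi * var)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M step: maximize E_q[ln p(x, z|theta)] in closed form
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
```

On this toy data the estimated means approach the true component centers at −3 and +3.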
Scaling EM
The EM algorithm may be impractical for large-scale problems: it needs to compute the posterior in the E step and to average over z in the M step.
Solutions:
I E step: Parametrize the variational distribution q(z|x, ϕ) and maximize the ELBO over ϕ (variational autoencoder)
I M step: Approximate E_{z∼qnew(z|x)}[ln p(x, z|θ)] via Monte Carlo
I Use gradient descent for the E and/or M steps
Osvaldo Simeone ML for Comm 109 / 184
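The Monte Carlo idea for the M step simply replaces the exact expectation over z ∼ qnew(z|x) with a sample average. A toy discrete sketch (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

q = np.array([0.25, 0.75])                # q_new(z|x) over z in {0, 1}
ln_p_xz = np.log(np.array([0.27, 0.14]))  # ln p(x, z|theta) at the observed x

exact = np.sum(q * ln_p_xz)               # exact E_{z~q}[ln p(x, z|theta)]

z = rng.choice(2, size=20000, p=q)        # samples z ~ q_new(z|x)
mc = ln_p_xz[z].mean()                    # Monte Carlo estimate
```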
Learning: Beyond Maximum Likelihood
ML tends to provide inclusive and “blurry” estimates of the data distribution.
[Figure: ML fit spreading probability mass to cover the data distribution]
This can be a problem for tasks such as data generation.
Osvaldo Simeone ML for Comm 110 / 184
Learning: Beyond Maximum Likelihood
ML can be proven to minimize the KL divergence
KL(pD(x)||p(x|θ)) = E_{x∼pD(x)}[ ln( pD(x) / p(x|θ) ) ]
between the empirical distribution
pD(x) = N[x]/N (with counts N[x] = |{n : xn = x}|)
and the model.
Osvaldo Simeone ML for Comm 111 / 184
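Both distributions in this criterion are easy to evaluate on a discrete alphabet. A small numpy sketch (the data set and model values are made up):

```python
import numpy as np

data = np.array([0, 0, 1, 2, 1, 0, 0, 1])  # data set over alphabet {0, 1, 2}
counts = np.bincount(data, minlength=3)    # N[x]
p_emp = counts / counts.sum()              # empirical p_D(x) = N[x]/N

p_model = np.array([0.5, 0.3, 0.2])        # model p(x|theta)

mask = p_emp > 0                           # convention: 0 * ln 0 = 0
kl = np.sum(p_emp[mask] * np.log(p_emp[mask] / p_model[mask]))
# kl >= 0, with equality iff the model matches the empirical distribution
```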
Learning: Beyond Maximum Likelihood
The KL divergence is part of the larger class of f-divergences between two distributions p(x) and q(x):
Df(p||q) = max_{T(x)} E_{x∼p}[T(x)] − E_{x∼q}[g(T(x))],
for some convex increasing function g(·).
[Figure: a discriminator T(x) fed with samples x ∼ p(x) and x ∼ q(x)]
Osvaldo Simeone ML for Comm 112 / 184
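For the KL divergence, the variational form holds with g(t) = exp(t − 1), and the maximizing discriminator is T(x) = 1 + ln(p(x)/q(x)). A discrete numerical check of both facts (the two distributions are toy values):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

kl = np.sum(p * np.log(p / q))

def bound(T):
    # E_{x~p}[T(x)] - E_{x~q}[exp(T(x) - 1)]
    return np.sum(p * T) - np.sum(q * np.exp(T - 1))

lb = bound(np.zeros(3))           # an arbitrary discriminator: a lower bound
tight = bound(1 + np.log(p / q))  # the optimal discriminator: attains the KL
```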
Learning: Generative Adversarial Networks (GANs)
Generalizing the ML problem, GANs attempt to solve the problem
min_θ max_ϕ E_{x∼pD}[Tϕ(x)] − E_{x∼p(x|θ)}[g(Tϕ(x))]
for some differentiable function Tϕ(x) of the parameter vector ϕ.
Choice of the divergence (via the discriminator) is tailored to data.
Can be applied to likelihood-free models.
Osvaldo Simeone ML for Comm 113 / 184
Learning: Generative Adversarial Networks (GANs)
[NVIDIA]
Osvaldo Simeone ML for Comm 114 / 184
Applications to Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 115 / 184
At the Edge: PHY
End-to-end encoding/decoding for wireless channels – autoencoders
[O’Shea and Hoydis ’17]
Osvaldo Simeone ML for Comm 116 / 184
Machine Learning in Communication Networks
Which tasks?
I traditional engineering flow not applicable or inadequate: model deficit or algorithm deficit (depends)
I large data sets exist or can be created ✓
I the task does not require detailed explanations for how the decision was made ✓
I the task does not have stringent optimality requirements (depends)
I the phenomenon or function being learned should not change rapidly (depends)
Osvaldo Simeone ML for Comm 117 / 184
At the Edge: PHY
End-to-end encoding/decoding for optical channels – autoencoders
Algorithm deficit
[Karanov et al ’18]
Osvaldo Simeone ML for Comm 118 / 184
At the Edge: PHY
End-to-end encoding/decoding for Gaussian channels with feedback – autoencoders based on Recurrent Neural Networks (RNNs)
Algorithm deficit
[Kim et al ’18]
Osvaldo Simeone ML for Comm 119 / 184
At the Edge: PHY
Joint source-channel coding for image transmission
Algorithm deficit
[Bourtsoulatze et al ’18]
Osvaldo Simeone ML for Comm 120 / 184
At the Edge: PHY
Channel State Information (CSI) compression and feedback – autoencoders
Model deficit
Related work: quantization of log-likelihoods [Arvinte et al ’19]
[Wen et al ’17]
Osvaldo Simeone ML for Comm 121 / 184
At the Edge: PHY
Fingerprinting for localization – autoencoders
Model deficit
[Xiao et al ’17]
Osvaldo Simeone ML for Comm 122 / 184
At the Edge: PHY
Mimicking a propagation channel – GAN (see also [Ye et al ’18])
Model deficit
[O’Shea et al ’18]
Osvaldo Simeone ML for Comm 123 / 184
At the Edge: PHY
Mimicking and identifying a propagation channel (e.g., satellite) – generative models
Leveraging domain knowledge improves the learned model.
[Ibnkahla ’00]
Osvaldo Simeone ML for Comm 124 / 184
At the Edge: MAC/Link
Resource allocation – clustering
Algorithm deficit
[Abdelnasser et al '14]
Osvaldo Simeone ML for Comm 125 / 184
At the Cloud: Network
Self-organizing multi-hop networks – clustering
Algorithm deficit
[Abbassi and Younis '07]
Osvaldo Simeone ML for Comm 126 / 184
At the Cloud: Network
Anomaly detection – density estimation
Model deficit
[Musumeci et al ’18]
Osvaldo Simeone ML for Comm 127 / 184
Overview
What is Machine Learning and When to Use It
Data and Machine Learning in Communication Systems
Supervised Learning and Applications
Unsupervised Learning and Applications
Reinforcement Learning and Applications
Osvaldo Simeone ML for Comm 128 / 184
Reinforcement Learning
Feedback-based sequential decision making
[Figure, after D. Silver: agent-environment loop with state st, action at, and reward rt]
Osvaldo Simeone ML for Comm 129 / 184
Reinforcement Learning
Osvaldo Simeone ML for Comm 130 / 184
Reinforcement Learning
Data collected while acting
Given an observation of the world (input) st at time t, the agent takes an action (output) at ...
Ex.: st=current image; at ∈ {up,down,stay}
[Karpathy ’16]
Osvaldo Simeone ML for Comm 131 / 184
Reinforcement Learning
... and receives a reward signal (feedback) rt
Ex.: rt = 1 if the ball goes past the opponent; rt = −1 if the agent misses the ball; rt = 0 otherwise
Current actions generally affect future states and rewards
[Karpathy ’16]
Osvaldo Simeone ML for Comm 132 / 184
Defining Reinforcement Learning
[Diagram: at each step, the agent in state st draws action at ∼ π(a|st); the environment returns reward rt+1 ∼ p(r|s, a) and next state st+1 ∼ p(s′|s, a), and the process repeats]
Osvaldo Simeone ML for Comm 133 / 184
Defining Reinforcement Learning
Goal: optimize the policy π(a|s) = Pr[a = a|s = s] by maximizing the expected return Eπ[Gt|st = s] with
Gt = ∑_{τ=0}^∞ γ^τ r_{t+τ} = rt + γ r_{t+1} + γ² r_{t+2} + ...
Discount factor 0 < γ ≤ 1: preference for immediate rewards if γ < 1
Osvaldo Simeone ML for Comm 134 / 184
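The return is conveniently computed backwards over an episode, since Gt = rt + γ G_{t+1}. A small sketch (the reward sequence is made up):

```python
def returns(rewards, gamma):
    """Discounted returns G_t for every step of one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G   # G_t = r_t + gamma * G_{t+1}
        out.append(G)
    return out[::-1]

G = returns([1.0, 0.0, 0.0, 2.0], gamma=0.5)
# G[0] = 1 + 0.5**3 * 2 = 1.25
```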
Defining Reinforcement Learning
Problem defined by two “true” distributions:
I Reward distribution given state and action: p(r|s, a)
I State transition probability: p(s′|s, a)
Osvaldo Simeone ML for Comm 135 / 184
Defining Reinforcement Learning
When the true distributions are known, we have a Markov Decision Process (MDP)
When they are not, we have a reinforcement learning problem
Reinforcement learning:
I Model-based: Learn a model (p(r|s, a, θ), p(s′|s, a, θ)) and then solve as an MDP
I Model-free: Learn directly the policy, i.e., how to act
Osvaldo Simeone ML for Comm 137 / 184
Methodology: Generalized Policy Iteration
Solution methods for both MDPs and model-free reinforcement learning generally alternate between two steps:
I Policy evaluation: Estimate the performance of either the current policy (on-policy learning) or the current optimal policy (off-policy learning)
I Control: Act and update the policy to increase the expected return
Osvaldo Simeone ML for Comm 138 / 184
Methodology: Policy Evaluation
Policy evaluation: Action-value function Qπ(s, a) = Eπ[Gt|st = s, at = a] ...
[Mnih et al ’17]
Osvaldo Simeone ML for Comm 141 / 184
Methodology: Policy Evaluation
Policy evaluation: ... and/or value function Vπ(s) = Eπ[Gt|st = s] = Eπ[Q(s, a)|st = s] ...
[Figure: value function V over states]
Osvaldo Simeone ML for Comm 142 / 184
Methodology: Policy Evaluation
Spectrum of solutions for policy evaluation:
I breadth and depth of the exploration of the state-action tree
TD(λ)
Osvaldo Simeone ML for Comm 143 / 184
Methodology: Policy Evaluation
Exploration in breadth generally requires the availability of a model
Model-free methods are based on Temporal-Difference (TD) and/or Monte Carlo
[Figure labels: TD(λ); model-based; model-free (sample-based)]
Osvaldo Simeone ML for Comm 144 / 184
Methodology: Policy Evaluation
Monte Carlo (offline): Q(st, at) ≈ rt + γ r_{t+1} + γ² r_{t+2} + ...
TD (online): Q(st, at) ≈ rt + γ Q(s_{t+1}, a_{t+1})
These techniques are on-policy; there are also off-policy variants
TD(λ)
Osvaldo Simeone ML for Comm 145 / 184
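Both estimators reduce to simple incremental updates on Q(st, at); the sketch below contrasts them on made-up numbers (the step size α, discount γ, rewards and targets are illustrative):

```python
alpha, gamma = 0.1, 0.9

def mc_update(Q, G):
    # Monte Carlo: move Q(s_t, a_t) toward the full observed return G_t
    return Q + alpha * (G - Q)

def td_update(Q, r, Q_next):
    # TD: bootstrap with the estimate Q(s_{t+1}, a_{t+1}) at the next step
    return Q + alpha * (r + gamma * Q_next - Q)

Q = 0.0
Q = mc_update(Q, 1.0)            # 0 + 0.1 * (1 - 0) = 0.1
# here the current estimate stands in for Q(s_{t+1}, a_{t+1})
Q = td_update(Q, 0.5, Q_next=Q)  # 0.1 + 0.1 * (0.5 + 0.9*0.1 - 0.1)
```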
Methodology: Generalized Policy Iteration
Solution methods for both MDPs and model-free reinforcement learning generally alternate between two steps:
I Policy evaluation: Estimate the performance of either the current policy (on-policy learning) or the current optimal policy (off-policy learning)
I Control: Act and update the policy to increase the expected return
Osvaldo Simeone ML for Comm 146 / 184
Methodology: Control
How to choose actions? Exploration vs exploitation
I Ex.: Multi-armed bandit
I Exploration: choose a random action with probability ε; Gibbs random policies
r1 ∼ p(r|a = 1)   r2 ∼ p(r|a = 2)   r3 ∼ p(r|a = 3)
Osvaldo Simeone ML for Comm 147 / 184
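For the bandit case (a single state), ε-greedy control with incremental sample-mean estimates of Q(a) can be sketched as follows (the arm means, ε and horizon are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])  # p(r|a): unknown to the agent
eps, T = 0.1, 10000

Q = np.zeros(3)                         # action-value estimates
N = np.zeros(3)                         # pull counts

for _ in range(T):
    # explore with probability eps, otherwise exploit argmax Q
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q))
    r = rng.normal(true_means[a], 1.0)  # reward r ~ p(r|a)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]           # incremental sample mean
```

After enough rounds the agent identifies arm 3 (index 2) as the best action.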
Methodology: Control
How to improve a policy?
I greedy: max_a Qπ(s, a)
I gradient-based: policy gradient
Osvaldo Simeone ML for Comm 148 / 184
Methodology
Value vs policy-based methods:
I value-based: policy evaluation and control via the action-value function Q(s, a)
I policy-based: update an explicit policy π(a|s)
I actor-critic: policy evaluation via the action-value function Q(s, a) and control via the policy π(a|s)
Osvaldo Simeone ML for Comm 149 / 184
Example: Deep Q-Learning
Value-based – action-value function Q(s, a) parameterized via a neural network
Policy evaluation: Off-policy TD
Control: ε-greedy
[Mnih et al ’17]
Osvaldo Simeone ML for Comm 150 / 184
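Deep Q-learning replaces the Q table with a neural network, but the underlying update is ordinary tabular Q-learning: off-policy TD with an ε-greedy behavior policy. A tabular sketch on a made-up four-state chain (reward 1 on reaching the terminal state):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha, eps = 0.9, 0.5, 0.3
Q = np.zeros((4, 2))  # states 0..3, actions 0 = left, 1 = right

def step(s, a):
    # deterministic chain; state 3 is terminal with reward 1
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, 3)
    return s2, float(s2 == 3)

for _ in range(500):  # episodes
    s = 0
    while s != 3:
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # off-policy TD target: r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
```

With enough episodes the values approach the optimal ones (Q(2, right) = 1, Q(1, right) = 0.9, Q(0, right) = 0.81), and the greedy policy moves right in every state.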
Example: Advantage Actor-Critic
Actor-critic – action-value function Q(s, a) and policy π(a|s) parameterized via neural networks
Policy evaluation: On/off-policy TD
Control: policy gradient
[Karpathy ’16]
Osvaldo Simeone ML for Comm 151 / 184
Applications to Communication Networks
Fog network architecture [5GPPP]
Osvaldo Simeone ML for Comm 152 / 184
At the Edge: PHY
End-to-end encoding/decoding training in the absence of a channel model
Model deficit
[Aoudia and Hoydis ’18]
Osvaldo Simeone ML for Comm 153 / 184
At the Edge: PHY
Iteratively trains the decoder using supervised learning and the encoder using reinforcement learning (policy gradient)
Requires receiver-to-transmitter feedback – impact of noise [Goutay et al ’18]
[Aoudia and Hoydis ’18]
Osvaldo Simeone ML for Comm 154 / 184
At the Edge: PHY
Learning to act and to communicate (binary signals on binary symmetric channels) to carry out collaborative tasks
Algorithm deficit
Learned communication carries out compression and unequal error protection
[Mostaani and Simeone ’18]
Osvaldo Simeone ML for Comm 155 / 184
At the Edge: MAC/Link
Beamforming-user assignment in mmWave communications
Model deficit
Requires feedback
[Klautau et al ’16]
Osvaldo Simeone ML for Comm 156 / 184
At the Edge: MAC/Link
Anti-jamming channel and base station selection
Model deficit
Requires feedback
Similar applications in cognitive radio networks [Li et al ’17]
[Han et al ’17]
Osvaldo Simeone ML for Comm 157 / 184
At the Edge: MAC/Link
Learning how to allocate random access opportunities
Model and algorithm deficit
[Jiang et al '18]
Osvaldo Simeone ML for Comm 158 / 184
At the Edge: MAC/Link
Learning to act and to schedule to carry out collaborative tasks
Algorithm deficit
[Kim et al '19]
Osvaldo Simeone ML for Comm 159 / 184
At the Edge: Network
Handover and base station selection for access
Model deficit
[Hasan et al ‘13]
Osvaldo Simeone ML for Comm 160 / 184
At the Edge: Network
On-off scheduling of base stations
Model deficit
[Xu et al '17]
[Figure labels: on/off state & users’ demands; tx, circuitry and transition power; on/off scheduling & beamforming]
Osvaldo Simeone ML for Comm 161 / 184
At the Edge: Application
Caching on wireless devices based on channel conditions and contentlifetime
Model deficit
[Somuyiwa et al '18]
Osvaldo Simeone ML for Comm 162 / 184
At the Cloud: Network
Packet routing in a dynamically changing network
Model deficit
[Boyan and Littman '94]
Osvaldo Simeone ML for Comm 163 / 184
Concluding Remarks
Machine learning tools can leverage the availability of data and computing resources in modern communication systems.
Supervised, unsupervised and reinforcement learning paradigms lend themselves to different key communication (sub)tasks.
Not a universal solution – a case-by-case analysis of advantages and disadvantages is needed.
Osvaldo Simeone ML for Comm 164 / 184
Concluding Remarks
Engineering the integration of traditional model-based techniques and data-driven machine learning methods
[Reich]
Osvaldo Simeone ML for Comm 165 / 184
Concluding Remarks
Additional topics, e.g., adversarial (white-box) attacks
[Sadeghi and Larsson ’19]
Osvaldo Simeone ML for Comm 166 / 184
Concluding Remarks
Additional topics, e.g., meta-learning (learning the inductive bias from other tasks)
[Park et al ’19]
Osvaldo Simeone ML for Comm 167 / 184
Concluding Remarks
Additional topics, e.g., neuromorphic computing [Jang et al ’19]
[Rajendran '18]
Osvaldo Simeone ML for Comm 168 / 184
Concluding Remarks
How to integrate machine learning analytics and control with the protocol stack?
[China Mobile]
Osvaldo Simeone ML for Comm 169 / 184
For More...
O. Simeone, “A Brief Introduction to Machine Learning forEngineers,” Foundations and Trends in Signal Processing, 2018.
O. Simeone, “A Very Brief Introduction to Machine Learning withApplications to Communication Systems,” IEEE Transactions onCognitive Communications and Networking, 2019.
Osvaldo Simeone ML for Comm 170 / 184
Acknowledgements
This work has received funding from the European Research Council(ERC) under the European Union’s Horizon 2020 research and innovation
programme (grant agreement No. 725731).
Osvaldo Simeone ML for Comm 171 / 184
References
[O’Shea and Hoydis ’17] T. J. O’Shea, J. Hoydis, An Introduction toMachine Learning Communications Systems, 2017[Cammerer et al ’17] Cammerer S, Gruber T, Hoydis J, Brink ST. ScalingDeep Learning-based Decoding of Polar Codes via Partitioning. arXivpreprint arXiv:1702.06901. 2017 Feb 22.[Balatsoukas-Stimming ’17] Balatsoukas-Stimming A. Non-Linear DigitalSelf-Interference Cancellation for In-Band Full-Duplex Radios Using NeuralNetworks. arXiv preprint arXiv:1711.00379. 2017 Nov 1.[Sun et al ’17] Sun H, Chen X, Shi Q, Hong M, Fu X, Sidiropoulos ND.Learning to Optimize: Training Deep Neural Networks for WirelessResource Management. arXiv preprint arXiv:1705.09412. 2017 May 26.[de Kerret et al ’17] Paul de Kerret, David Gesbert, Maurizio Filippone,Decentralized Deep Scheduling for Interfrence Channels, arXiv, 2017.[Wen et al ’17] Wen, Chao-Kai; Shih, Wan-Ting; Jin, Shi, Deep Learningfor Massive MIMO CSI Feedback arXiv.
Osvaldo Simeone ML for Comm 172 / 184
References[Agirman-Tosun et al ’11] Agirman-Tosun H, et al. “Modulationclassification of MIMO-OFDM signals by independent component analysisand support vector machines.” in Proc. Asilomar 2011.[Fang and Lin ’08] Fang SH, Lin TN. Indoor location system based ondiscriminant-adaptive neural network in IEEE 802.11 environments. IEEETransactions on Neural networks. 2008 Nov;19(11):1973-8.[Tumuluru et al ’10] Tumuluru VK, Wang P, Niyato D. A neural networkbased spectrum prediction scheme for cognitive radio. in Proc. ICCC 2010.[Chen et al ’17] Chen M, et a;. Echo state networks for proactive cachingin cloud-based radio access networks with mobile users. IEEE Transactionson Wireless Communications. 2017.[Xiao et al ’17] Xiao C, Yang D, Chen Z, Tan G. 3-D BLE IndoorLocalization Based on Denoising Autoencoder. IEEE Access.2017;5:12751-60.[Abdelnasser et al ’14] Abdelnasser A, et al. Clustering and resourceallocation for dense femtocells in a two-tier cellular OFDMA network.IEEE Transactions on Wireless Communications, 2014.
Osvaldo Simeone ML for Comm 173 / 184
References
[Nguyen et al ’08] Nguyen TT, Armitage G. A survey of techniques forinternet traffic classification using machine learning. IEEECommunications Surveys & Tutorials. 2008 Oct 1;10(4):56-76.[Wang et al 06] Wang, Yong, Margaret Martonosi, and Li-Shiuan Peh. "Asupervised learning approach for routing optimizations in wireless sensornetworks." Proc. ACM Workshop on Multi-hop ad hoc Networks, 2006.[Abbassi and Younis ’07] Abbasi AA, Younis M. A survey on clusteringalgorithms for wireless sensor networks. Computer communications, 2007.[Abbe et al ’16] Abbe E, Bandeira AS, Hall G. Exact recovery in thestochastic block model. IEEE Transactions on Information Theory, 2016.[Wang et al ’18] Wang, Zhi, et al. Handover Control in Wireless Systemsvia Asynchronous Multi-User Deep Reinforcement Learning, arxiv.[Wang et al ’16] Wang D, et al. Nonlinearity Mitigation Using a MachineLearning Detector Based on k-Nearest Neighbors. IEEE PhotonicsTechnology Letters. 2016.
Osvaldo Simeone ML for Comm 174 / 184
References[Venkatraman et al ’10] Venkatraman P, et al. Opportunistic bandwidthsharing through reinforcement learning. IEEE Trans. Veh. Techn., 2010.[Iannello et al ’12] Iannello F, Simeone O, Spagnolini U. Optimality ofmyopic scheduling and whittle indexability for energy harvesting sensors. inProc. CISS 2012.[Xu et al ’17] Xu Z, et al. A deep reinforcement learning based frameworkfor power-efficient resource allocation in cloud RANs, in Proc. IEEE ICC2017.[Mnih et al ’15] Mnih V, et al. Human-level control through deepreinforcement learning. Nature, 2015.[Bogale et al ’18] T. Bogale, et al., Machine Intelligence Techniques forNext-Generation Context-Aware Wireless Networks, arxiv.[Tang et al ’17] F. Tang et al., "On Removing Routing Protocol fromFuture Wireless Networks: A Real-time Deep Learning Approach forIntelligent Traffic Control," IEEE Wireless Communications, 2018.[Sallent et al ’15] O. Sallent, et al., "Learning-based coexistence for LTEoperation in unlicensed bands," in Proc. IEEE ICC 2015.
Osvaldo Simeone ML for Comm 175 / 184
References[Kato et al ’17] N. Kato et al., "The Deep Learning Vision forHeterogeneous Network Traffic Control: Proposal, Challenges, and FuturePerspective," IEEE Wireless Communications, June 2017.[Siracusano and Bifulco ’18] Siracusano, Giuseppe; Bifulco, Roberto,In-network Neural Networks, arXiv:1801.05731.[He et al ’17] He Y, et al. Deep Reinforcement Learning (DRL)-basedResource Management in Software-Defined and Virtualized Vehicular AdHoc Networks, in Proc. ACM SDAIVN 2017.[Farsad and Goldsmith ’18] Farsad, Nariman; Goldsmith, Andrea, NeuralNetwork Detection of Data Sequences in Communication Systems,arXiv:1802.02046[Emigh et al ’15] Emigh, Matthew, et al. "A model based approach toexploration of continuous-state MDPs using Divergence-to-Go." in Proc.IEEE Machine Learning for Signal Processing (MLSP), 2015.[Caciularu and Burshtein ’18] Avi Caciularu, David Burshtein, BlindChannel Equalization using Variational Autoencoders, in Proc. IEEE ICCworkshop, 2018.
Osvaldo Simeone ML for Comm 176 / 184
References
[Musumeci et al ’18] Francesco Musumeci, Cristina Rottondi, Avishek Nag, Irene Macaluso, Darko Zibar, Marco Ruffini, Massimo Tornatore, A Survey on Application of Machine Learning Techniques in Optical Networks, arXiv:1803.07976.
[Okamoto et al ’18] Hironao Okamoto, et al., Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications, arXiv:1804.00709.
[Davaslioglu and Sagduyu ’18] Kemal Davaslioglu, Yalin E. Sagduyu, Generative Adversarial Learning for Spectrum Sensing, in Proc. IEEE ICC 2018.
[Aoudia and Hoydis ’18] Fayçal Ait Aoudia, Jakob Hoydis, End-to-End Learning of Communications Systems Without a Channel Model, arXiv:1804.02276.
[Karanov et al ’18] Boris Karanov, et al., End-to-end Deep Learning of Optical Fiber Communications, arXiv:1804.04097.
Osvaldo Simeone ML for Comm 177 / 184
References
[O’Shea et al ’18] Timothy J. O’Shea, Tamoghna Roy, Nathan West, “Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks”, arXiv:1805.06350.
[Zhao et al ’18] Zhifeng Zhao et al., “Deep Reinforcement Learning for Network Slicing”, arXiv:1805.06591.
[Ibnkahla ’00] Ibnkahla M. Applications of neural networks to digital communications – a survey. Signal Processing. 2000 Jul 1;80(7):1185-215.
[Bouchired et al ’98] Bouchired S, Roviras D, Castanié F. Equalisation of satellite mobile channels with neural network techniques. Space Communications. 1998.
[De Veciana and Zakhor ’92] De Veciana G, Zakhor A. Neural net-based continuous phase modulation receivers. IEEE Transactions on Communications. 1992.
Osvaldo Simeone ML for Comm 178 / 184
References
[Schibisch et al ‘18] Stefan Schibisch, et al,“Online Label Recovery forDeep Learning-based Communication through Error Correcting Codes,”.arXiv:1807.00747.[Ye et al ’18] Hao Ye, et al, “Channel Agnostic End-to-End Learning basedCommunication Systems with Conditional GAN,” arXiv:1807.00447.[Kim et al ’18] Hyeji Kim, et al “Deepcode: Feedback Codes via DeepLearning,” arXiv:1807.00801.[Shu et al ’18] Shu J., Xu, Z., and Meng, D, “Small Sample Learning inBig Data Era”, arXiv:1808.04572.[Zappone et al ’18] A. Zappone, M. Di Renzo, M. Debbah, T. T. Lam,and X. Qian, “Model-Aided Wireless Artificial Intelligence: EmbeddingExpert Knowledge in Deep Neural Networks Towards Wireless SystemsOptimization" arXiv:1808.01672.
Osvaldo Simeone ML for Comm 179 / 184
References
[Farsad et al ’18] Farsad N, Rao M, Goldsmith A. Deep Learning for JointSource-Channel Coding of Text. arXiv preprint arXiv:1802.06832. 2018Feb 19.[Bourtsoulatze et al ’18] Bourtsoulatze E, Kurka DB, Gunduz D. DeepJoint Source-Channel Coding for Wireless Image Transmission. arXivpreprint arXiv:1809.01733. 2018 Sep 4.[Luong et al ’18] Luong NC, et a;, Applications of Deep ReinforcementLearning in Communications and Networking: A Survey. arXiv preprintarXiv:1810.07862. 2018 Oct 18.[Aoudia and Hoydis ‘18] FA Aoudia and J. Hoydis, End-to-End Learning ofCommunications Systems Without a Channel Model, arXiv:1804.02276v2,2018.[Goutay et al ’18] Goutay M, Aoudia FA, Hoydis J. Deep ReinforcementLearning Autoencoder with Noisy Feedback. arXiv preprintarXiv:1810.05419. 2018 Oct 12.
Osvaldo Simeone ML for Comm 180 / 184
References
[Klautau et al ’16] Klautau A, et al. 5G MIMO Data for Machine Learning: Application to Beam-Selection using Deep Learning. In Proc. Inf. Theory and Appl. Workshop (ITA) 2016 Jan.
[Han et al ’17] G. Han, L. Xiao, and H. V. Poor, “Two-dimensional anti-jamming communication based on deep reinforcement learning,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.
[Li et al ’17] X. Li, J. Fang, W. Cheng, H. Duan, Z. Chen, and H. Li, “Intelligent power control for spectrum sharing in cognitive radios: A deep reinforcement learning approach,” arXiv preprint arXiv:1712.07365, 2017.
[Nachmani et al ’16] Nachmani, et al. “Learning to decode linear codes using deep learning.” Allerton 2016.
Osvaldo Simeone ML for Comm 181 / 184
References
[Choi et al ’18] Choi K, Tatwawadi K, Weissman T, Ermon S. NECST:Neural Joint Source-Channel Coding. arXiv preprint arXiv:1811.07557.2018 Nov 19.[Jiang et al ’18] Yihan Jiang, Hyeji Kim, Himanshu Asnani, SreeramKannan, Sewoong Oh, Pramod Viswanath, ”LEARN Codes: InventingLow-latency Codes via Recurrent Neural Networks,” arXiv:1811.12707.[Mostaani and Simeone ’18] Arsham Mostaani, Osvaldo Simeone, ”OnLearning How to Communicate Over Noisy Channels for CollaborativeTasks,” arXiv:1810.01155.[Park et al ’19] Jihong Park, Sumudu Samarakoon, Mehdi Bennis,Merouane Debbah, ”Wireless Network Intelligence at the Edge”,arXiv:1812.02858.[Kim et al ’19] Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, YiY. Learning to Schedule Communication in Multi-agent ReinforcementLearning, ICLR 2019.
Osvaldo Simeone ML for Comm 182 / 184
References
[Boyan and Littman ’94] Boyan JA, Littman ML. Packet routing indynamically changing networks: A reinforcement learning approach.InAdvances in neural information processing systems 1994 (pp. 671-678).[Somuyiwa et al ’18] Somuyiwa SO, Gyorgy A, Gunduz D. AReinforcement-Learning Approach to Proactive Caching in WirelessNetworks. IEEE Journal on Selected Areas in Communications. 2018 Jun7.[Sadeghi and Larsson ’19] Meysam Sadeghi and Erik G. Larsson, PhysicalAdversarial Attacks Against End-to-End Autoencoder CommunicationSystems, arxiv, 2019.[Huang et al ’19] Chongwen Huang, et al, “Deep Learning for UL/DLChannel Calibration in Generic Massive MIMO Systems”, ICC WCCSymposium 2019.[Arvinte et al ’19] Marius Arvinte, et al, Deep Log-Likelihood RatioQuantization, arXiv:1903.04656
Osvaldo Simeone ML for Comm 183 / 184
References
[Park et al ’19] Sangwoo Park, Hyeryung Jang, Osvaldo Simeone,Joonhyuk Kang, Learning How to Demodulate from Few Pilots viaMeta-Learning, arXiv:1903.02184.[Jang et al ’19] H Jang, O Simeone, B Gardner, A Gruning, Spiking NeuralNetworks: A Stochastic Signal Processing Perspective, arXiv:1812.03929.[Jiang et al ’18] Jiang N, Deng Y, Simeone O, Nallanathan A. CooperativeDeep Reinforcement Learning for Multiple Groups NB-IoT NetworksOptimization. arXiv preprint arXiv:1810.11729. 2018 Oct 27.
Osvaldo Simeone ML for Comm 184 / 184