presenter: wayne hsiao advisor: frank , yeong -sung lin

49
Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach Presenter: Wayne Hsiao Advisor: Frank, Yeong-Sung Lin Yongle Wu, Beibei Wang, and K. J. Ray Liu

Upload: topper

Post on 24-Feb-2016

45 views

Category:

Documents


0 download

DESCRIPTION

Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach. Yongle Wu, Beibei Wang, and K. J. Ray Liu . Presenter: Wayne Hsiao Advisor: Frank , Yeong -Sung Lin . Agenda. Introduction Related Works System Model - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Defense Against Jamming Attacks in

Cognitive Radio Networks Using the Markov

Decision Process Approach

Presenter: Wayne HsiaoAdvisor: Frank, Yeong-Sung Lin

Yongle Wu, Beibei Wang, and K. J. Ray Liu

Page 2: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 3: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 4: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Introduction• Cognitive radio technology has been

receiving a growing attention• In a cognitive radio network

o Unlicensed users (secondary users)o Spectrum holders (primary users)

• Secondary users usually compete for limited spectrum resources o Game theory has been widely applied as a flexible and

proper tool to model and analyze their behavior in the network

Page 5: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Introduction• Cognitive radio networks are vulnerable

to malicious attacks• Security countermeasures

o Crucial to the successful deployment of cognitive radio networks

• We mainly focus on the jamming attacko One of the major threats to cognitive radio networkso Several malicious attackers intend to interrupt the

communications of secondary users by injecting interference

Page 6: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Introduction• Secondary user could hop across

multiple bands in order to reduce the probability of being jammedo Optimal defense strategy o Markov decision process (MDP)

• The optimal strategy strikes a balance between the cost associated with hopping and the damage caused by attackers

Page 7: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Introduction• In order to determine the optimal

strategy, the secondary user needs to know some information o the number of attackers

• Maximum Likelihood Estimation (MLE) o A learning process in this paper that the secondary user

estimates the useful parameters based on past observations

Page 8: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 9: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Related Works• The problem becomes more

complicated in a cognitive radio network o Primary users’ access has to be taken into consideration

• We consider the scenario o A single-radio secondary usero Defense strategy is to hop across different bands

Page 10: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 11: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model• A secondary user opportunistically

accesses one of the predefined M licensed bands

• Each licensed band is time-slotted • The access pattern of primary users can

be characterized by an ON-OFF model

Page 12: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model

• Assume all bands share the same channel model and parameters

• But different bands are used by independent primary users

Page 13: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model• Secondary user has to detect the

presence of the primary user at the beginning of each time slot

Page 14: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model• Communication gain R

o When the primary user is absent in that band• The cost associated with hopping is C• We assume there are m (m ≥ 1)

malicious single-radio attackers• Attackers do not want to interfere with

primary users o Because primary users’ usage of spectrum is enforced

by their ownership of bands

Page 15: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model• On finding the secondary user

o Attacker will immediately inject jamming power which makes the secondary user fail to decode data packets

• We assume that the secondary user suffers from a significant loss L when jammed

• When all the attackers coordinate to maximize the damageo they detect m channels in a time slot

Page 16: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model• The longer the secondary user stays in

a band, the higher risk to be exposed to attackers

• At the end of each time slot the secondary user decides o to stay o to hop

• The secondary user receives an immediate payoff U(n) in the nth time slot

Page 17: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model

• 1(.) is an indicator functiono Returning 1 when the statement in the parenthesis holds

true o 0 otherwise

Page 18: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

System Model• Average Payoff Ū

o The secondary user wants to maximize o Malicious attackers want to minimize

• The discount factor δ (0 < δ < 1) measures how much the secondary user values a future payoff over the current one

Page 19: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 20: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• Attack strategyo Attackers coordinately tune their radios randomly to m

undetected bands in each time slot o When either all bands have been sensed or the

secondary user has been found and jammed

• The jamming game can be reduced to a Markov decision process o We first show how to model the scenario as an MDP o Then solve it using standard approaches

Page 21: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• At the end of the nth time slot o The secondary user observes the state of the current

time slot S(n) o And chooses an action a(n)

• Whether to tune the radio to a new band or not, which takes effect at the beginning of the next time slot

• S(n) = P o The primary user occupied the band in the nth time slot

• S(n) = J o The secondary user was jammed in the nth time slot

Page 22: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• a(n) = h o The secondary user to hop to a new band

• The secondary user has transmitted a packet successfully in the time slot o ‘to hop’ (a(n) = h) o ‘to stay’ (a(n) = s)

• S(n) = K o This is the Kth consecutive slot with successful

transmission in the same band

Page 23: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• The immediate payoff depends on both the state and the action

• p(S’|S, h) o The transition probability from an old state S to a new

state S’ when taking the action h • p(S’|S, s)

o The transition probability from an old state S to a new state S’ when taking the action s

Page 24: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• If the secondary user hops to a new band, transition probabilities do not depend on the old state

• The only possible new states are o P (the new band is occupied by the primary user) o J (transmission in the new band is detected by an

attacker) o 1 (successful transmission begins in the new band)

Page 25: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• When the total number of bands M is large o M ≫ 1

• Assume that the probability of primary user’s presence in the new band equal the steady-state probability of the ON-OFF model o Neglecting the case that the secondary user hops back

to some band in very short time,

Page 26: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• The secondary user will be jammed with the probability m/M o Each attacker detects one band without overlapping

• Transition probabilities are

Page 27: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• Note that s is not a feasible action when the state is in J or P

• At state K, only max(M−Km,0) bands have not been detected by attackers o But another m bands will be detected in the upcoming

time slot o The probability of jamming conditioned on the absence

of primary user

Page 28: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Optimal Strategy with Perfect Knowledge

• To sum up, transition probabilities associated with the action s are as follows: ∀K ∈ {1,2,3,...}

Page 29: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 30: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Markov Decision Process

• If the secondary user stays in the same band for too long, he/she will eventually be found by an attacker o p(K + 1|K,s) = 0 if K > M/m − 1

• Therefore, we can limit the state S to a finite set , where

Page 31: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Markov Decision Process

• An MDP consists of four important components o a finite set of states o a finite set of actions o transition probabilities o immediate payoffs

• The optimal defense strategy can be obtained by solving the MDP

Page 32: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Markov Decision Process

• A policy is defined as a mapping from a state to an action o π : S(n) → a(n)

• A policy π specifies an action π(S) to take whenever the user is in state S

• Among all possible policies, the optimal policy is the one that maximizes the expected discounted payoff

Page 33: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Markov Decision Process

• The value of a state S is defined as the highest expected payoff given the MDP starts from state S

• The optimal policy is the optimal defense strategy that the secondary user should adopt since it maximizes the expected payoff

Page 34: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Markov Decision Process

• After a first move the remaining part of an optimal policy should still be optimal

• The first move should maximize the sum of immediate payoff and expected payoff conditioned on the current actiono Bellman equation

Page 35: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Markov Decision Process

• Critical state K*(K∗ ≤ )

• K∗ can be obtained from solving the MDP, and the optimal strategy becomes

Page 36: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 37: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• A learning schemeo Maximum Likelihood Estimation (MLE)

• The secondary user simply sets a value as an initial guess of the optimal critical state K∗

• And follows the strategy (10) with the estimate during the whole learning period

Page 38: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• This guess needs not to be accurate• After the learning period, the secondary

user updates the critical state K∗ accordingly.

• Fo The total number of transitions from S to S’ with the

action h taken• T• T• t

Page 39: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• The likelihood that such a sequence has occurred o A product over all feasible transition tuples o (S,a,S’) ∈ {P,J,1,2,3,...,KL + 1}×{s,h}×{P,J,1,2,3,...,KL

+1}

• Define • The following proposition gives the MLE

of the parameters β, γ, and ρ

Page 40: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• Proposition1: Given , S ∈ and , S ∈ counted from history of transitions, the MLE of primary users’ parameters are

Page 41: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• The MLE of attackers’ parameters ρML is the unique root within an interval (0, 1/(KL + 1)) of the following (KL + 1) order polynomial

• Proof

Page 42: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• With transition probabilities specified in (4) – (7)

• The likelihood of observed transitions (11) can be decoupled into a product of three terms Λ = ΛβΛγΛρ

Page 43: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• By differentiating ln Λβ, ln Λγ, lnΛρ and equating them to 0o Obtain the MLE (12) (13) and (14)

• To ensure that the likelihood is positive, ρ has to lie in the interval (0, 1/(K + 1)) o The left-hand side of equation (14) decreases

monotonically and approaches positive infinity as ρ goes to 0

o The right-hand side increases monotonically and approaches positive infinity as ρ goes to 1/(KL + 1)

Page 44: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Learning the Parameters

• After the learning period, the secondary user rounds M ·ρML to the nearest integer as an estimation of m

• Calculate the optimal strategy using the MDP approach described in the previous section

Page 45: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Agenda• Introduction• Related Works• System Model• Optimal Strategy with Perfect

Knowledgeo Markov Modelso Markov Decision Process

• Learning the Parameters• Simulation Results

Page 46: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Simulation Result• Communication gain R = 5 • Hopping cost C = 1• Total number of bands M = 60• Discount factor δ = 0.95• Primary users’ access pattern

o β = 0.01, γ = 0.1

Page 47: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Simulation Result

• When the threat from attackers are more stronger the secondary user should proactively hop more frequently o To avoid being jammed

Page 48: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin

Simulation Result

o Always hopping: the secondary user will hop every time slot

o Staying whenever possible: the secondary user will always stay in the band unless the primary user reclaims the band or the band is jammed by attackers.

Page 49: Presenter: Wayne Hsiao Advisor: Frank ,  Yeong -Sung Lin