ANONYMITY AND COVERT CHANNELS IN MIX-FIREWALLS

By

VIPAN REDDY R. NALLA

A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2004

Copyright 2004

by

Vipan Reddy R. Nalla

ACKNOWLEDGMENTS

I would like to gratefully acknowledge the great supervision of Dr. Richard

Newman during this work. I thank Dr. Joseph Wilson and Dr. Shigang Chen for

serving on my committee and for reviewing my work.

I would like to thank Ira Moskowitz and Naval Research Labs for funding me

through research grants. I am grateful to all my friends who helped me directly or

indirectly in preparing this work. Finally, I am forever indebted to my parents for

helping me to reach this stage in my life.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

1 INTRODUCTION

2 MIXES AND MIX NETWORKS
  2.1 Mix
  2.2 Types of Mixes
    2.2.1 Simple Mixes
    2.2.2 Pool Mixes
  2.3 Mix Networks
    2.3.1 Design Issues in Mix Networks
    2.3.2 Classification of Mix Networks
  2.4 Real-time Mix Networks
    2.4.1 Crowds
    2.4.2 Onion Routing
    2.4.3 Babel
    2.4.4 MixMaster
    2.4.5 Freedom
    2.4.6 PipeNet
    2.4.7 Stop-And-Go Mixes
    2.4.8 Tarzan
  2.5 Summary

3 ADVERSARY MODELS AND ATTACKS ON MIXES
  3.1 Adversary Models
    3.1.1 Internal and External Adversary
    3.1.2 Active and Passive Adversary
    3.1.3 Local, Restricted and Global Adversary
    3.1.4 Static and Adaptive Adversary
  3.2 Attacks on Mixes
    3.2.1 Active Attacks
    3.2.2 Passive Attacks
  3.3 Summary

4 ANONYMITY METRICS AND ANALYSIS TECHNIQUE
  4.1 Anonymity
  4.2 Anonymity Metrics
    4.2.1 Anonymity Sets
    4.2.2 Problems with Anonymity Set Size
    4.2.3 Entropy
    4.2.4 Route Length
    4.2.5 Covert Channels
    4.2.6 Covert Channels in Mix Networks
    4.2.7 Covert Channel Capacity as Anonymity Metric
  4.3 Analysis Technique
    4.3.1 Scenarios
    4.3.2 Channel Matrix
  4.4 Summary

5 PREVIOUS WORK AND THE EXIT-MIX MODEL
  5.1 Capacity Analysis for Indistinguishable Receivers Case
    5.1.1 Case 0: Alice Alone
    5.1.2 Case 1: Alice and One Additional Clueless Transmitter
    5.1.3 Case 2: Alice and N Additional Transmitters
  5.2 Exit-Mix Model
    5.2.1 Scenario
    5.2.2 Channel Matrix Probabilities
  5.3 Capacity Analysis for Exit-MIX Scenario
    5.3.1 One Receiver (M = 1)
    5.3.2 Some Special Cases for Two Receivers (M = 2)
    5.3.3 Some Special Cases for Three Receivers (M = 3)
    5.3.4 Some Generalized Cases of N and M
    5.3.5 Non-Uniform Message Distributions
  5.4 Summary

6 DISCUSSION OF RESULTS
  6.1 Capacity vs. Clueless Transmitters
  6.2 Capacity vs. Number of Receivers
  6.3 Capacity vs. Mutual Information at x0 = 1/(M + 1)
  6.4 Capacity vs. Message Distributions
  6.5 Comments and Generalizations
  6.6 Summary

7 CONCLUSIONS AND FUTURE WORK

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF FIGURES

4–1 Vulnerability of Anonymity Sets
4–2 Restricted Passive Adversary Model
4–3 Global Passive Adversary Model
5–1 Channel Model for Subsection 5.1.1. A) Channel block diagram. B) Channel transition diagram
5–2 Plot of Covert Channel Capacity as a Function of p
5–3 Channel for Case 3, the general case of N clueless users. A) Channel transition diagram. B) Channel Matrix
5–4 Exit Mix-firewall Model with N Clueless Senders and M Distinguishable Receivers
5–5 Case 4: System with N = 1 Clueless Sender and M = 2 Receivers
5–6 Capacity for N = 1 Clueless Sender and M = 2 Receivers
5–7 Case 5: System with N = 2 Clueless Senders and M = 2 Receivers
5–8 Capacity for N = 2 Clueless Senders and M = 2 Receivers
5–9 Case 6: System with N = 1 Clueless Sender and M = 3 Receivers
5–10 Capacity for N = 1 Clueless Sender and M = 3 Receivers
5–11 Capacity for N = 2 Clueless Senders and M = 3 Receivers
5–12 Case 7: System with N = 2 Clueless Senders and M = 3 Receivers
5–13 Case 8: System with N = 1 Clueless Sender and M Receivers
5–14 Case 9: System with N Clueless Senders and M = 2 Receivers
6–1 Capacity for N = 1 to 4 Clueless Senders and M = 2 Receivers
6–2 Capacity for N = 1, 2, 4 Clueless Senders and M = 3 Receivers
6–3 Mutual Information vs. x0 for N = 1 Clueless Sender and M = 2 Receivers, for p = 0.25, 0.33, 0.5, 0.67
6–4 Mutual Information vs. p for N = 2 Clueless Senders and M = 2 Receivers
6–5 Mutual Information vs. p for N = 2 Clueless Senders and M = 3 Receivers
6–6 Value of x0 that Maximizes Mutual Information for N = 1, 2, 3, 4 Clueless Senders and M = 3 Receivers as a Function of p
6–7 Normalized Mutual Information when x0 = 1/4 for N = 1, 2, 3, 4 Clueless Senders and M = 3 Receivers
6–8 Capacity for N = 1 Clueless Sender and M = 1 to 5 Receivers
6–9 Capacity for N = 0 to 9 Clueless Senders and M = 1 to 10
6–10 Capacity for Uniform, Zipf, and 80/20 Distributions for Clueless Transmitter and Uniform Distribution for Clueless Transmitter
6–11 Capacity for Uniform, Zipf, and 80/20 Distributions for Alice and Uniform Distribution for Clueless Transmitter
6–12 Capacity for Uniform, Zipf, and 80/20 Distributions for Alice and Zipf Distribution for Clueless Transmitter

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the

Requirements for the Degree of Master of Science

ANONYMITY AND COVERT CHANNELS IN MIX-FIREWALLS

By

Vipan Reddy R. Nalla

December 2004

Chair: Richard E. Newman

Major Department: Computer and Information Science and Engineering

Privacy is becoming a critical issue on the Internet. Some people want to keep

their purchases private. They do not want to have third parties (or even merchants)

know their identity. This concern may arise because the customer is buying a good of

questionable social value (e.g., pornography); or because the customer does not want to

have his name added to a marketing or mailing list; or for illegal reasons (e.g., to evade

taxes); or simply because the customer personally values privacy.

Mix networks are the most promising approach to anonymize communication in

the Internet. Originally designed to anonymize e-mail communication, variations of the

basic design have led to systems that provide anonymity for low-latency applications

such as web browsing.

Traditional methods for evaluating the amount of anonymity afforded by various

mix configurations have depended either on measuring the size of the set of possible

senders of a particular message (the anonymity set size) or on measuring the entropy

associated with the probability distribution of the messages of possible senders. Our

study further explores an alternative way of assessing the anonymity of a mix system

by considering the capacity of a covert channel from a sender behind the mix to an

observer of the mix’s output.


CHAPTER 1

INTRODUCTION

Privacy is becoming a critical issue on the Internet. Some people want to keep

their purchases private. They do not want to have third parties (or even merchants)

know their identity. This concern may arise because the customer is buying a good of

questionable social value (e.g., pornography); or because the customer does not want to

have his name added to a marketing or mailing list; or for illegal reasons (e.g., to evade

taxes); or simply because the customer personally values privacy. Elections constantly

remind us that one of the most important barriers to electronic voting is users’ fear of

having their privacy violated. Unfortunately, this is justified, as marketers and national

security agencies have been very aggressive in monitoring user activity.

Mix networks [3] are the most promising approach to anonymize communication in

the Internet. Originally designed to anonymize e-mail communication, variations of the

basic design have led to systems that provide anonymity for low-latency applications

such as web browsing. None of these anonymity networks was designed with the covert

channel threat in mind. The goal of this work is to show that even in what appears to

be a benign form of communication, information may still leak out of the network.

Overview. Our study addresses anonymity and covert channels. The major contribution

of our study is the identification, analysis, and capacity estimation of the covert

channels that arise from the use of a Mix [3, 21] as an exit firewall.

Mixes are special nodes in a network that relay messages while hiding the cor-

respondence between their input and their output. A careful explanation of mixes

and a detailed classification of mixes are presented in Chapter 2. Several mixes can be

chained to relay a message anonymously. These systems provide the best compromise

between security and efficiency in terms of bandwidth, latency, and overheads. Design

issues related to mix networks are also presented along with examples of some real-time


mix-based anonymizing systems. Chapter 3 presents various adversary models, followed

by a comprehensive listing of attacks against mixes and mix networks.

Anonymity is an important issue in electronic payments, electronic auctions,

electronic voting, and also for email and web browsing. A communication can never be

truly anonymous, but relative anonymity can be achieved. Chapter 4 defines anonymity

and presents various types of anonymity. It also describes generalized methods to measure

anonymity and the technique used for analysis. We measured the lack of perfect

anonymity via a covert channel. Covert channel analysis includes finding the security flaw,

developing covert channel scenarios, and analyzing their capacity. Chapter 4 also gives a

brief description of a particular flavor of covert channels arising in mix networks.

Chapter 5 presents the adversary model with details of terminology and model setup.

It also presents initial work involving a simple model [13] with a restricted passive adversary

(RPA), along with results and conclusions. It then presents the main analysis done in

the thesis, which covers the capacity of the covert channels for different cases

of senders and receivers. A detailed discussion of the results of this analysis forms Chapter

6. Chapter 7 presents conclusions and suggests future work needed in this area.

CHAPTER 2

MIXES AND MIX NETWORKS

2.1 Mix

David Chaum first introduced mix networks for untraceable electronic mail [3].

A mix server randomly permutes and decrypts input messages. The key property of

the mix network is that an observer cannot tell which ciphertext corresponds to a given message.

Chaum’s original system used a very simple threshold mix model, but since then many

different types of mixes have been proposed in the literature, and some of them are being

used in practice.

A mix server is classified by the batching strategy used. The batching strategy

involves collecting messages, mixing them well, and flushing the messages when certain

conditions are met. The flushing algorithm used in the mix can be expressed as a

function P : N → [0, 1] from the number of messages inside the mix to the fraction of

messages to be flushed. The flushing condition is expressed in terms of a time interval t,

a threshold n on the number of messages collected in the mix, or a combination of both.
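The role of the flushing function P can be illustrated with a minimal Python sketch. The helper names (flush_round, simple_mix_fraction, half_fraction) and the choice to round the number of flushed messages down are assumptions made for illustration, not part of the cited designs.

```python
import random
from typing import Callable, List, Tuple


def flush_round(messages: List[str],
                flush_fraction: Callable[[int], float],
                rng: random.Random) -> Tuple[List[str], List[str]]:
    """Release a fraction P(n) of the n stored messages in random order.

    Returns (sent, kept). Rounding down is an assumption of this sketch.
    """
    n = len(messages)
    count = int(flush_fraction(n) * n)  # floor, since the product is non-negative
    shuffled = messages[:]              # leave the caller's list untouched
    rng.shuffle(shuffled)               # hide the arrival order
    return shuffled[:count], shuffled[count:]


def simple_mix_fraction(n: int) -> float:
    """A simple mix sends everything: P(n) = 1 for every n."""
    return 1.0


def half_fraction(n: int) -> float:
    """Hypothetical pool-style function that always sends half of the messages."""
    return 0.5
```

A production mix would draw its randomness from a cryptographically secure source; random.Random is used here only to keep the sketch reproducible.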

2.2 Types of Mixes

Based on the flushing algorithm used, mixes can be divided into simple mixes and

pool mixes.

2.2.1 Simple Mixes

A simple mix flushes all the messages it contains when the flushing conditions are

met. Hence, the value of the function P (n) is equal to one. These mixes can be further

classified, depending on the flushing condition used.

Threshold mix.

• Flushing Condition Parameters: threshold on messages collected in the mix, n.

• Flushing Algorithm: The mix fires all the messages when n messages are collected.

• Message delay: The minimum delay is ε (this happens when the mix already contained n − 1 messages before the target message arrives). The maximum delay can be infinite, if no more messages arrive after the target message. Assuming a message arrival rate r, the average message delay is n/(2r).

• Anonymity: Assuming all the messages in the mix are from different senders and go to different receivers, the probability that an outgoing message corresponds to a particular incoming message is 1/n. This probability is always equal to 1/n, since the threshold n is constant.
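A threshold mix as described above can be sketched in a few lines of Python. The class and function names are hypothetical, and the fixed seed exists only to make the example reproducible; the two helpers simply evaluate the average delay n/(2r) and the linkage probability 1/n quoted in the list.

```python
import random
from typing import List


class ThresholdMix:
    """Simple threshold mix: fires all messages once n have been collected."""

    def __init__(self, threshold: int, seed: int = 0) -> None:
        self.threshold = threshold
        self.pool: List[str] = []
        self.rng = random.Random(seed)  # not cryptographically secure; sketch only

    def receive(self, message: str) -> List[str]:
        """Store a message; return the flushed batch, or [] if the mix is not full."""
        self.pool.append(message)
        if len(self.pool) < self.threshold:
            return []
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)         # output order is unrelated to arrival order
        return batch


def mean_delay(threshold: int, arrival_rate: float) -> float:
    """Average delay n / (2r) for threshold n and arrival rate r."""
    return threshold / (2 * arrival_rate)


def link_probability(threshold: int) -> float:
    """Probability 1/n that a given output matches a given input."""
    return 1 / threshold
```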

Timed mix.

• Flushing Condition Parameters: time interval, t.

• Flushing Algorithm: The mix flushes (all the messages in the mix) every t time units (generally seconds).

• Message delay: The minimum delay is ε, when the target message arrives just before the flushing time of the mix. The maximum delay is t − ε, when the target message arrives just after the mix has fired. Hence, the mean delay is t/2 units.

• Anonymity: The anonymity of the mix depends on the number of messages arriving in a particular flushing interval. The minimum anonymity is zero, when no message arrives in the time interval. The maximum anonymity is theoretically infinite, but is limited by the number of messages the mix can hold. Assuming a message arrival rate of r, a total of rt messages are fired. So the probability that an outgoing message corresponds to a particular incoming message is 1/(rt).

Threshold or timed mix.

• Flushing Condition Parameters: time interval, t; threshold on messages, n.

• Flushing Algorithm: The mix flushes (all the messages in the mix) every t time units (generally seconds) or when n messages accumulate in the mix.

• Message delay: The minimum delay is ε, when the target message arrives just before the flushing time or when the mix already has n − 1 messages. The maximum delay is t − ε, when the target message arrives just after the mix has fired and the number of messages arriving in the next interval is less than n.

• Anonymity: The anonymity of the mix depends on the number of messages arriving in a particular flushing interval. The minimum anonymity is zero, when no message arrives in the time interval. The maximum anonymity is not infinite as in the previous case, because of the threshold n. The minimum probability that an outgoing message corresponds to a particular incoming message is 1/n.

Threshold and timed mix.

• Flushing Condition Parameters: time interval, t; threshold on messages, n.

• Flushing Algorithm: The mix flushes (all the messages in the mix) every t time units (generally seconds) but only when at least n messages have accumulated in the mix.

• Message delay: The minimum delay is ε, when the target message arrives just before the flushing time. The maximum delay can be infinite, if the number of messages accumulated is less than n.

• Anonymity: The minimum anonymity for this mix is no longer zero, since the mix doesn’t fire until it has n messages. The maximum anonymity is in theory infinite, but is limited in practice by the number of messages the mix can hold. The maximum probability that an outgoing message corresponds to a particular incoming message is 1/n.
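The four simple-mix flushing conditions differ only in how the threshold and the timer are combined, which the following Python sketch makes explicit. The function name and the string labels for the mix kinds are illustrative assumptions.

```python
def should_flush(kind: str, count: int, elapsed: float, n: int, t: float) -> bool:
    """Return True if a simple mix of the given kind fires now.

    count   -- number of messages currently in the mix
    elapsed -- time since the last flush
    n, t    -- threshold and time-interval parameters of the mix
    """
    threshold_met = count >= n
    timer_expired = elapsed >= t
    if kind == "threshold":
        return threshold_met
    if kind == "timed":
        return timer_expired
    if kind == "threshold_or_timed":
        return threshold_met or timer_expired
    if kind == "threshold_and_timed":
        return threshold_met and timer_expired
    raise ValueError(f"unknown mix kind: {kind}")
```

For example, with n = 10 and t = 60, a mix holding 4 messages after 60 time units fires if it is a timed or a threshold-or-timed mix, but not if it is a threshold or a threshold-and-timed mix.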


2.2.2 Pool Mixes

In pool mixes, the mix retains some messages and hence the value of the flushing

function P (n) is less than one. Pool mixes can be further divided into constant and

dynamic pool mixes, depending on whether the value of function P is constant over

successive flushes by the mix.

Constant pool mixes. The simple mixes described earlier can be modified to retain

a constant pool of messages for the next round.

Threshold pool mix.

• Flushing Condition Parameters: number of messages retained (pool), f; threshold on messages, n.

• Flushing Algorithm: The mix fires n messages when it accumulates n + f messages. The pool of messages to be retained (f) is chosen uniformly at random from the n + f messages collected in the mix.

• Message delay: The minimum delay is ε and the maximum delay is theoretically infinite. Serjantov, Syverson and Dingledine [20] analyze the threshold pool mixes in detail. They calculate the mean delay by taking into account the fact that a message can be retained in the mix for an arbitrarily long time. The probability of a message being retained in a particular round is f/(n + f). The mean delay is 1 + f/(n + f) rounds. If messages arrive at a rate of r messages per time unit, the average delay is (1 + f/(n + f)) n/r.

• Anonymity: The anonymity of the message going through a pool mix depends on the entire history of events that happened in the mix. The minimum anonymity of the mix is at least equal to that of the simple threshold mix. Serjantov and Newman [20] carried out the analysis and have calculated the maximum anonymity in terms of number of possible sets.

\[
A_{max} = -\left(1 - \frac{f}{n}\right)\log(n + f) + \frac{f}{n}\log(f)
\]
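The flushing rule of the threshold pool mix can be sketched as follows. The class name, method names, and fixed seed are assumptions for illustration; shuffling the stored messages and sending the first n is one way to retain a uniformly random pool of f.

```python
import random
from typing import List


class ThresholdPoolMix:
    """Threshold pool mix: sends n messages once n + f are stored, keeps f."""

    def __init__(self, threshold: int, pool_size: int, seed: int = 0) -> None:
        self.n = threshold
        self.f = pool_size
        self.messages: List[str] = []
        self.rng = random.Random(seed)  # not cryptographically secure; sketch only

    def receive(self, message: str) -> List[str]:
        """Store a message; return the n flushed messages, or [] if not firing."""
        self.messages.append(message)
        if len(self.messages) < self.n + self.f:
            return []
        self.rng.shuffle(self.messages)           # uniform choice of who stays
        sent = self.messages[:self.n]
        self.messages = self.messages[self.n:]    # the retained pool of f messages
        return sent


def retention_probability(n: int, f: int) -> float:
    """Probability f / (n + f) that a given message is retained in one round."""
    return f / (n + f)
```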

Timed pool mix.

• Flushing Condition Parameters: number of messages retained (pool), f; time interval, t.

• Flushing Algorithm: The mix fires every t time units. A pool of f messages chosen uniformly at random is retained in the mix. If the number of messages accumulated is less than or equal to f, then the mix doesn’t fire.

• Message delay: The minimum delay is ε and the maximum delay is infinite (when no message arrives for a long time, the messages retained in the pool never leave the mix). As in the threshold pool mix, there is a non-zero probability that a message is retained for an arbitrarily long time.


Dynamic pool mixes. Dynamic pool mixes are represented by the function P and

this function can be modified to maximize the anonymity obtained. Cottrell mix [5] and

Binomial mix [20] are some examples of dynamic pool mixes.

Timed dynamic pool mix (Cottrell mix).

• Flushing Condition Parameters: number of messages retained (pool), f; time interval, t; α, fraction of messages to be sent; threshold, n.

• Flushing Algorithm: The mix fires every t time units, provided there are at least n + f messages in the mix. However, instead of firing n messages, it fires max(1, ⌊m · α⌋) messages, where m + f is the number of messages in the mix (m ≥ n).

• Message delay: Like the timed pool mix, the minimum delay is ε. The maximum delay is at least as high as that of the timed constant pool mix. The average delay depends on the future rate of arrival of the messages.

• Anonymity: The anonymity provided by this mix is higher than that of the constant pool mixes. This is because, as the number of messages collected goes up, α keeps the chance of a message remaining in the dynamic pool mix constant. For a constant timed pool mix, this quantity decreases as the number of messages collected increases, and in the case of a threshold pool mix, the mix has to flush frequently, hence reducing the chance of a message remaining in the mix per unit time.
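The batch size chosen by the timed dynamic pool mix at a timer tick follows directly from the flushing rule above. In this short Python sketch the function name and the parameter name total (for m + f, the number of messages currently in the mix) are assumptions.

```python
import math


def cottrell_batch_size(total: int, n: int, f: int, alpha: float) -> int:
    """Number of messages a timed dynamic pool mix sends at a timer tick.

    total -- messages currently in the mix (m + f in the text)
    n, f  -- threshold and pool size
    alpha -- fraction of the m non-pool messages to send
    """
    if total < n + f:
        return 0                        # too few messages: the mix does not fire
    m = total - f
    return max(1, math.floor(m * alpha))
```

With n = 10, f = 5 and α = 0.5, a mix holding 21 messages has m = 16 and sends 8 of them; with only 12 messages it does not fire.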

Binomial mix.

• Flushing Condition Parameters: time interval, t; threshold, n.

• Flushing Algorithm: We can imagine the flushing function P (n) as a probability. For each message collected, the mix tosses a coin. A head indicates that the message will be sent and a tail indicates that it will remain in the mix. On average, the number of messages sent is s = nP (n). The number sent follows the well-known binomial distribution with a variance equal to np(1 − p), where p is the value of the function P (n).

• Message delay: The minimum delay is ε and the maximum delay depends on the random binomial function P (n).

• Anonymity: The anonymity provided by the mix is much higher than that of the previously discussed mix types. This is because the attacker can’t easily determine the number of messages in the mix, n, by observing the value of s.
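The per-message coin toss of the binomial mix can be sketched in Python as below. The function names are hypothetical, and p stands for the value P(n) that the mix has already computed for the current round.

```python
import random
from typing import List, Tuple


def binomial_flush(messages: List[str], p: float,
                   rng: random.Random) -> Tuple[List[str], List[str]]:
    """Toss a biased coin per message: heads (probability p) sends it."""
    sent: List[str] = []
    kept: List[str] = []
    for message in messages:
        if rng.random() < p:            # random() is in [0, 1), so p = 1 always sends
            sent.append(message)
        else:
            kept.append(message)
    rng.shuffle(sent)                   # hide the arrival order of what is sent
    return sent, kept


def expected_sent(n: int, p: float) -> float:
    """Mean of the binomial distribution: n * p."""
    return n * p


def variance_sent(n: int, p: float) -> float:
    """Variance of the binomial distribution: n * p * (1 - p)."""
    return n * p * (1 - p)
```

Because the number of messages sent is random, an observer who sees s messages leave the mix cannot invert s = nP(n) to recover n exactly.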

2.3 Mix Networks

The chain of mixes from a client to a server is called an anonymous tunnel or a

mix network. A single encrypted connection is used to transport the data of multiple

anonymous tunnels between two mixes.

2.3.1 Design Issues in Mix Networks

A mix network is characterized by the type of anonymity provided, packet sizes,

dummy traffic, routing, and the node-flushing algorithm used at individual nodes. We

will discuss each of these issues briefly.


Anonymity. Probably the most important design issue is that of anonymity versus

pseudonymity. Pseudonymity means that some node(s) know the user’s pseudonym (but

cannot link the pseudonym with a real-world identity). Another option is to have the user

be anonymous in the mix network but be pseudonymous in its dealings with other users

(half-pseudonymity).

Anonymity provides better security since if a pseudonym (nym) is linked with

a user, all future uses of the nym can be linked to the user. But, pseudonymity has

many other advantages when compared to complete anonymity. Pseudonymity provides

the best of both worlds: privacy protection and accountability (and openness). Since

pseudonyms (nyms) have a persistent nature, long term relationships and trust can be

cultivated. Authentication (verifying that someone has the right to use the network) is

easier with pseudonymity because Chaumian blinding [4] needs to be used when using

anonymity.

Packet sizes. The messages (e.g., web requests/replies) are chopped into fixed-length

packets and are delivered in a particular order (lexicographic, etc.). This eliminates

traffic analysis at a mix based on the packet length. But in many situations, using

different message sizes yields substantial performance improvements. For example,

TCP/IP connections require on average one small control packet for every two (large)

data packets. It might be inefficient for small messages to be padded or large packets

split up in order to get a message of the correct size. So, we have a tradeoff between

security and performance: using more than one message size gives better performance

but worse security.
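Chopping a message into fixed-length packets can be sketched as follows. The framing used here (a 4-byte length prefix followed by zero padding) is an assumption chosen for illustration and is not taken from any of the systems discussed; in a real mix network the packets would also be encrypted, so the padding would not be visible to an observer.

```python
import struct
from typing import List

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix (assumed framing)


def to_fixed_packets(message: bytes, size: int) -> List[bytes]:
    """Split a message into packets of exactly `size` bytes, padding the last one."""
    framed = HEADER.pack(len(message)) + message
    framed += b"\x00" * ((-len(framed)) % size)   # pad up to a multiple of size
    return [framed[i:i + size] for i in range(0, len(framed), size)]


def from_fixed_packets(packets: List[bytes]) -> bytes:
    """Reassemble the original message and strip the padding."""
    framed = b"".join(packets)
    (length,) = HEADER.unpack_from(framed)
    return framed[HEADER.size:HEADER.size + length]
```

The performance cost mentioned above is visible here: a 15-byte request sent in 8-byte packets occupies three packets, 24 bytes in total.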

Dummy traffic. Dummy packets are normally introduced to reduce traffic pattern

based attacks and, to some extent, other passive attacks discussed in Section 3.2.2. Dummy

messages contain random bit strings and are indistinguishable from real packets.

Dummies can be introduced between two mixes, between the client and the first mix in a

tunnel, or end-to-end between the client and the last mix in the tunnel. This

results in constant, bi-directional packet streams between any two mix-nodes or the

users and their entry node.


Dummy traffic is often used in an unstructured manner in mix networks and

might not be as effective as it could be. Some studies [15, 16, 18, 26, 27] have discussed

and analyzed the use of dummy traffic for traffic analysis prevention.

If a mix node sends its messages to fewer than t nodes, dummy messages should be

sent in such a way that t nodes receive messages. The larger t, the harder it is to mount

the brute search attacks and intersection attacks.

Each mix node should send messages to at least t destinations outside the mix

network (dummy messages should be used to fill the gaps). The larger t, the harder it is

to mount the brute search attack. Furthermore, this technique also complicates attacks

in which the adversary monitors the exit nodes.
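The rule that every flush should reach at least t destinations can be sketched in Python. The function name, the candidate list, and the uniform choice of dummy destinations are all assumptions; the text does not specify how dummy destinations are selected.

```python
import random
from typing import List, Sequence


def pad_destinations(real: Sequence[str], candidates: Sequence[str],
                     t: int, rng: random.Random) -> List[str]:
    """Return the real destinations plus enough dummy ones to reach t in total."""
    destinations = list(dict.fromkeys(real))       # de-duplicate, keep order
    unused = [c for c in candidates if c not in destinations]
    missing = max(0, t - len(destinations))
    if missing > len(unused):
        raise ValueError("not enough candidate nodes to reach t destinations")
    destinations.extend(rng.sample(unused, missing))  # these receive dummy messages
    return destinations
```

Destinations appended by the function would receive dummy messages only, so an observer sees t recipients regardless of how many real messages were flushed.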

Dummy messages can also be used to randomize the user’s communication patterns

by making the user send dummy traffic to the entry node. The challenge here is to

have good security while minimizing the number of dummy messages used.

Finally, dummy messages could also be used to reduce the amount of time mes-

sages stay at a given node. It seems that waiting for s messages to enter a mix node

before sending t (t > s ) has similar security properties as waiting to receive t messages

before releasing them. This trick could be used to reduce the time messages wait at

nodes [18].

Routing. Routing can be either static, in which a preassigned number of routes are

used, or dynamic, where the user chooses the nodes in his route randomly. For large

Internet-based systems especially, having the user choose the nodes in his route randomly

does not seem to be a viable option, for the following reasons.

• Each node and user must “know” every other node, which might be impractical.

• Some servers are far from each other and it doesn’t make sense from a performance viewpoint to have, for example, a route consisting of nodes in Australia, Canada, South Africa and China.

• Nodes should be “socially” independent. Ideally, the nodes in a route should belong to different organizations and be located in different legal jurisdictions. The whole idea behind using more than one node is that none of them has enough information to determine sender-recipient matchings. Hence, if all nodes in a route belong to the same organization, we might as well just use a single node. The motivation for having nodes in different legal jurisdictions is that more than one subpoena needs to be obtained to compromise nodes legally.


Normally, systems use static routes that allow mix nodes to associate each message

with a connection identifier, which helps reduce the number of public key operations

executed. On the negative side, this approach is more susceptible to attack, because

fixed routes make some of the attacks much easier to carry out.

Creating good network topologies and route finding algorithms with respect to

security and efficiency is not a trivial task and needs a lot of analysis on the designer’s part.

Node-flushing algorithm. As seen in Section 2.2, there are many different approaches

to flushing nodes. Again, there is a security/practicality tradeoff: the longer

messages can stay in mix-nodes the better the security (in most settings).


2.3.2 Classification of Mix Networks

We can classify mix networks based on the number of servers as static mix-

networks and dynamic mix-networks. Static mix-networks are made up of a relatively

small number of highly available, powerful mixes with good network connectivity that

serve a much larger number of users (e.g. 100 mixes, 100,000 users). These networks

can either be operated commercially or by volunteers. Dynamic mix-networks are

peer-to-peer based networks and every client is also a mix server.

Dynamic mix-networks have several advantages compared to static mix-

networks. In theory, there are no limits on the number of users they can support, and


since it is a peer-to-peer system, the barrier to join is low. Entry points (connections

between client and first mix) are no longer visible, which makes end-to-end traffic

analysis attacks more difficult to mount. With these advantages come new difficulties.

Dynamic means nodes can join and leave at any time, so the anonymous tunnels are

less stable and may need to be established frequently. Discovering a node is a problem

and some nodes (using dialup) offer poor service, which degrades the quality of service

of a tunnel.


We can also classify the mix network into two types based on the cryptographic

alternative used: Decryption Mix Nets [3] and Re-encryption Mix Nets. Decryption

Mix Nets take ciphertexts as input and decrypt them layer by layer, so that the plaintext is recovered at the

end node. Re-encryption Mix Nets use the malleability property of the ElGamal cryptosystem:

each mix re-randomizes the ciphertext instead of decrypting it, and the plaintext is recovered only by a final decryption.
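The re-encryption step can be illustrated with a minimal sketch of textbook ElGamal over a tiny prime field. The modulus, base, keys, and function names below are assumptions chosen for illustration only; they are far too small for real use and are not taken from any deployed mix network.

```python
# Toy ElGamal re-encryption (illustrative parameters, NOT secure).
P = 467   # small prime modulus, assumed for the example
G = 2     # base element, assumed for the example


def keygen(x):
    """Return the public key h = G^x mod P for the secret exponent x."""
    return pow(G, x, P)


def encrypt(m, h, r):
    """Encrypt m under public key h with randomness r: (G^r, m * h^r)."""
    return (pow(G, r, P), (m * pow(h, r, P)) % P)


def reencrypt(ct, h, s):
    """Re-randomize a ciphertext with fresh randomness s, without decrypting it."""
    a, b = ct
    return ((a * pow(G, s, P)) % P, (b * pow(h, s, P)) % P)


def decrypt(ct, x):
    """Recover m = b * a^(-x) mod P, using Fermat's little theorem for the inverse."""
    a, b = ct
    return (b * pow(a, P - 1 - x, P)) % P


secret = 127
public = keygen(secret)
original = encrypt(331, public, r=213)
mixed = reencrypt(original, public, s=58)   # what a mix would forward
recovered = decrypt(mixed, secret)          # equals the original plaintext 331
```

The re-encrypted pair differs from the incoming pair, so an observer cannot match them bitwise, yet a single final decryption still yields the plaintext.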

2.4 Real-time Mix Networks

On the practical side, several systems have been implemented to provide fast,

secure and anonymous communication. These systems differ in terms of infrastructure

costs, type of protection provided and the transparency provided to users.

2.4.1 Crowds

Crowds [19] was developed by Reiter and Rubin at AT&T Laboratories. It

aims to provide a privacy preserving way of accessing the web, without web sites

being able to recognize which individual's machine is browsing. Crowds consists of a

number of network nodes that are run by the users of the system. Web requests are

randomly chained through a number of them before being forwarded to the web server

hosting the requested data. The server will see a connection coming from one of the

Crowds users, but cannot tell which of them is the original sender. In addition, Crowds

uses encryption, so that some protection is provided against attackers who intercept

a user’s network connection. However, this encryption does not protect against an

attacker who cooperates with one of the nodes that the user has selected, since the

encryption key is shared between all nodes participating in a connection. Crowds is

also vulnerable to passive traffic analysis: since the encrypted messages are forwarded


without modification, traffic analysis is trivial if the attacker can observe all network

connections. An eavesdropper intercepting only the encrypted messages between the

user and the first node in the chain as well as the cleartext messages between the final

node and the web server can associate the encrypted data with the plaintext using the

data length and the transmission time.

2.4.2 Onion Routing

Onion Routing [7, 17, 24, 25] is the most famous of all anonymizing networks.

In this system, a user sends encrypted data to a network of so-called Onion Routers

(Chaum Mixes). A trusted proxy chooses a series of these network nodes and opens

a connection by sending a multiply encrypted data structure called an “onion” to the

first of them. Each router is a store-and-forward device which receives messages of fixed

length from different sources, removes one layer of encryption, which reveals parameters

such as session keys, and forwards the encrypted remainder of the onion to the next

network node. An onion router can store messages for an indefinite amount of time while waiting

for an adequate number of messages, but this is not feasible in practice.

Instead, the onion routers wait for a fixed amount of time, which weakens the protection in

the presence of low traffic. Once the connection is set up, an application specific proxy

forwards HTTP data through the Onion Routing network to a responder proxy which

establishes a connection with the web server the user wishes to use. The user's proxy

multiply encrypts outgoing packets with the session keys it sent out in the setup phase;

each node decrypts and forwards the packets, and encrypts and forwards packets that

contain the server's response. The network model consists of core onion routers, the

end-proxy routers and the links between them, through which the routers pass messages

of fixed length. The routers form a complete graph among themselves so that every

message has equal probability of being forwarded to any of the routers. All the links try

to maintain the same bandwidth, which is achieved by sending dummy packets to pad the

low-bandwidth links.

2.4.3 Babel

Babel [8] was designed in the mid-nineties. Babel offers sender anonymity, through the

“forward path”, and receiver anonymity, through replies travelling over the “return


path”. The forward path is constructed by the sender of an anonymous message by

wrapping a message in layers of encryption. The message can also include a return address

to be used to route the replies. The system supports bidirectional anonymity by

allowing messages to use a forward path, to protect the anonymity of the sender,

and for the second half of the journey they are routed by the return address so as to

hide the identity of the receiver. While the security of the forward path is as good

as in the secured original mix network proposals, the security of the return path is

slightly weaker. The integrity of the message cannot be protected, thereby allowing

tagging attacks, since no information in the reply address, which is effectively the only

information available to intermediate nodes, can contain the hash of the message body.

The reason for this is that the message is only known to the person replying using the

return address. Babel also proposes a system of intermix detours. Messages to be mixed

could be “repackaged” by intermediary mixes, and sent along a random route through

the network. It is worth observing that even the sender of the messages, who knows

all the symmetric encryption keys used to encode and decode the message, cannot

recognise it in the network when this is done.

2.4.4 MixMaster

Mixmaster has been an evolving system since 1995 [5, 11]. It is the most widely

deployed and used remailer system. It follows a message-based approach, namely it

supports sending single messages, usually email, through a fully connected mix network.

Mixmaster supports only sender anonymity. Messages are made bitwise unlinkable

by hybrid RSA and EDE 3DES encryption, while the message size is kept constant by

appending random noise at the end of the message. In version two, the integrity of the

RSA encrypted header is protected by a hash, making tagging attacks on the header

impossible. In version three, the noise to be appended is generated using a secret shared

between the remailer and the sender of the message, which is included in the header. Since the

noise is predictable to the sender, it is possible to include in the header a hash of the

whole message, thereby protecting the integrity of the header and body of the message.

This trick makes replies impossible to construct, since the body of the message would

not be known to the creator of an anonymous address block when computing the hash.


Beyond the security features, Mixmaster provides quite a few usability features. It

allows large messages to be divided in smaller chunks and sent independently through

the network. If all the parts end up at a common mix, then reconstruction happens

transparently in the network. So large emails can be sent to users without requiring

special software. Recognising that building robust remailer networks could be difficult

(and indeed the first versions of the Mixmaster server software were notoriously

unreliable), it also allows messages to be sent multiple times, using different paths. It

is worth noting that no analysis of the impact of these features on anonymity has ever

been performed.

2.4.5 Freedom

The Freedom [2] network consists of a set of nodes called Anonymous Internet

Proxies (AIPs) which run on top of the existing Internet infrastructure. The user

communicates by first selecting a series of nodes (a route), and then using this route

to forward IP packets that are stripped of identifying information. This system is

secure against denial-of-service attacks but is vulnerable to some general traffic analysis

attacks such as the packet counting attack, Wei Dai's attack, the latency attack, and the clogging

attack.

2.4.6 PipeNet

PipeNet was one of the early systems to be implemented. It is a synchronous

network implemented on top of an asynchronous network. Routes are created through

the network by choosing the intermediate hops uniformly at random. For providing

further anonymity, a certain number of route creation requests are collected by a node,

shuffled and then acted upon. The user establishes a shared key with each node on

its route as part of the route creation process, using a key negotiation algorithm. The

routes are padded end to end for their duration. End-to-end padding means that the

originator creates all of the padding and the recipient (or exit node) strips the padding;

each of the intermediate nodes is unable to distinguish padding from normal traffic

and just processes it as normal. This system provides protection against general traffic

analysis but is vulnerable to denial-of-service attacks, which are more catastrophic in

nature than ordinary traffic analysis attacks.


2.4.7 Stop-And-Go Mixes

Stop-and-Go mixes [9] (sg-mixes) present a mixing strategy that is based not on

batches but on delays. It aims at minimizing the potential for (n − 1) attacks, where the

attacker inserts a genuine message in a mix along with a flood of his own messages until

the mix processes the batch. It is then trivial to observe where the traced message is

going.

Each packet to be processed by an sg-mix contains a delay and a time window.

The delay is chosen according to an exponential distribution by the original sender,

and the time windows can be calculated given all the delays. Each sg-mix receiving a

message, checks that it has been received within the time window, delays the message

for the specified amount of time, and then forwards it to the next mix or final recipient.

If the message was received outside the specified time window it is discarded. A very

important feature of sg-mixes is the mathematical analysis of the anonymity they

provide. It is observed that each mix can be modeled as an M/M/∞ queue, and the

number of messages waiting inside it follows a Poisson distribution. The delays can

therefore be adjusted to provide the necessary anonymity set size.
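The time-window check and the queueing model described above can be sketched as follows. This is a minimal illustration that assumes the standard steady-state result for an M/M/∞ queue, namely that the number of messages inside the mix is Poisson distributed with mean equal to the arrival rate times the mean delay. The function and parameter names are hypothetical.

```python
import math


def in_time_window(arrival, window_start, window_end):
    """An sg-mix accepts a message only if it arrives inside its time window."""
    return window_start <= arrival <= window_end


def occupancy_pmf(k, arrival_rate, mean_delay):
    """P(exactly k messages inside the mix) for an M/M/infinity queue in steady state.

    The occupancy is Poisson with mean rho = arrival_rate * mean_delay.
    """
    rho = arrival_rate * mean_delay
    return math.exp(-rho) * rho ** k / math.factorial(k)


def mean_delay_for(target_mean_occupancy, arrival_rate):
    """Mean delay that makes the mix hold the target number of messages on average."""
    return target_mean_occupancy / arrival_rate


# With 2 messages per second arriving, a mean delay of 5 seconds
# keeps 10 messages inside the mix on average.
delay = mean_delay_for(10, arrival_rate=2.0)
prob_empty = occupancy_pmf(0, arrival_rate=2.0, mean_delay=delay)
```

The probability that the mix is empty falls exponentially with the mean occupancy, which is why longer delays (or higher traffic) yield a larger expected anonymity set.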

2.4.8 Tarzan

Freedman designed Tarzan [19], a peer-to-peer network in which every node is a

mix. A node initiating the transport of a stream through the network would create an

encrypted tunnel to another node, and ask that node to connect the stream to another

server. By repeating this process a few times it is possible to have an onion encrypted

connection, relayed through a sequence of intermediate nodes.

An interesting feature of Tarzan is that the network topology is somewhat re-

stricted. Each node maintains persistent connections with a small set of other nodes,

forming a structure called a mimics. Then routes of anonymous messages are selected

in such a way that they will go through mimics and between mimics in order to avoid

links with insufficient traffic. A weakness of the mimics scheme is that the selection

of neighboring nodes is done on the basis of a network identifier or address which,

unfortunately, is easy to spoof in real-world networks.


2.5 Summary

In this chapter, we have presented in detail different types of mixes based on

blending strategies and flushing conditions used. The mixes are divided into simple and

pool mixes depending on whether the mix flushes all the messages or not. These two

categories are further subdivided into timed and threshold mixes based on the flushing

condition being a time interval or a threshold on number of messages. We can also have

hybrid mix types, which combine timed and threshold properties.

We have also described anonymous communication systems based on mix networks.

Various issues involved in the design of mix-networks are presented. These include the

most important issue of how much anonymity the network provides and which type of

mix is used to assure such anonymity.

Finally, we discuss different real-time mix systems that have been deployed, such as Crowds,

Onion Routing, and MixMaster, and the functionalities provided in those systems.

Different adversary models and attacks on mix networks are presented in the next

chapter. Chapter 4 then discusses the anonymity metrics used in practice to

measure the level of anonymity provided by an anonymizing system. It also describes the

analysis technique used to analyze passive attacks on mixes.

CHAPTER 3

ADVERSARY MODELS AND ATTACKS ON MIXES

In this chapter, we discuss the various adversary models, followed by different

types of attacks. The attacks include active attacks such as timing attacks and denial

of service attacks, and passive attacks which are mainly accomplished through traffic

analysis.

3.1 Adversary Models

The adversary models discussed below are high level descriptions of the attacker’s

powers and limitations [6].

3.1.1 Internal and External Adversary

An adversary can be a user compromising communication media and network

resources (external). An adversary can also be a compromised mix node, sender or a

recipient trying to leak information to outsiders (internal).

3.1.2 Active and Passive Adversary

An active adversary can arbitrarily modify the messages and computations, cause

interruption of service, fabricate new messages, and intercept the messages. Denial of

service and loss of data are examples of interruption; spoofing and forging are examples

of fabrication and modification. A passive adversary can only listen to the traffic.

This is typically done by eavesdropping on the network connections through wiretapping, or

signal catching in the case of wireless transmissions. We can also have a combination of

active and passive adversaries. For example, an active external adversary can insert

secret messages and a passive internal adversary can correlate the messages coming into a

compromised node with messages going out.

3.1.3 Local, Restricted and Global Adversary

A global adversary has the ability to see link traffic on every link and control each

and every resource in the network, whereas a local adversary can observe traffic only on

certain links in the network. Depending on whether the adversary has complete control


over a few local links or restricted control over a certain area in the network, he is called

a local or a restricted adversary.

3.1.4 Static and Adaptive Adversary

A static adversary chooses the tools required before the attack protocol starts

and cannot change them in the middle of the attack. Most brute-force

attacks (e.g., password crackers) fall under this category, since the attacker exhausts

all combinations of inputs using an automated tool, which normally is not adaptive.

Adaptive adversaries use different tools and resources depending on the response they

receive from the previous stage of attack. They can, for example, “follow” messages

that are tagged with the original message.

3.2 Attacks on Mixes

The attacks described below are high level descriptions of the attacker’s schemes

and do not depend on any specific implementation [18]. We assume that there are

no known implementation weaknesses in the system. The attacker can have any

combination of adversary powers discussed in the previous section. In the security

literature, the attacks are broadly classified into two main categories – active and

passive attacks.

3.2.1 Active Attacks

An active attack is one in which the intruder may transmit messages, replay old

messages, modify messages in transit, or delete selected messages from the wire. A typical

active attack is one in which an intruder impersonates one end of the conversation,

or acts as a man-in-the-middle. Active attacks often have asymmetric characteristics in

that the attacker’s location makes one of the communicating parties more vulnerable.

Some of the common active attack schemes used are discussed briefly.

Brute-force attack. This is the simplest and least efficient of the attacks. A brute-force

attack requires trying all (or a large fraction of all) possible

values until the right value is found. In case of mixes, the adversary may want to follow

every possible path the message could have taken (passive external adversary). Using

this attack, the attacker is able to construct a list of possible recipients for a particular


message in most cases. But if the mix or mix-network is not designed well, the attacker

may be able to establish the sender-receiver correspondence.

To illustrate how the brute-force attack works, consider a mix network whose

individual nodes are threshold mixes with threshold n. Let us also assume that each

message goes through exactly d mix nodes.

• The attacker follows a message from the sender to the first mix node.

• The attacker then follows each of the n messages being flushed from the first mix node. To do this, the attacker needs to observe n different links if all the second-level mixes are different.

• The attacker continues this way until the route length is d nodes. At this point, the attacker is following n^d messages. From these n^d messages, the attacker now has to choose only those messages that leave the mix network.

In the worst case, the attacker can learn the exact receiver from this attack. If the

mix is designed for perfect anonymity, the attacker may end up having n^d possibilities.

Dummy messages are normally used as a countermeasure against the brute-force attack.
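The growth of the attacker's workload in the steps above can be sketched as a short calculation. It assumes, as in the illustration, that every mix is a threshold mix with threshold n and that all flushed messages go to distinct next-hop mixes, so the count is an upper bound. The function name is hypothetical.

```python
def messages_followed(n, d):
    """Number of messages the attacker must track after each of d threshold-n mixes.

    After hop i the attacker is following n**i candidate messages, assuming
    every flushed batch fans out to distinct mixes (the worst case for the attacker).
    """
    return [n ** hop for hop in range(1, d + 1)]


# Threshold 3 and route length 4: the candidate set grows geometrically.
workload = messages_followed(3, 4)   # [3, 9, 27, 81]
```

Even modest thresholds and route lengths make the candidate set large, which is why the attack is described as the least efficient one.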

Denial-of-service attack. A denial of service (DoS) attack is an incident in which a

user or organization is deprived of the services of a resource they would normally expect

to have. Network flooding, spamming, port hammering, SYN attacks (in the case of the TCP

protocol), and disk or memory exhaustion are some well-known techniques for mounting a

DoS attack. By rendering some mix nodes inoperative, the adversary tries to gain

information about the routes chosen by the remaining nodes in the case of static networks

and by certain senders in the case of dynamic mix networks.

Message-delaying attack. In this scheme, the attacker can withhold messages

until he can obtain enough resources (i.e., links, nodes) or until the network becomes

easier to monitor (or to see if the possible recipients receive other messages, etc.). As a

defense against this attack, the mix nodes should be equipped to verify authenticated timing

information.

Message-tagging attack. For this type of attack, an active internal adversary with

control over the first and last node in a message route is needed. To launch the attack,

the attacker can simply tag messages at the first node in such a way that the exit node

can spot them. Since the entry node knows the sender and the exit node the recipient,


the system is broken. To prevent this attack, measures should be taken to minimize or

eliminate the possibility of message tagging.

Node-flushing or blending attack. This attack was first mentioned by David Chaum

[21] in his seminal paper. The flushing attack is very effective and can be mounted by

an active global adversary. A spamming attack or n-1 attack is a very good example

for this type of attack. The capabilities of the adversary include delaying (removing)

messages and inserting arbitrarily many messages into the system in a short time. The

attack is illustrated below for a simple threshold mix with threshold n.

• The attacker observes the target message leaving the sender and delays it.

• The attacker now sends fabricated messages until the mix fires.

• As soon as the mix fires, he stops all other messages to the mix and sends the target message along with n − 1 of his own messages.

• After the mix fires, the attacker can easily recognize his n − 1 messages and therefore determine the destination of the target message.

This is an exact attack – that is, it provides the adversary with the exact receiver

rather than a set of receivers as in case of the brute force attack. Also note that this

attack is mix specific and does not depend on the rest of the mix-network.
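The flushing attack on a simple threshold mix can be sketched as a small simulation. The `ThresholdMix` class, the `(sender, recipient)` message tuples, and the "attacker" tag are assumptions made for this illustration; a real attacker would recognize his own messages by their content or destination rather than by a tag.

```python
import random


class ThresholdMix:
    """Simple threshold mix: buffers messages and flushes all n at once, shuffled."""

    def __init__(self, n, rng):
        self.n = n
        self.rng = rng
        self.buffer = []

    def receive(self, message):
        """Accept a (sender, recipient) pair; return the flushed batch or None."""
        self.buffer.append(message)
        if len(self.buffer) < self.n:
            return None
        batch, self.buffer = self.buffer, []
        self.rng.shuffle(batch)
        return batch


def n_minus_one_attack(mix, target_message):
    """Return the recipient of the delayed target message."""
    # Flood the mix with fabricated messages until it fires,
    # which empties it of any honest messages.
    while mix.receive(("attacker", "sink")) is None:
        pass
    # Release the target, then top up with the attacker's own messages.
    batch = mix.receive(target_message)
    while batch is None:
        batch = mix.receive(("attacker", "sink"))
    # Everything in the batch except the target belongs to the attacker.
    unknown = [m for m in batch if m[0] != "attacker"]
    return unknown[0][1]


mix = ThresholdMix(4, random.Random(0))
mix.receive(("carol", "dave"))                        # honest message already buffered
print(n_minus_one_attack(mix, ("alice", "bob")))      # prints bob
```

The shuffle inside the mix does not help, because only one message in the final batch is unknown to the attacker.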

Timing attack. In this attack, the adversary uses the fact that different routes can

take different amounts of time. Given the set of messages coming into the mix-network

and the set of outgoing messages, the adversary uses the route time information to

establish a correlation between a certain set of incoming and outgoing messages.

The attacker does not need to carry out the expensive brute-force or flushing attacks

to determine the route taken. If the attacker has access to one of the communicating

parties, he might be able to infer which route is taken by simply computing the round

trip time (that is, calculating the time it takes to receive a reply).

This attack can be prevented by using variable delay mixes, which wait for a

random amount of time before firing. This would cause uncertainty in estimating the

route lengths if the time taken is very close in magnitude.

Wei Dai's attack. In this attack, the attacker wishes to defeat the traffic shaping

mechanisms [1] that attempt to hide the real volumes of traffic on an anonymous

channel. The attacker creates a route using the link that he wishes to observe, and


slowly increases the traffic on it. The router will not know that the stream or streams

are all under the control of the attacker, and at some point will signal that the link has

reached its maximum capacity. The attacker then subtracts the volume of traffic he

was sending from the maximum capacity of the link to estimate the volumes of honest

traffic.
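The subtraction at the heart of this attack can be sketched as follows. The `is_saturated` callback stands in for the router's signal that the link is full; it, the step size, and the function name are hypothetical and only model the idea described above.

```python
def estimate_honest_traffic(link_capacity, is_saturated, step=1):
    """Raise attacker traffic until the link reports saturation, then subtract.

    is_saturated(attacker_rate) is a hypothetical probe that returns True once
    the router signals that the link has reached its maximum capacity.
    """
    attacker_rate = 0
    while not is_saturated(attacker_rate):
        attacker_rate += step
    return link_capacity - attacker_rate


# Simulated link: capacity 100 units, carrying 37 units of honest traffic
# that the attacker cannot observe directly.
CAPACITY = 100
HIDDEN_HONEST = 37
estimate = estimate_honest_traffic(
    CAPACITY, lambda rate: rate + HIDDEN_HONEST >= CAPACITY
)
```

The precision of the estimate is bounded by the step size with which the attacker increases his traffic.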

Disclosure attack. The formal model on which the disclosure attack is based is

quite simple. A single mix is used by b participants each round, one of them always

being Alice, while the other (b − 1) are chosen randomly out of a total number of N − 1

possible participants. The threshold of the mix is b so it fires after each of the round's

participants has contributed one message. Alice chooses the recipient of her message to

be a random member of a fixed set of m recipients. Each of the other participants sends

a message to a recipient chosen uniformly at random out of N potential recipients.

We assume that the other senders and Alice choose the recipients of their messages

independently of each other. The attacker observes R_1, ..., R_t, the recipient anonymity

sets corresponding to t messages sent out by Alice during t different rounds of mixing.

The attacker then tries to establish to which of the potential recipients each of Alice's

messages was sent.

The original attack as proposed by Kesdogan et al. [9] first tries to identify

mutually disjoint sets of recipients from the sequence of recipient anonymity sets

corresponding to Alice's messages. This operation is the main bottleneck for the

attacker since it takes a time that is exponential in the number of messages to be

analyzed.
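The search for mutually disjoint recipient sets can be sketched with a brute-force enumeration. This is only an illustration of the first step described above, not the full attack of Kesdogan et al.; the function name and the toy observations are assumptions. The enumeration over combinations makes the exponential cost visible.

```python
from itertools import combinations


def find_disjoint_sets(observations, m):
    """Return the first m mutually disjoint recipient sets, or None if none exist.

    observations: recipient anonymity sets, one per round in which Alice sent.
    m: the (assumed known) number of Alice's recipients.
    """
    for combo in combinations(observations, m):
        if all(a.isdisjoint(b) for a, b in combinations(combo, 2)):
            return list(combo)
    return None


# Toy rounds with b = 2: Alice's recipients are 1, 2 and 3,
# the other recipient in each round is background traffic.
rounds = [{1, 5}, {2, 6}, {1, 7}, {3, 8}]
disjoint = find_disjoint_sets(rounds, m=3)   # [{1, 5}, {2, 6}, {3, 8}]
```

Each of the disjoint sets found must contain exactly one of Alice's recipients, which is what later rounds can be used to narrow down.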

3.2.2 Passive Attacks

A passive attack is one in which the intruder attempts to intercept and read data

without altering it. Passive monitoring attacks are often symmetric: if the attacker can

see the traffic from Alice to Bob on a particular link, there’s a good chance that he/she

can see the traffic in the reverse direction.

Communication-pattern attack. By simply looking at the communication patterns

(when users send and receive), one can find out much useful information.

Communicating participants normally don’t “talk” at the same time, that is, when one party


is sending, the other is usually silent. The longer an attacker can observe this type of

communication synchronization, the less likely it’s just an uncorrelated random pattern.

This attack can be mounted by a passive adversary that can monitor entry and exit

mix nodes. Law enforcement officials might be quite successful mounting this kind of

attack as they often have a-priori information: they usually have a hunch that two

parties are communicating and just want to confirm their suspicion.

Packet-counting attack. These types of attacks are similar to the other passive

attacks in that they exploit the fact that some communications are easy to distinguish

from others. If a participant sends a non-standard (i.e., unusual) number of messages,

a passive external attacker can spot these messages coming out of the mix-network. In

fact, unless all users send the same number of messages, this type of attack allows the

adversary to gain non-trivial information. The packet counting and communication

pattern attacks can be combined to get a message frequency attack (this might require

more precise timing information). Communication pattern, packet counting and

message frequency attacks are sometimes referred to as traffic shaping attacks and are

usually dealt with by imposing rigid structures on user communications. Notice that

protocols achieving “network unobservability” are immune to these attacks.

Intersection attack. An attacker having information about which users are

active at any given time can, through repeated observations, determine which users

communicate with each other. This attack is based on the observation that users

typically communicate with a relatively small number of parties. For example, the

typical user usually queries the same web sites in different sessions (his queries aren’t

random). By performing an operation similar to an intersection on the sets of active

users at different times it is probable that the attacker can gain interesting information.
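The set operation behind this attack can be sketched in a few lines. The observation data and the function name are assumptions for illustration: each round lists the parties seen receiving traffic while the targeted user was active.

```python
def intersect_active_sets(rounds):
    """Intersect the sets of parties seen active in each observed round."""
    candidates = set(rounds[0])
    for active in rounds[1:]:
        candidates &= set(active)
    return candidates


# Parties observed receiving traffic in three sessions of the targeted user.
sessions = [
    {"bob", "carol", "dave"},
    {"bob", "erin", "dave"},
    {"bob", "frank"},
]
likely_partners = intersect_active_sets(sessions)   # {"bob"}
```

In practice users are not perfectly consistent, so the exact intersection may be empty and the attacker would fall back on counting how often each party appears.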

Probabilistic or partial attack. Most of the preceding attacks can be carried

out partially, that is, the attacker can obtain partial or probabilistic information. For

example, he could deduce with probability p that A is communicating with B or A is

not communicating with B, C and D.

Covert channels. Covert channels are discussed in Section 4.2.5.


3.3 Summary

In this chapter, we present several attacks on a mix node or a mix-network and the

adversary models used to mount these attacks. The adversary can be an insider or

an external observer, an active attacker or a passive eavesdropper, a local attacker or a

global adversary who has control over the whole network.

The attacks are divided into active and passive attacks. Active attacks involve

modification, fabrication, and interception of messages by the attacker. Some

well-known examples are the brute-force attack, the denial-of-service (DoS) attack, and the

node-flushing attack. Passive attacks allow an attacker to compromise anonymity

by observing the network traffic for traffic patterns, packet counts, packet sizes,

etc. Passive attacks are very difficult to detect and may prove to be very harmful.

Chapter 4 presents the various anonymity metrics and the analysis technique being

used to analyze various attacks with distinct adversary models.

CHAPTER 4

ANONYMITY METRICS AND ANALYSIS TECHNIQUE

This chapter describes information theoretic models, proposed in the literature, to

quantify the degree of anonymity provided by different systems of mix networks.

First, we discuss the use of anonymity sets as a measure of anonymity, and then we go on

to analyze the entropy-based and route-based metrics. Finally, we present an anonymity

analysis of real-time anonymizing systems such as Onion Routing and Crowds.

4.1 Anonymity

electronic voting.

Anonymity can be classified as connection anonymity and data anonymity. Data

anonymity is about hiding the contents of the packet sent and received in a particular

session. Data anonymity is normally achieved by encryption. Connection anonymity is

about hiding identities of the source and the destination during the actual information

exchange.

As discussed by Reiter and Rubin [19], there are three types of connection

anonymity: sender anonymity, receiver anonymity, and unlinkability of sender and

receiver. Sender anonymity means that the identity of the party who sent a message is

hidden, while its receiver (and the message itself) might not be. Receiver anonymity

similarly means that the identity of the receiver is hidden. Unlinkability of sender

and receiver means that though the sender and receiver can each be identified as

participating in some communication, they cannot be identified as communicating with

each other.

A second aspect of anonymous communication is the adversary model against

which these properties are achieved. The attacker might be an eavesdropper that

can observe some or all messages sent and received, collaborations consisting of some

senders, receivers, and other parties, or variations of these. Different types of attacks

and adversary models have been discussed in Chapter 3.


We cannot provide “perfect” privacy since the number of possible senders and

recipients is bounded. So, for example, if there are only two parties on the network, an

attacker having access to this information can trivially determine who is communicating

with whom. The best we can hope for is to make all possible sender-recipient matchings

look equally likely. That is, the statistical distribution of the attacker's view should be

independent of the actual sender-recipient matchings.

4.2 Anonymity Metrics

Many real-time anonymity systems have been deployed in the past decade, Onion

Routing and Crowds being a few examples. With each of these systems providing a

different level of anonymity, there is a definite need for standard metrics to classify the

levels of anonymity provided. Information theory has proven to be a useful tool

for measuring the amount of information. It can be used to measure the information

gained by the attacker. Depending on the power of the attacker and the circumstances,

we can quantify the anonymity level provided by the system.

4.2.1 Anonymity Sets

Traditionally, anonymity sets have been used to measure the anonymity of mix

systems. The notion of anonymity sets was introduced by Chaum for modeling the security

of DC-Nets (Dining Cryptographers' Networks) [3].

Chaum defines the anonymity set as the set of participants who could have sent a

particular message, as seen by a global observer who has also compromised a set of

nodes [4]. The size of the anonymity set is a good indicator of how good the anonymity

provided by the system really is. In the best case, the size of the anonymity set equals the

number of users, which means any user has equal probability of having sent the message. In

the worst case, the size is one, which means there is no anonymity in the network.

4.2.2 Problems with Anonymity Set Size

The attacks against DC networks presented in [4] can only result in partitions of

the network in which all the participants are still equally likely to have sent or received

a particular message. Therefore the size of the anonymity set is a good metric of the

quality of the anonymity offered to the remaining participants.


In the stop-and-go system [9] definition, the authors realize that different senders

may not have been equally likely to have sent a particular message, but choose to

ignore it. If different participants accounted in the anonymity set are not equally likely

to be the senders or receivers, a designer might be tempted to distribute amongst many

participants some possibility that they were the senders or receivers while allowing the

real sender or receiver to have an abnormally high probability. The cardinality of the

anonymity set is in this case a misleading measure of anonymity. In the standardization

attempt, we see that there is an attempt to state, and take into account this fact in the

notion of anonymity, yet a formal definition is still lacking. Serjantov and Danezis [20]

discuss this fact in their paper and conclude that it is unwisely ignored in the literature

but can give a lot of extra information to the attacker.

The Pool Mix. We discuss the case of pool mix to further emphasize the dangers of

using sets and their cardinalities to assess and compare anonymity systems. This mix

always stores a pool of n messages. When N incoming messages have accumulated in

its buffer, it picks n randomly out of the n + N it has, and stores them, forwarding the

remaining N in the regular manner. The details of the pool mix are described in

Section 2.2.

There is always a small probability that any given message that has ever entered the mix has never left it. Therefore, the sender of every message must be included in the anonymity set. If we measure the anonymity provided by this system by anonymity set size, the set includes the senders of all messages that have ever entered the mix. The anonymity set is thus independent of the pool size n, which intuitively suggests that the metric is inappropriate.
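The dependence on n can be made concrete with a small Python sketch that compares anonymity set size with entropy for a pool mix. It rests on assumptions that are not stated in the text: the mix is in steady state, every round brings N messages from N distinct senders, a message stays in the pool after a round with probability n/(n + N), and the history is truncated to a fixed number of rounds and renormalized. The function names are illustrative only.

```python
import math

def pool_mix_sender_distribution(n, N, rounds):
    """Probability that each sender of the last `rounds` batches sent a given output message.

    Hypothetical model: steady state, N distinct senders per round.
    """
    stay = n / (n + N)          # chance that a message is kept in the pool for one more round
    probs = []
    for age in range(rounds):   # age 0 is the batch that arrived in the current round
        probs.extend([stay ** age / (n + N)] * N)
    total = sum(probs)          # renormalize because the history is truncated
    return [p / total for p in probs]

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

if __name__ == "__main__":
    for n in (0, 1, 10, 100):
        dist = pool_mix_sender_distribution(n, N=4, rounds=200)
        # columns: pool size, anonymity set size, entropy in bits
        print(n, len(dist), round(entropy_bits(dist), 3))
```

Under these assumptions the set size in the second column is the same for every n, while the entropy grows with the pool size, which is the behavior one would intuitively expect from a useful metric.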

Knowledge Vulnerability. The anonymity set metric is also vulnerable when the attacker has additional knowledge about the system. Consider the arrangement of mixes in Figure 4–1. The small squares in the diagram represent senders, labeled with their names. The bigger boxes are mixes, each with a threshold of 2. Some of the receivers are labeled with their sender anonymity sets.

Notice that if the attacker somehow establishes the fact that, for instance, A

is communicating with R, he can derive the fact that S received a message from E.


[Diagram: senders A, B, C, D, E; mixes Mix-1, Mix-2, Mix-3, Mix-4; receivers P, Q, R, S]

Figure 4–1: Vulnerability of Anonymity Sets

Indeed, to expose the link E → S, all the attacker needs to know is that one of A, B, C, D is communicating with R. Yet this is in no way reflected in S’s sender anonymity set (although E’s receiver anonymity set, as expected, contains just R and S).

It is also clear that not all senders in this arrangement are equally vulnerable

to this, as is the fact that other arrangements of mixes may be less so. Although we

have highlighted the attack here by using mixes with threshold of 2, it is clear that the

principle can be used in general to cut down the size of the anonymity set.

4.2.3 Entropy

Serjantov and Danezis [20] formalized the use of entropy as an anonymity metric and extended it to calculate the anonymity in a system of mixes. The principal insight behind the entropy metric is that the goal of an attacker is the unique identification of an actor (sender or receiver), while the goal of the defender is to increase the attacker’s workload in achieving this. Therefore we choose to define the anonymity provided by a system as the amount of information the attacker is missing to uniquely identify an actor’s link to an action.


The term information is used in the technical sense of Shannon’s information theory [22]. We therefore define a probability distribution over all actors αi, describing the probability that each of them performed a particular action. As one would expect, these probabilities must sum to one:

\[ \sum_i \Pr[\alpha_i] = 1 \]

As soon as the probability distribution above is known, one can calculate the anonymity provided by the system as a measure of the uncertainty that the distribution represents. In information-theoretic terms this is the entropy of the discrete probability distribution. We therefore call the entropy of the probability distribution attributing a role to actors, given a threat model, the effective anonymity set size of the system. It can be calculated as

\[ A = \mathcal{E}[\alpha_i] = \sum_i \Pr[\alpha_i] \log \Pr[\alpha_i] \]

As written, this metric yields a negative quantity whose magnitude is the number of bits of information an adversary is missing before they can uniquely identify the target. A similar metric based on information theory was proposed by Diaz et al. [6]. Instead of using the entropy directly as a measure of anonymity, they normalize it by the maximum amount of anonymity that the system could provide. This has the disadvantage that it is more a measure of fulfilled potential than of anonymity. A normalized value of 1 means that one is as anonymous as possible, even though one might not be anonymous at all. The non-normalized entropy-based metric we propose intuitively indicates the size of the group within which one is hidden. It is also a good indication of the effort necessary for an adversary to uniquely identify a sender or receiver.
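As an illustration of the two metrics discussed here, the following Python sketch computes the entropy of a sender distribution in bits and its normalized variant. It is a minimal sketch, not the cited authors' code: the function names and the example distributions are assumptions, logarithms are taken to base 2, and the entropy is returned as a non-negative number (the magnitude of the quantity A).

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalized_entropy(probs):
    """Entropy divided by its maximum log2(len(probs)), in the spirit of the normalized metric."""
    n = len(probs)
    return entropy_bits(probs) / math.log2(n) if n > 1 else 0.0

if __name__ == "__main__":
    uniform = [1 / 8] * 8            # eight equally likely senders
    skewed = [0.93] + [0.01] * 7     # same set size, one very likely sender
    pair = [0.5, 0.5]                # only two candidates
    for name, dist in (("uniform", uniform), ("skewed", skewed), ("pair", pair)):
        print(name, len(dist), entropy_bits(dist), normalized_entropy(dist))
```

The uniform and skewed examples have the same anonymity set size but very different entropies, and the two-candidate example reaches a normalized value of 1 while hiding the sender among only two actors.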

4.2.4 Route Length

In the previous section, we have demonstrated that entropy based metrics can give

the attacker more information about the system than just anonymity sets.


We note that the standard attacks aimed at reducing the size of the anonymity

set will now have the effect of narrowing the anonymity probability distribution. If

we consider this distribution as a set of pairs (of a sender and its respective non-zero

probability of having sent the message), then narrowing the probability distribution is

the process of deriving that some senders have zero probability of sending the message

and can therefore be safely excluded from the set.

As suggested in [20], route length is important and some arrangements of mixes

are more vulnerable to route length based attacks than others. If the attacker knows

the maximum route length allowed by the mix system, then he can eliminate all the

routes longer than the maximum length. This reduces the entropy of the anonymity

probability distributions without affecting the underlying anonymity set. Hence, the

maximum route length should be taken into account when calculating anonymity sets.

Several mix systems have been designed to remove the maximum route length

constraint, for instance via tunneling in Onion Routing [17] or Hybrid mixes, but it

exists in fielded systems such as Mixmaster [5, 11] (maximum route length of 20) and so

can be used by the attacker. It may also be possible to obtain relevant information by

compromising a mix. Some mix systems will allow a mix to infer the number of mixes a

message has already passed through and therefore the maximum number of messages it

may go through before reaching the destination. Such information would strengthen our

attack, so care needs to be taken to design mix systems (such as Mixmaster [5]) which

do not give it away.

examples of covert channels, covert channel analysis(CCA) and covert channels

arising in mix networks.

4.2.5 Covert Channels

Covert channels can be either innocuous or harmful. Innocuous channels are consistent with the intent of the system’s security policy. They may result in surprising system behaviors, but do not place the system or the information that it protects at risk. Harmful covert channels are information flows that are contrary to the intent of the system’s security policy.


Several definitions for covert channels have been proposed in literature, such as the

following:

• Definition 1: A communication channel is covert if it is neither designed nor intended to transfer information at all.

• Definition 2: A covert channel is a mechanism that can be used to transfer information from one user of a system to another using means not intended for this purpose by the system developers.

• Definition 3: Covert channels “will be defined as those channels that are a result of resource allocation policies and resource management implementation.”

All the above definitions are vague (What is information? what is intent?) and

omit any discussion of security. None of the above definitions brings out explicitly

the notion that covert channels depend on the type of mandatory access control (e.g.,

Bell La Padula or Biba model) policy being used and on the policy’s implementation

within a system design. A new definition using these concepts can be provided that is

consistent with the TCSEC definition of covert channels:

“A covert channel is a communication channel that allows a process to transfer

information in a manner that violates the system’s security policy”

In any scenario of covert channel exploitation, one must define the synchronization relationship between the sender and the receiver of information. Thus, covert channels are characterized by the synchronization relationship between the sender and the receiver. The purpose of synchronization is for one process to notify the other that it has completed reading or writing a data variable. Therefore, a covert channel may include not only a covert data variable but also two synchronization variables, one for sender-receiver synchronization and the other for receiver-sender synchronization. Any form of synchronous communication requires both sender-receiver and receiver-sender synchronization, either implicitly or explicitly.

However, sender-receiver synchronization may still need a synchronization variable

to inform the receiver of a bit transfer. A channel that does not include sender-receiver

synchronization variables in a system allowing the receiver-sender transfer of messages

is called a quasi-synchronous channel.

In all patterns of sender-receiver synchronization, synchronization data may be

included in the data variable itself at the expense of some bandwidth degradation.


Packet-formatting bits in ring and Ethernet local area networks are examples of

synchronization data sent along with the information being transmitted. Thus, explicit

sender-receiver synchronization through a separate variable may be unnecessary.

Covert channels are a more serious problem in a networked system. Network traffic analysis is much easier than monitoring CPU timing and process scheduling. A network covert channel can be based on either timing or spatial information of the traffic flow pattern. Using spatial information, an eavesdropper observing network traffic can obtain information from the size and destination of the packets. In collaboration with an internal active adversary, the covert channel can be coded by varying the packet size and destination. Using timing information, a covert channel is represented by the frequency and burstiness of packet generation. The next subsection discusses a particular type of covert channel existing in mix networks.

4.2.6 Covert Channels in Mix Networks

An insider can use the exit-mix server to covertly communicate with an external

passive eavesdropper by using the information that the eavesdropper (Eve) can proba-

bilistically determine if the insider (Alice) sends a message in a particular time interval.

This is an example of a one-directional network covert channel, and was first discovered

by Newman, Moskowitz, Crepeau, and Miller [13].

To illustrate the channel, let us assume that we have a simple exit-mix server. Alice, the insider, wants to transfer information covertly to the eavesdropper, Eve. The only action that Eve can take is to count the number of messages per tick t going from the Mix-firewall to each of the receivers, since the messages are indistinguishable.

In a perfect noiseless scenario with a single receiver, Alice can transmit the bits 1 and 0 to Eve by sending or not sending a message in a tick. Alice can use a prearranged encoding to send important information through this channel.
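The noiseless single-receiver case can be sketched in a few lines of Python. This is only an illustration under assumed conventions that do not come from the text: one bit per tick, most significant bit first, and Alice as the only sender, so that Eve's per-tick count is either 0 or 1. The function names are hypothetical.

```python
def encode_ticks(payload: bytes):
    """Alice's schedule: one boolean per tick, True meaning 'send a message in this tick'."""
    return [bool((byte >> bit) & 1) for byte in payload for bit in range(7, -1, -1)]

def decode_counts(counts):
    """Eve's side: turn per-tick message counts back into bytes (count > 0 reads as bit 1)."""
    bits = [1 if c > 0 else 0 for c in counts]
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for b in bits[start:start + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

if __name__ == "__main__":
    schedule = encode_ticks(b"hi")
    observed = [1 if send else 0 for send in schedule]   # what Eve counts when Alice is alone
    print(decode_counts(observed))
```

With other senders present the counts become noisy, which is why the remainder of the analysis is expressed in terms of channel capacity rather than exact decoding.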

The external adversary can follow either a global model, which observes all the links originating from the mix as shown in Figure 4–3, or a restricted model, which can count the number of messages between two enclaves as shown in Figure 4–2.


4.2.7 Covert Channel Capacity as Anonymity Metric

In the covert channel scenario presented in the previous subsection, Alice can obviously

leak considerable information to Eve. The ability to communicate covertly arises due

to a lack of anonymity. If there were “perfect” anonymity, then we would not expect

to find a covert channel [13]. By measuring the amount of covert information that may

be leaked through less than perfect anonymity, we can obtain an estimate of anonymity

provided by the system.

The mutual information is a good indication of interference between sender and

eavesdropper. One way to measure this is by estimating the lower bound of capacity.

Shannon’s Information Theory [22] is used to calculate the mutual information and

the capacity of the channel (which is the maximum value of mutual information). The

analysis technique and capacity calculations are presented in Section 4.3.

In the initial work [13], it is shown that as system level anonymity increases in

the simple mix models (i.e., the number of potential senders increases), the minimum

capacity decreases to zero. However, as the probability that a Clueless sender transmits

in a given tick increases, the expected number of actual senders in a given time tick

also increases, hence the anonymity increases, but the capacity of the covert channel

increases once this probability exceeds 0.5.

of network design.

4.3 Analysis Technique

In this section we present some scenarios for covert channels arising when

using a mix server for different adversary models and network settings. The next

subsection discusses the network channel matrix and capacity estimation.

4.3.1 Scenarios

There is always one special transmitting node in the network, called Alice, who is malicious. Alice has the capabilities of an active internal adversary and can either be static or adapt dynamically to retain the covert channel.

Alice and possibly other transmitters (assume N of them) have legitimate business transmitting messages to a set of receivers {Ri | i = 1, 2, ..., M}. These transmitters act completely


independently of one another, and have no direct knowledge of each other’s recent

transmission behavior.

Alice may have some general knowledge of the long-term traffic levels produced by

the other transmitters, e.g., the number of other transmitters and their probabilistic

behavior, which can allow Alice to write a code that can improve the covert communi-

cation channel’s data rate. She cannot, however, perform short-term adaptation to their

behavior.

We also assume that there is a clock, and that transmissions only occur in the unit

interval of time called a tick. Any subset of transmitters can each either send a single

message to a single receiver in a tick, or not send a message at all. Each transmitter in

a tick can send to a different receiver, and two or more transmitters may send to the

same receiver in the same tick. All messages’ contents are encrypted end-to-end.

[Diagram: Enclave 1, Enclave 2, Eve]

Figure 4–2: Restricted Passive Adversary Model

There is also an eavesdropper on the network called Eve. Since all transmissions

are encrypted, they appear to the eavesdropper Eve as having indistinguishable content.

Eve may be either a global passive adversary (GPA), with the ability to see link traffic

on every link in the network, or a restricted passive adversary (RPA), with the ability

to observe traffic only on certain links.

Alice is not allowed any direct communication with Eve. However, Alice can

influence what Eve sees on the network. We study network scenarios that attempt to

achieve a degree of anonymity with respect to the network communication. That is, the

networks are designed with various anonymity devices to prevent Eve from learning who

is sending a message to whom. Even if a certain degree of anonymity is achieved, it still

may be possible for Alice to communicate covertly with Eve.


4.3.2 Channel Matrix

Between Alice and the N clueless senders, there are N + 1 possible senders per

t, and there are M + 1 possible actions per sender (since each sender may or may not

transmit, and if it does transmit, it transmits to exactly one of the M receivers).

[Diagram: Alice, receivers R1 through RM, Eve]

Figure 4–3: Global Passive Adversary Model

We consider Alice to be the input to the quasi-anonymous channel, which is a

proper communications channel [22]. Alice can send to one of the M receivers or not

send a message. Thus, we represent the inputs to the quasi-anonymous channel by

the M + 1 input symbols 0, 1, . . . , M, where i = 0 represents Alice not sending a message, and i ∈ {1, . . . , M} represents Alice sending a message to the ith receiver Ri.

However, note that the “receiver” in the quasi-anonymous channel is Eve. Eve receives

the output symbols ej, j = 1, . . . , K. Eve receives e1 if no sender sends a message.

The quasi-anonymous channel that we have been describing is a discrete memory-

less channel (DMC). We define the channel matrix M as an (M + 1)×K matrix, where

M[i, j] represents the conditional probability that Eve observes the output symbol ej

given that Alice input i.


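\[
\mathbf{M}_{M+1,K} =
\begin{array}{c|cccccccc}
       & 0       & 1       & 2       & \cdots & j       & j+1       & \cdots & K \\ \hline
0      & p_{0,0} & p_{0,1} & p_{0,2} & \cdots & p_{0,j} & p_{0,j+1} & \cdots & p_{0,K} \\
1      & p_{1,0} & p_{1,1} & p_{1,2} & \cdots & p_{1,j} & p_{1,j+1} & \cdots & p_{1,K} \\
2      & p_{2,0} & p_{2,1} & p_{2,2} & \cdots & p_{2,j} & p_{2,j+1} & \cdots & p_{2,K} \\
\vdots & \vdots  & \vdots  & \vdots  & \ddots & \vdots  & \vdots    & \ddots & \vdots \\
i      & p_{i,0} & p_{i,1} & p_{i,2} & \cdots & p_{i,j} & p_{i,j+1} & \cdots & p_{i,K} \\
\vdots & \vdots  & \vdots  & \vdots  & \ddots & \vdots  & \vdots    & \ddots & \vdots \\
M      & p_{M,0} & p_{M,1} & p_{M,2} & \cdots & p_{M,j} & p_{M,j+1} & \cdots & p_{M,K}
\end{array}
\]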

The number of symbols seen by Eve may vary, depending on the adversary model considered. For example, with an RPA observing a link between two mix-enclaves, Eve sees only the message count, which ranges from 0 to N + 1, so the number of symbols is N + 2. With a GPA observing all the links leaving an exit-mix, the number of possible symbols is much higher and is a function of the number of receivers, M. Besides the symbol for no traffic, the output symbols correspond to the different ways the N + 1 senders can send or not send, at most one message each, out of the private enclave, provided at least one sender does send a message. For example, Eve observes only one output symbol for the N + 1 ways in which exactly one sender sends a message to R1.

We model Alice according to the following distribution each t:

\[ P(\text{Alice sends a message to } R_i) = x_i, \qquad i = 1, \ldots, M \]

From the above equation, we get

\[ x_0 = P(\text{Alice doesn't send a message}) = 1 - \sum_{i=1}^{M} x_i . \]

We let A represent the distribution for Alice’s input behavior, and we denote by E

the distribution of the output symbols that Eve receives. Thus, the channel matrix

M along with the distribution A totally determine the quasi-anonymous channel.

This is because the elements of M take the distributions Ci into account, and M and

A let one determine the distribution E describing the outputs that Eve receives,

P (Eve receives ej).
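Written out, the step from A and the channel matrix to E is the law of total probability. The following restatement uses the symbols already defined in this subsection and assumes only that x_i denotes P(A = i):

\[ P(E = e_j) = \sum_{i=0}^{M} P(A = i)\, P(E = e_j \mid A = i) = \sum_{i=0}^{M} x_i \, \mathbf{M}[i, j] . \]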


Given a discrete random variable X, taking on the values xi, i = 1, . . . , nX , the

entropy of X is

\[ H(X) = -\sum_{i=1}^{n_X} p(x_i) \log p(x_i) . \]

We use p(xi) as a shorthand notation for P (X = xi). Given two such discrete random

variables X and Y we define the conditional entropy (equivocation) to be

\[ H(X|Y) = -\sum_{i=1}^{n_Y} p(y_i) \sum_{j=1}^{n_X} p(x_j|y_i) \log p(x_j|y_i) . \]

Given two such random variables we define the mutual information between them to be

I(X, Y ) = H(X)−H(X|Y ) .

Note that H(X)−H(X|Y ) = H(Y )−H(Y |X), so we see that I(X, Y ) = I(Y,X).

For a DMC whose transmitter random variable is X, and whose receiver random

variable is Y , we define the channel capacity [22] to be:

\[ C = \max_{X} I(X, Y), \]

where the maximization is over all possible distribution values p(xi) (that is, the p(xi)

are all non-negative and sum to one).

For us, the capacity of the covert channel between Alice and Eve is

\[ C = \max_{x} \bigl[ H(E) - H(E|A) \bigr] , \]

where the maximization is over the different possible values that the xi may take (of

course, the xi are still constrained to represent a probability distribution). Recall

M[i, j] = P (E = ej|A = i), where M[i, j] is the entry in the ith row and jth column of

the channel matrix, M.
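For a given channel matrix, the maximization over Alice's input distribution can be carried out numerically. The sketch below uses the Blahut–Arimoto iteration, a standard algorithm for DMC capacity that the text does not name, so its use here is an assumption about how one might compute C rather than the method of the original analysis. The matrix argument `W` (with `W[i][j]` standing for P(E = ej | A = i)) and the function names are illustrative, and logarithms are taken to base 2.

```python
import math

def mutual_information_bits(px, W):
    """I(X;Y) in bits for input distribution px and channel matrix W[i][j] = P(y_j | x_i)."""
    cols = range(len(W[0]))
    py = [sum(px[i] * W[i][j] for i in range(len(W))) for j in cols]
    total = 0.0
    for i, row in enumerate(W):
        for j in cols:
            if px[i] > 0 and row[j] > 0:
                total += px[i] * row[j] * math.log2(row[j] / py[j])
    return total

def blahut_arimoto(W, tol=1e-9, max_iter=10000):
    """Return (capacity in bits, input distribution found) for a discrete memoryless channel."""
    m = len(W)
    px = [1.0 / m] * m                      # start from the uniform input distribution
    for _ in range(max_iter):
        py = [sum(px[i] * W[i][j] for i in range(m)) for j in range(len(W[0]))]
        # c[i] = exp of the divergence between row i and the current output distribution
        c = [math.exp(sum(w * math.log(w / py[j]) for j, w in enumerate(W[i]) if w > 0))
             for i in range(m)]
        norm = sum(px[i] * c[i] for i in range(m))
        lower, upper = math.log(norm), math.log(max(c))   # bounds on capacity, in nats
        px = [px[i] * c[i] / norm for i in range(m)]
        if upper - lower < tol:
            break
    return mutual_information_bits(px, W), px

if __name__ == "__main__":
    # Example: two inputs, three outputs, rows (p, q, 0) and (0, p, q) with p = q = 0.5
    print(blahut_arimoto([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]))
```

The same function accepts any (M + 1) × K row-stochastic matrix, so it can be applied to the larger matrices that arise when Eve distinguishes receivers.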

4.4 Summary

In this chapter we have defined the objectives of anonymous communication and the threats against it. We have shown how using the anonymity set as a metric can lead to wrong results. The pool mix was used as an example to illustrate how the anonymity set can indicate perfect anonymity when that is intuitively not possible.


We presented entropy as a metric for measuring anonymity, based on Shannon’s information theory. It represents how much information an adversary is missing to identify the sender or the receiver of a target message. The use of covert channel capacity as a measure of anonymity was discussed, followed by covert channel scenarios in mix networks. Finally, we presented the channel matrix as the tool to estimate the channel capacity.

CHAPTER 5
PREVIOUS WORK AND THE EXIT-MIX MODEL

This chapter presents the previous work done (which forms the basis of our work),

exit-mix firewall model setup and assumptions. It describes the conventions and

terminology used, the message distribution probabilities, traffic adversary model and

channel matrix in detail.

5.1 Capacity Analysis for Indistinguishable Receivers Case

The initial work [13] analyzed the situation where there are two enclaves, commu-

nication between them is encrypted, and packets are sent only from the first enclave

(which contains Alice) to the second (Fig. 4–2). Eve is able to monitor the commu-

nication from the first enclave to the second. Anonymity is “achieved” in that an

eavesdropper such as Eve (as RPA) does not “know” who is sending a message (that

is hidden inside of the first enclave) nor who is receiving the message (this can only

be known if one is interior to the second enclave). Eve is only allowed to know how

many messages per tick travel from the first enclave to the second. Nonetheless, Alice

attempts to communicate covertly with Eve.

The input symbols for this channel are 0, which signifies that Alice is not transmitting a message to any receiver, and 0c (the complement of 0, written 0^c), which signifies that Alice is transmitting a message to some receiver (keep in mind that Alice is oblivious to the other transmitters).

We break Scenario down into three cases: case 5.1.1, case 5.1.2, and case 5.1.3.

Case 5.1.3 is the general form of Scenario and the first two are simplified special cases.

5.1.1 Case 0: Alice Alone

This is the case where N = 0. Alice is the only transmitter. Alice sends either 0

(by not sending a message) or 0c (by sending a message). Eve receives either e0 = 0

(Alice did nothing) or e1 = 1 (Alice sent a message to a receiver). The capacity of this noiseless covert channel is 1 bit per tick.


Note though the capacity is the maximum, over the probability x for Alice

inputting a 0, of the mutual information I(E, A). A is the distribution for Alice

described by x, and E is the distribution for Eve. Since there is no noise, I is simply

the entropy H(E) describing Eve (which is maximized to 1 when x = .5).

I(E,A) = H(E) = −x log x− (1− x) log(1− x).
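The location of the maximum can be checked directly. Assuming base-2 logarithms, so that capacity is measured in bits, differentiating the expression above gives

\[ \frac{d}{dx} H(E) = \log \frac{1 - x}{x} = 0 \;\Longrightarrow\; x = \tfrac{1}{2}, \qquad H(E)\Big|_{x = 1/2} = \log_2 2 = 1 . \]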

5.1.2 Case 1: Alice and One Additional Clueless Transmitter

In this case N = 1. Therefore, Eve receives:

• 0 if neither Alice nor Clueless transmit;

• 1 if Alice transmits and Clueless does not, or Clueless transmits and Alice does not; or

• 2 if both Alice and Clueless transmit.

[Diagram A: A → anonymizing network → E]

[Diagram B: input 0 goes to output 0 with probability p and to output 1 with probability q; input 0c goes to output 1 with probability α and to output 2 with probability β]

Figure 5–1: Channel Model for Subsection 5.1.1. A) Channel block diagram. B) Channel transition diagram

Figure 5–1B shows the output symbols corresponding to the three states E might

perceive. Let us consider the channel matrix.

\[
\mathbf{M}_{2.1} =
\begin{array}{c|ccc}
      & 0 & 1      & 2 \\ \hline
0     & p & q      & 0 \\
0^{c} & 0 & \alpha & \beta
\end{array}
\]


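The 2 × 3 channel matrix M2.1[i, j] represents the conditional probability of Eve receiving the symbol j when Alice sends the symbol i. Because Alice acts independently of Clueless, p = α is the probability that Clueless does not transmit, and thus q = β = 1 − p is the probability that Clueless does transmit.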

So our channel matrix simplifies to:

\[
\begin{array}{c|ccc}
      & 0 & 1 & 2 \\ \hline
0     & p & q & 0 \\
0^{c} & 0 & p & q
\end{array}
\]

The probability that Alice sends a 0 is P (A = 0) = x, and therefore P (A = 0c) =

1 − x. The term x is the only term that can be varied to achieve capacity. Here is

where Alice may use knowledge of long-term transmission characteristics of the other

transmitters, as well as how many other transmitters there are, to change her (long-

term) behavior. As with other studies of covert channels [12] we are not concerned with

source coding/decoding issues [22]. Our concern is the limits on how well a transmitter

can “optimize” its bit rate to a receiver, given that a channel is noisy. The capacity of

the covert channel between Alice and Eve is

\[ C = \max_{x} \bigl[ H(E) - H(E|A) \bigr] . \]

Given the above channel matrix we have:

\[ H(E) = -\bigl\{ px \log px + [qx + p(1-x)] \log [qx + p(1-x)] + q(1-x) \log q(1-x) \bigr\} \]

and

\[ H(E|A) = -\sum_{i=0}^{1} p(a_i) \sum_{j=0}^{2} p(e_j|a_i) \log p(e_j|a_i) = h(p) , \]

where h(p) denotes the function −p log p − (1 − p) log(1 − p). Thus,

\[ C = \max_{x} \Bigl\{ -\bigl( px \log px + [qx + p(1-x)] \log [qx + p(1-x)] + q(1-x) \log q(1-x) \bigr) - h(p) \Bigr\} . \]

We cannot analytically find the x that maximizes the mutual information, even doing

the standard trick of setting the derivative of the mutual information to zero. However,


we can plot the capacity as a function of p, and of the x value that maximizes the

mutual information as a function of p.
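The numerical maximization behind such a plot can be sketched as follows. The code evaluates H(E) − h(p) for the 2 × 3 channel of this case and maximizes over x with a golden-section search, which is valid because mutual information is concave in the input distribution. The search method, tolerance and function names are choices made for this illustration and are not taken from the original analysis; logarithms are base 2, with x = P(A = 0) and p = P(Clueless does not transmit).

```python
import math

def xlog2x(v):
    """v * log2(v), with the usual convention 0 * log 0 = 0."""
    return v * math.log2(v) if v > 0 else 0.0

def h(p):
    """Binary entropy function in bits."""
    return -xlog2x(p) - xlog2x(1 - p)

def mutual_information(x, p):
    """I(E;A) = H(E) - h(p) for Case 1."""
    q = 1 - p
    outputs = (p * x, q * x + p * (1 - x), q * (1 - x))   # P(E = 0), P(E = 1), P(E = 2)
    return -sum(xlog2x(v) for v in outputs) - h(p)

def capacity(p, tol=1e-10):
    """Return (capacity, maximizing x) using golden-section search on [0, 1]."""
    g = (math.sqrt(5) - 1) / 2
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if mutual_information(a, p) < mutual_information(b, p):
            lo = a          # the maximum lies to the right of a
        else:
            hi = b          # the maximum lies to the left of b
    x = (lo + hi) / 2
    return mutual_information(x, p), x

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        c, x = capacity(p)
        print(p, round(c, 4), round(x, 4))
```

Evaluating `capacity` on a grid of p values yields the two curves described in the text: the capacity itself and the x that achieves it.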

[Plot: “Capacity as a function of p”; horizontal axis p from 0 to 1; vertical axis “Lower Bound of Capacity” from 0 to 1]

Figure 5–2: Plot of Covert Channel Capacity as a Function of p

Figure 5–2 shows certain symmetries. The capacity graph is symmetric about p = .5, and the graph of the x that achieves capacity is skew-symmetric about p = .5. Consider the two situations where p = ε and where p = 1 − ε; in both situations 0 ≤ ε ≤ .5. Let xε be the probability for the input symbol 0 that achieves capacity in the first situation, and let x1−ε be the probability that achieves capacity in the second situation. In the first situation 1 − xε is the capacity-achieving probability for the input symbol 0c, and similarly in the second situation 1 − x1−ε is the capacity-achieving probability for the input symbol 0c. Physically the two situations are “the same” if we reverse the roles of the output symbols 0 and 2. Therefore xε = 1 − x1−ε. Writing xε as xε = 1/2 + ∆, we see that x1−ε = 1/2 − ∆; this is what the lower dotted plot shows in Figure 5–2 (ε = 1/2 ⇒ ∆ = 0).

Observation 1 In conditions of very little extra traffic, or very high extra traffic, the

covert channel from Alice to Eve has higher capacity.

Observation 2 The capacity C(p), as a function of p is strictly bounded below by

C(.5), and C(.5) is achieved when the mutual information is evaluated at x = .5.

It is obvious that very little extra traffic corresponds to very little noise. At first

glance though, it seems counterintuitive that heavy traffic also corresponds to a small


amount of noise. This is because the high traffic is used as a baseline against which to

signal. This is analogous to transmission of bits over a channel where the bit error rate

(BER) Pe is greater than 1/2. In this case, the capacity of the channel is the same as

that of a channel with BER of 1− Pe, by first inverting all the bits. It is the in-between

situations that negatively affect the signaling ability of Alice. But, even in the noisiest

case (i.e., where p = .5) Alice can still transmit with a capacity of a half bit per tick.
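The bit-inversion analogy can be stated in one line. Taking the binary symmetric channel as the model for a channel with bit error rate Pe (an assumption, since the text does not name the model), and with h the binary entropy function defined earlier,

\[ C_{\mathrm{BSC}}(P_e) = 1 - h(P_e), \qquad h(P_e) = h(1 - P_e) \;\Longrightarrow\; C_{\mathrm{BSC}}(P_e) = C_{\mathrm{BSC}}(1 - P_e) . \]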

Note that we can never guarantee error-free transmission, no matter how we

group the output symbols. In fact, it is possible that the outputs will always be the

symbol 1 (of course the probability of this quickly approaches zero, as the number

of transmissions goes up). So this covert channel has a zero-error capacity [23] of

zero. Capacity is a useful measure of a communication channel if the assumption is

that the transmitter can transmit a large number of times. With a large number of

transmissions, an error-correcting code can be utilized so as to achieve a rate close to

capacity. If the transmitter only transmits a small number of transmissions, then using

the capacity alone can be misleading.

5.1.3 Case 2: Alice and N Additional Transmitters

We imagine that there are N + 1 transmitters: Alice is one of them, and the other N are independent, identically behaving clueless transmitters, Clueless1, Clueless2, . . ., CluelessN. Again, Eve can only see how many messages are leaving the first MIX-firewall headed for the second MIX-firewall. Therefore Eve can determine whether 0, 1, . . . , N + 1 messages are leaving the firewall, and nothing more. There are thus still the two input symbols a0 = 0 and a1 = 0c, but we have N + 2 output symbols. The probability that Cluelessi does not send a message is still p, and that it does send a message is q = 1 − p. We now calculate the channel matrix, keeping in mind that Alice acts independently of the Cluelessi.

Alice sends a 0

• For Eve to receive ek (that is, E = k), 0 ≤ k ≤ N, we need k of the clueless transmitters to send a message, and N − k not to send a message. Therefore,

\[ p(e_k \mid A = 0) = \binom{N}{k} p^{N-k} q^{k}, \qquad 0 \le k \le N . \]


• p(eN+1|A = 0) = 0.

Alice sends a 0c

• p(e0|A = 0c) = 0, since the event never happens.

• For Eve to receive ek (that is, E = k), 1 ≤ k ≤ N + 1, we need k − 1 of the clueless transmitters to send a message, and N − k + 1 not to send a message.

\[ p(e_k \mid A = 0^{c}) = \binom{N}{k-1} p^{N-k+1} q^{k-1}, \qquad 1 \le k \le N + 1 . \]

[Diagram A: input 0 goes to outputs 0, 1, . . . , N with probabilities $p^{N}$, $N p^{N-1} q$, . . . , $q^{N}$; input 0c goes to outputs 1, . . . , N, N + 1 with probabilities $p^{N}$, . . . , $N p q^{N-1}$, $q^{N}$]

The channel matrix M3.N is

\[
\mathbf{M}_{3.N} =
\begin{array}{c|cccccc}
      & 0     & 1           & 2                           & \cdots & N           & N+1 \\ \hline
0     & p^{N} & N p^{N-1} q & \binom{N}{2} p^{N-2} q^{2}  & \cdots & q^{N}       & 0 \\
0^{c} & 0     & p^{N}       & N p^{N-1} q                 & \cdots & N p q^{N-1} & q^{N}
\end{array}
\]

Figure 5–3: Channel for Case 3, the general case of N clueless users. A) Channel transition diagram. B) Channel matrix
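The two binomial rows of the channel matrix can be generated and checked mechanically. The Python sketch below builds the 2 × (N + 2) matrix from the conditional probabilities given above and evaluates the mutual information at a chosen input bias x. The function names are illustrative, base-2 logarithms are assumed, and the printed values are not results quoted from [13].

```python
import math

def channel_matrix(N, p):
    """Rows for inputs 0 and 0c; columns are Eve's counts 0 .. N + 1."""
    q = 1 - p
    clueless = [math.comb(N, k) * p ** (N - k) * q ** k for k in range(N + 1)]
    row_0 = clueless + [0.0]      # Alice silent: the count equals the number of clueless senders
    row_0c = [0.0] + clueless     # Alice sends: every count is shifted up by one
    return [row_0, row_0c]

def mutual_information_bits(x, W):
    """I(E;A) in bits when P(A = 0) = x and W is a two-row channel matrix."""
    px = (x, 1 - x)
    py = [px[0] * W[0][j] + px[1] * W[1][j] for j in range(len(W[0]))]
    return sum(px[i] * w * math.log2(w / py[j])
               for i in range(2) for j, w in enumerate(W[i]) if px[i] > 0 and w > 0)

if __name__ == "__main__":
    for N in range(6):
        print(N, round(mutual_information_bits(0.5, channel_matrix(N, 0.5)), 4))
```

With p = .5 and x = .5 the printed values shrink as N grows, in line with the statement that additional clueless transmitters add noise to the channel.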

We obtain the following results from the analysis. The full details and proofs are in

[13].

• In conditions of very little extra traffic, or very high extra traffic, the covert

channel from Alice to Eve has higher capacity.

• The capacity C(p), as a function of p is strictly bounded below by C(.5), and

C(.5) is achieved when the mutual information is evaluated at x = .5 (of course

p = .5 also in this situation).


• The capacity C(p), as a function of p is strictly bounded below by a function that

decreases monotonically to zero as the number of transmitters increases, but is

never zero.

• The bias in the code used by Alice to achieve the optimum data rate on the

channel is not always x = 0.5, but it is never far from 0.5, and our preliminary

experimental results indicate that the difference in capacity is minor.

This last observation agrees with [10], which presents the general result that in binary-input DMCs the mutual information obtained by using x = .5 is no less than 94.21% of the channel capacity. Even if Alice has no knowledge of the probabilistic behavior of the other transmitters, her data rate will not be too far from optimal if she uses an unbiased code.

5.2 Exit-Mix Model

5.2.1 Scenario

There are N + 1 senders in a private enclave. Messages pass one way from the

private enclave to a set of M receivers. The private enclave is behind a firewall which

also functions as a timed Mix [21] that fires every tick, t, hence we call it a simple

timed Mix-firewall. For the sake of simplicity we will refer to a simple timed Mix-

firewall as a Mix-firewall in this paper. One of the N + 1 senders, called Alice, is

malicious. The other N clueless senders, Cluelessi, i = 1, . . . , N , are benign. Each

sender may send at most one message per unit time t to the set of receivers. All

messages from the private enclave to the set of receivers pass through public lines that

are subject to eavesdropping by an eavesdropper called Eve. The only action that Eve

can take is to count the number of messages per t going from the Mix-firewall to each

receiver, since the messages are otherwise indistinguishable. Eve knows that there are

N + 1 possible senders. The N clueless senders act in an independent and identical

manner (i.i.d.) according to a fixed distribution Ci, i = 1, . . . , N . Alice, by sending or

not sending a message each t to at most one receiver, affects Eve’s message counts. This

is how Alice covertly communicates with Eve via a quasi-anonymous channel [14].


[Diagram: senders Clueless1, Clueless2, . . . , Alice, . . . , Cluelessi, . . . , CluelessN send into the Mix-firewall, which forwards to receivers R1, R2, . . . , Ri, . . . , RM; Eve observes the links leaving the Mix-firewall]

Figure 5–4: Exit Mix-firewall Model with N Clueless Senders and M Distinguishable Receivers

Alice acts independently (through ignorance of the clueless senders) when deciding

to send a message; we call this the ignorance assumption. Alice has the same distribu-

tion each t. Between Alice and the N clueless senders, there are N + 1 possible senders

per t, and there are M + 1 possible actions per sender (each sender may or may not

transmit, and if it does transmit, it transmits to exactly one of M receivers).

We consider Alice to be the input to the quasi-anonymous channel, which is a

proper communications channel [22]. Alice can send to one of the M receivers or not

send a message. Thus, we represent the inputs to the quasi-anonymous channel by the

M + 1 input symbols 0, 1, . . . ,M , where i = 0 represents Alice not sending a message,

and i ∈ {1, . . . , M} represents Alice sending a message to the ith receiver Ri. The

“receiver” in the quasi-anonymous channel is Eve. Eve receives the output symbols

ej, j = 1, . . . , K. Eve receives e1 if no sender sends a message. The other output

symbols correspond to all the different ways the N + 1 senders can send or not send,

at most one message each, out of the private enclave, provided at least one sender does

send a message.

5.2.2 Channel Matrix Probabilities

For the sake of simplicity we introduce a dummy receiver R0 (not shown above). If

a sender does not send a message we consider that to be a “message” to R0. For N + 1

senders and M receivers, the output symbol ej observed by Eve is an (M + 1)-vector
$\langle a^j_0, a^j_1, \ldots, a^j_M \rangle$, where $a^j_i$ is how many messages the Mix-firewall sends to Ri. Of course
it follows that $\sum_{i=0}^{M} a^j_i = N + 1$.

The quasi-anonymous channel that we have been describing is a discrete memory-

less channel (DMC). We define the channel matrix M as an (M + 1)×K matrix, where

M[i, j] represents the conditional probability that Eve observes the output symbol ej

given that Alice input i. We model the clueless senders according to the i.i.d. Ci for

each period of possible action t:

\[ P(\text{Clueless}_i \text{ doesn't send a message}) = p \]
\[ P(\text{Clueless}_i \text{ sends a message to any receiver}) = \frac{q}{M} = \frac{1-p}{M} \]

where in keeping with previous papers, q = 1 − p is the probability that Cluelessi

sends a message to any one of the M receivers. When Cluelessi does send a message,

the destination is uniformly distributed over the receivers R1, . . . , RM . We call this the

semi-uniformity assumption. Again, keep in mind that each clueless sender has the

same distribution each t, but they all act independently of each other.
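As a rough illustration of the semi-uniformity assumption, the following Python sketch enumerates the joint behavior of the clueless senders and returns the probability of each count vector 〈a0, a1, ..., aM〉 produced by the clueless senders alone. The function name `clueless_distribution` and its parameters are our own hypothetical choices, not part of the original analysis.

```python
from itertools import product
from collections import Counter

def clueless_distribution(n_clueless, n_receivers, p):
    """Return {count_vector: probability} for the clueless senders alone.

    Index 0 of each count vector is the dummy receiver R0 (no message);
    indices 1..M are the real receivers. Hypothetical helper for illustration.
    """
    q = 1.0 - p
    # Per-sender probabilities: p for R0, q/M for each real receiver.
    choice_prob = [p] + [q / n_receivers] * n_receivers
    dist = Counter()
    # Enumerate every joint choice of the N independent senders.
    for choices in product(range(n_receivers + 1), repeat=n_clueless):
        vector = [0] * (n_receivers + 1)
        prob = 1.0
        for c in choices:
            vector[c] += 1
            prob *= choice_prob[c]
        dist[tuple(vector)] += prob
    return dict(dist)

dist = clueless_distribution(2, 2, p=0.4)
for vector, prob in sorted(dist.items(), reverse=True):
    print(vector, round(prob, 4))
```

Brute-force enumeration is exponential in N, so this is only meant for the small cases studied in this chapter.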

5.3 Capacity Analysis for Exit-MIX Scenario

This section presents the capacity analysis for different cases of transmitters and
receivers. Each case is discussed in detail, and the estimated capacities are compared
across the cases.

The mathematics involved in capacity estimation for this scenario is very compli-

cated. Hence, we estimate the capacity for simple cases and then try to generalize our

observations for N senders and M receivers.

To distinguish the various channel matrices, we will adopt the notation that MN.M

is the channel matrix for N clueless senders and M receivers.

5.3.1 One Receiver (M = 1)

Case 1: No Clueless Senders and One Receiver (N = 0, M = 1). Alice is the

only sender, and there is only one receiver R1. Alice sends either 0 (by not sending

a message) or 1 (by sending a message). Eve receives either e1 = 〈1, 0〉 (Alice did

nothing) or e2 = 〈0, 1〉 (Alice sent a message to the receiver). Since there is no noise

(there are no clueless senders) the channel matrix M0.1 is the 2×2 identity matrix and it

trivially follows that P (E = e1) = x0, and that P (E = e2) = x1.

\[
M_{0.1} =
\begin{array}{c|cc}
  & e_1 & e_2 \\ \hline
0 & 1 & 0 \\
1 & 0 & 1
\end{array}
\]

Since x0 = 1 − x1, we see that¹ H(E) = −x0 log x0 − (1 − x0) log(1 − x0). The
channel matrix is an identity matrix, so the conditional probability distribution P (E|A)
is made up of zeroes and ones, therefore H(E|A) is identically zero. Hence, the capacity
is the maximum over x0 of H(E), which is easily seen to be unity² (and occurs when
x0 = 1/2). Of course, we could have obtained this capacity³ without appealing to

mutual information since we can noiselessly send one bit per tick, but we wish to study

the non-trivial cases and use this as a starting point.

Case 2: N Clueless Senders and One Receiver (M = 1). This case reduces to

the indistinguishable receivers case with N senders analyzed in [13] with both an exit

Mix-firewall that we have been discussing and an entry Mix-firewall (with the receivers

behind the latter). Alice can either send or not send a message, so the input alphabet

again has two symbols. Eve observes N + 2 possible output symbols. That is, Eve sees

e1 = 〈N + 1, 0〉, e2 = 〈N, 1〉, e3 = 〈N − 1, 2〉, · · · , eN+2 = 〈0, N + 1〉. A detailed

discussion of this case can be found in [13].

5.3.2 Some Special Cases for Two Receivers (M = 2)

There are two possible receivers. Alice can signal Eve with an alphabet of three

symbols: 1 or 2, if Alice transmits to R1 or R2, respectively, or the symbol 0 for not

sending a message. Let us analyze the channel matrices and the entropies for different

cases of senders.

¹ All logarithms are base 2.

² The units of capacity are bits per tick t, but we will take the units as being understood for the rest of the report. Recall that all symbols take one t to pass through the channel.

³ This uses Shannon’s [22] asymptotic definition of capacity, which is equivalent for noiseless channels (in units of bits per symbol).

The symbol ej that Eve receives is a 3-tuple of the form $\langle a^j_0, a^j_1, a^j_2 \rangle$, where $a^j_i$ is
the number of messages received by the ith receiver.⁴ As before, the index i = 0 corresponds
to not sending any message. The elements of the 3-tuple must sum to the total
number of senders, N + 1:
\[ \sum_{i=0}^{2} a^j_i = N + 1 . \]

Case 3: No Clueless Senders and Two Receivers (N = 0, M = 2). Alice is the only

sender and can send messages to two possible receivers. The channel matrix is trivial

and there is no anonymity in the channel.

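\[
M_{0.2} =
\begin{array}{c|ccc}
  & \langle 1,0,0 \rangle & \langle 0,1,0 \rangle & \langle 0,0,1 \rangle \\ \hline
0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
2 & 0 & 0 & 1
\end{array}
\]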

The subscript 0.2 denotes zero clueless senders (Alice alone) and two receivers. The 3 × 3

channel matrix M0.2[i, j] represents the conditional probability of Eve receiving the

symbol ej, when Alice sends to the receiver Ri (A = i). ‘0’ stands for not sending a

message.

The mutual information I is given by the entropy H(E) describing Eve

I(E, A) = H(E) = −x1 log x1 − x2 log x2 − (1− x1 − x2) log(1− x1 − x2).

The capacity of this noiseless covert channel is log 3 ≈ 1.58 (at xi=1/3, i = 0, 1, 2). For

M = 2 this is the largest capacity, which we note corresponds to zero anonymity. Of

course, this is not surprising since there are no clueless senders.
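The noiseless case can be checked with a few lines of Python. This is only a sketch under the observation made above that H(E|A) = 0 when there are no clueless senders, so the mutual information equals the entropy of Alice's input distribution; the helper name `entropy_bits` is ours.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

# With no clueless senders H(E|A) = 0, so I(A, E) = H(E) = H(A).
uniform = [1 / 3, 1 / 3, 1 / 3]
skewed = [0.5, 0.25, 0.25]
print(entropy_bits(uniform))  # log2(3), about 1.585
print(entropy_bits(skewed))   # 1.5
```

Any non-uniform choice of (x0, x1, x2), such as the skewed one here, gives a strictly smaller value than log 3.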

Case 4: N = 1 Clueless Sender and M = 2 Receivers.

The following row vector describes the probabilities of the possible output symbols

when only one clueless sender is involved.

\[
\begin{array}{ccc}
\langle 1,0,0 \rangle & \langle 0,1,0 \rangle & \langle 0,0,1 \rangle \\ \hline
p & q/2 & q/2
\end{array}
\]

Figure 5–5: Case 4: System with N = 1 Clueless Sender and M = 2 Receivers

⁴ Recall that the $a^j_i$'s of the output symbol are not directly related to A, which denotes the distribution of Alice.

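The message-set matrix given below shows how the various output symbols can be
formed. The rows correspond to Alice’s actions, and the columns correspond to the
actions of Clueless. Row and column labels are added elementwise to form the matrix
entry, which is the output symbol corresponding to the channel state.

\[
\begin{array}{c|ccc}
  & \langle 1,0,0 \rangle & \langle 0,1,0 \rangle & \langle 0,0,1 \rangle \\ \hline
\langle 1,0,0 \rangle & \langle 2,0,0 \rangle & \langle 1,1,0 \rangle & \langle 1,0,1 \rangle \\
\langle 0,1,0 \rangle & \langle 1,1,0 \rangle & \langle 0,2,0 \rangle & \langle 0,1,1 \rangle \\
\langle 0,0,1 \rangle & \langle 1,0,1 \rangle & \langle 0,1,1 \rangle & \langle 0,0,2 \rangle
\end{array}
\]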

The set of distinct symbols formed in the matrix cells constitutes the set of output

symbols Eve may receive. In this case, there are three repetitions in the message-set

matrix, so Eve may receive 9 - 3 = 6 symbols.
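The construction of the message-set matrix is mechanical, so it can be sketched in Python. The helpers `count_vectors` and `message_set_matrix` below are hypothetical names of our own; they form every elementwise sum of an Alice row label and a clueless column label and then count the distinct cells.

```python
from itertools import combinations_with_replacement

def count_vectors(n_senders, n_receivers):
    """All ways n_senders indistinguishable messages can land in R0..RM."""
    vectors = set()
    for urns in combinations_with_replacement(range(n_receivers + 1), n_senders):
        vector = [0] * (n_receivers + 1)
        for u in urns:
            vector[u] += 1
        vectors.add(tuple(vector))
    return vectors

def message_set_matrix(n_clueless, n_receivers):
    """Rows: Alice's action; columns: clueless count vectors; cells: elementwise sums."""
    alice_rows = sorted(count_vectors(1, n_receivers), reverse=True)
    clueless_cols = sorted(count_vectors(n_clueless, n_receivers), reverse=True)
    return [[tuple(a + c for a, c in zip(row, col)) for col in clueless_cols]
            for row in alice_rows]

matrix = message_set_matrix(1, 2)
cells = [cell for row in matrix for cell in row]
print(len(cells), len(set(cells)))  # 9 cells, 6 distinct output symbols
```

The same two helpers can be called with other small values of N and M to count the output symbols for those cases.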

Let us consider the channel matrix.

\[
M_{1.2} =
\begin{array}{c|cccccc}
  & \langle 2,0,0 \rangle & \langle 1,1,0 \rangle & \langle 1,0,1 \rangle & \langle 0,2,0 \rangle & \langle 0,1,1 \rangle & \langle 0,0,2 \rangle \\ \hline
0 & p & q/2 & q/2 & 0 & 0 & 0 \\
1 & 0 & p & 0 & q/2 & q/2 & 0 \\
2 & 0 & 0 & p & 0 & q/2 & q/2
\end{array}
\]

The 3 × 6 channel matrix M1.2[i, j] represents the conditional probability of Eve
receiving the symbol ej when Alice sends to Ri. As noted, the dummy receiver R0
corresponds to Alice not sending to any receiver (however this is still a transmission to
Eve via the quasi-anonymous channel).

Figure 5–6: Capacity for N = 1 Clueless Sender and M = 2 Receivers (axes: q, capacity)

Given the above channel matrix we have:

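\begin{align*}
H(E) = -\bigl\{ & p x_0 \log[p x_0] \\
& + [q x_0/2 + p x_1] \log[q x_0/2 + p x_1] \\
& + [q x_0/2 + p x_2] \log[q x_0/2 + p x_2] \\
& + [q x_1/2] \log[q x_1/2] + [q x_1/2 + q x_2/2] \log[q x_1/2 + q x_2/2] \\
& + [q x_2/2] \log[q x_2/2] \bigr\} .
\end{align*}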

The conditional entropy is given by
\[
H(E|A) = -\sum_{i=0}^{2} \Bigl[ p(x_i) \sum_{j=1}^{6} p(e_j|x_i) \log p(e_j|x_i) \Bigr] = h_2(p) ,
\]
where h2(p) denotes the function
\begin{align*}
h_2(p) &= -\frac{1-p}{2} \log\frac{1-p}{2} - \frac{1-p}{2} \log\frac{1-p}{2} - p \log p \\
       &= -(1-p) \log\frac{1-p}{2} - p \log p .
\end{align*}

Figure 5–7: Case 5: System with N = 2 Clueless Senders and M = 2 Receivers

The mutual information between Alice and Eve is given by
\[ I(A,E) = H(E) - H(E|A) , \]
and the channel capacity is given by
\begin{align*}
C &= \max_{A} I(A,E) \\
  &= \max_{x_1,x_2} \; -\bigl\{ p x_0 \log[p x_0] \\
  &\qquad + [q x_0/2 + p x_1] \log[q x_0/2 + p x_1] \\
  &\qquad + [q x_0/2 + p x_2] \log[q x_0/2 + p x_2] \\
  &\qquad + [q x_1/2] \log[q x_1/2] + [q x_1/2 + q x_2/2] \log[q x_1/2 + q x_2/2] \\
  &\qquad + [q x_2/2] \log[q x_2/2] \bigr\} - h_2(p) .
\end{align*}

Note that the maximization is over x1 and x2, since x0 is determined by these

two probabilities (holds for any N). This equation is very difficult to solve analytically

and requires numerical techniques. Figure 5–6 shows the capacity for this case with

the curve N = 1. From the plot the minimum capacity is approximately 0.92, when

p = 1/3. This is less than 1.58, which is the corresponding value for N = 0 case. We

will come back to this curve later for comparison purposes with other values of N .
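The numerical method behind the plot is not specified here. As one possible approach, the sketch below applies the standard Blahut-Arimoto iteration to the channel matrix M1.2; this is our own substitution for illustration, and the function names are hypothetical.

```python
import math

def channel_matrix_1_2(p):
    """Channel matrix M_{1.2}: rows are Alice's inputs 0, 1, 2; columns e_1..e_6."""
    h = (1.0 - p) / 2.0  # q/2
    return [
        [p, h, h, 0, 0, 0],
        [0, p, 0, h, h, 0],
        [0, 0, p, 0, h, h],
    ]

def mutual_information(x, W):
    """I(A, E) in bits for input distribution x and channel matrix W."""
    out = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]
    total = 0.0
    for i, xi in enumerate(x):
        for j, w in enumerate(W[i]):
            if xi > 0 and w > 0:
                total += xi * w * math.log2(w / out[j])
    return total

def blahut_arimoto(W, iterations=2000):
    """Estimate capacity (bits) and a maximizing input by Blahut-Arimoto."""
    n = len(W)
    x = [1.0 / n] * n
    for _ in range(iterations):
        out = [sum(x[i] * W[i][j] for i in range(n)) for j in range(len(W[0]))]
        # Relative entropy D(W_i || out) in nats for each input symbol i.
        d = [sum(w * math.log(w / out[j]) for j, w in enumerate(W[i]) if w > 0)
             for i in range(n)]
        weights = [x[i] * math.exp(d[i]) for i in range(n)]
        s = sum(weights)
        x = [w / s for w in weights]
    return mutual_information(x, W), x

for p in (0.1, 1 / 3, 0.5, 0.9):
    capacity, x_opt = blahut_arimoto(channel_matrix_1_2(p))
    print(round(p, 3), round(capacity, 4), [round(v, 3) for v in x_opt])
```

A fixed iteration count is used for simplicity; a convergence test on successive capacity estimates would be a natural refinement.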

Case 5: N = 2 Clueless Senders and M = 2 Receivers.

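The row vector describing the output symbols and their probabilities with only the
two clueless senders is given by
\[
\begin{array}{cccccc}
\langle 2,0,0 \rangle & \langle 1,1,0 \rangle & \langle 1,0,1 \rangle & \langle 0,2,0 \rangle & \langle 0,1,1 \rangle & \langle 0,0,2 \rangle \\ \hline
p^2 & pq & pq & q^2/4 & q^2/2 & q^2/4
\end{array}
\]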

The symbol 〈2, 0, 0〉 has probability p² because neither clueless sender sends a message.
The symbol 〈1, 1, 0〉 has probability 2p(q/2) = pq because either Clueless1 does not send a
message and Clueless2 sends a message to R1, or vice versa. The other values behave
similarly. The message-set matrix, which has the contributions from the clueless senders as the
column index and the contributions from Alice as the row index, is as follows.

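\[
\begin{array}{c|cccccc}
  & \langle 2,0,0 \rangle & \langle 1,1,0 \rangle & \langle 1,0,1 \rangle & \langle 0,2,0 \rangle & \langle 0,1,1 \rangle & \langle 0,0,2 \rangle \\ \hline
\langle 1,0,0 \rangle & \langle 3,0,0 \rangle & \langle 2,1,0 \rangle & \langle 2,0,1 \rangle & \langle 1,2,0 \rangle & \langle 1,1,1 \rangle & \langle 1,0,2 \rangle \\
\langle 0,1,0 \rangle & \langle 2,1,0 \rangle & \langle 1,2,0 \rangle & \langle 1,1,1 \rangle & \langle 0,3,0 \rangle & \langle 0,2,1 \rangle & \langle 0,1,2 \rangle \\
\langle 0,0,1 \rangle & \langle 2,0,1 \rangle & \langle 1,1,1 \rangle & \langle 1,0,2 \rangle & \langle 0,2,1 \rangle & \langle 0,1,2 \rangle & \langle 0,0,3 \rangle
\end{array}
\]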

By inspection of the matrix, we notice that the output symbols with more repetitions
will have a higher probability of being seen by Eve than the others.
That is, output symbol 〈1, 1, 1〉 will have a greater probability of being observed than
〈3, 0, 0〉 or 〈0, 3, 0〉. The probability of observing a symbol also depends on the probability
distribution of the transmitter over the receivers (i.e., the value of q). There are
eight repetitions in the message-set matrix, so the total number of possible symbols Eve
may receive is 18 − 8 = 10. The channel matrix M2.2 is given below.

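\[
M_{2.2} =
\begin{array}{c|cccccccccc}
  & \langle 3,0,0 \rangle & \langle 2,1,0 \rangle & \langle 2,0,1 \rangle & \langle 1,2,0 \rangle & \langle 1,1,1 \rangle & \langle 1,0,2 \rangle & \langle 0,1,2 \rangle & \langle 0,3,0 \rangle & \langle 0,2,1 \rangle & \langle 0,0,3 \rangle \\ \hline
0 & p^2 & pq & pq & q^2/4 & q^2/2 & q^2/4 & 0 & 0 & 0 & 0 \\
1 & 0 & p^2 & 0 & pq & pq & 0 & q^2/4 & q^2/4 & q^2/2 & 0 \\
2 & 0 & 0 & p^2 & 0 & pq & pq & q^2/2 & 0 & q^2/4 & q^2/4
\end{array}
\]

The 3 × 10 channel matrix M2.2[i, j] represents the conditional probability of Eve
receiving ej when Alice sends a message to receiver Ri.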

Figure 5–8 shows the capacity for this case N = 2. Again, the minimum capacity is

found at p = 1/3 = 1/(M + 1). From the plot the minimum capacity is approximately

0.62, when p = 1/3.

5.3.3 Some Special Cases for Three Receivers (M = 3)

Figure 5–8: Capacity for N = 2 clueless senders and M = 2 receivers (axes: q, capacity)

Case 6: N = 1 Clueless Sender and M = 3 Receivers. Alice or Clueless can send
to three possible receivers or refrain from sending (denoted by ‘0’). The probabilities of
the various output symbols from the one clueless sender are given below.
\[
\begin{array}{cccc}
\langle 1,0,0,0 \rangle & \langle 0,1,0,0 \rangle & \langle 0,0,1,0 \rangle & \langle 0,0,0,1 \rangle \\ \hline
p & q/3 & q/3 & q/3
\end{array}
\]

Figure 5–9: Case 6: System with N = 1 Clueless Sender and M = 3 Receivers

Now let us examine the number of possible message set symbols obtained if we

merge the individual message sets of Alice and Clueless.

Figure 5–10: Capacity for N = 1 clueless sender and M = 3 receivers (axes: q, capacity)

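\[
\begin{array}{c|cccc}
  & \langle 1,0,0,0 \rangle & \langle 0,1,0,0 \rangle & \langle 0,0,1,0 \rangle & \langle 0,0,0,1 \rangle \\ \hline
\langle 1,0,0,0 \rangle & \langle 2,0,0,0 \rangle & \langle 1,1,0,0 \rangle & \langle 1,0,1,0 \rangle & \langle 1,0,0,1 \rangle \\
\langle 0,1,0,0 \rangle & \langle 1,1,0,0 \rangle & \langle 0,2,0,0 \rangle & \langle 0,1,1,0 \rangle & \langle 0,1,0,1 \rangle \\
\langle 0,0,1,0 \rangle & \langle 1,0,1,0 \rangle & \langle 0,1,1,0 \rangle & \langle 0,0,2,0 \rangle & \langle 0,0,1,1 \rangle \\
\langle 0,0,0,1 \rangle & \langle 1,0,0,1 \rangle & \langle 0,1,0,1 \rangle & \langle 0,0,1,1 \rangle & \langle 0,0,0,2 \rangle
\end{array}
\]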

As we can see from the above message-matrix, there are six repetitions in the

message sets formed, so Eve may receive 10 different symbols.

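The channel matrix M1.3 is given below.
\[
M_{1.3} =
\begin{array}{c|cccccccccc}
  & \langle 2,0,0,0 \rangle & \langle 1,1,0,0 \rangle & \langle 1,0,1,0 \rangle & \langle 1,0,0,1 \rangle & \langle 0,2,0,0 \rangle & \langle 0,1,1,0 \rangle & \langle 0,1,0,1 \rangle & \langle 0,0,2,0 \rangle & \langle 0,0,1,1 \rangle & \langle 0,0,0,2 \rangle \\ \hline
0 & p & q/3 & q/3 & q/3 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & p & 0 & 0 & q/3 & q/3 & q/3 & 0 & 0 & 0 \\
2 & 0 & 0 & p & 0 & 0 & q/3 & 0 & q/3 & q/3 & 0 \\
3 & 0 & 0 & 0 & p & 0 & 0 & q/3 & 0 & q/3 & q/3
\end{array}
\]

The 4 × 10 channel matrix M1.3[i, j] represents the conditional probability of Eve
receiving ej when Alice sends a message to receiver Ri.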

Figure 5–10 shows the capacity for this case of N = 1. The minimum capacity is

found at p = 1/4 = 1/(M + 1). From the plot the minimum capacity is approximately

1.25, when p = 1/4.

Case 7: N = 2 Clueless Senders and M = 3 Receivers.

The row vector describing how the clueless users influence the output symbols is

given below.

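\[
\begin{array}{cccccccccc}
\langle 2,0,0,0 \rangle & \langle 1,1,0,0 \rangle & \langle 1,0,1,0 \rangle & \langle 1,0,0,1 \rangle & \langle 0,2,0,0 \rangle & \langle 0,1,1,0 \rangle & \langle 0,1,0,1 \rangle & \langle 0,0,2,0 \rangle & \langle 0,0,1,1 \rangle & \langle 0,0,0,2 \rangle \\ \hline
p^2 & 2pq/3 & 2pq/3 & 2pq/3 & q^2/9 & 2q^2/9 & 2q^2/9 & q^2/9 & 2q^2/9 & q^2/9
\end{array}
\]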

Now let us examine the size of the set of output symbols obtained if we merge the

individual message sets of Alice and the two clueless senders:

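\[
\begin{array}{c|cccccccccc}
  & \langle 2,0,0,0 \rangle & \langle 1,1,0,0 \rangle & \langle 1,0,1,0 \rangle & \langle 1,0,0,1 \rangle & \langle 0,2,0,0 \rangle & \langle 0,1,1,0 \rangle & \langle 0,1,0,1 \rangle & \langle 0,0,2,0 \rangle & \langle 0,0,1,1 \rangle & \langle 0,0,0,2 \rangle \\ \hline
\langle 1,0,0,0 \rangle & \langle 3,0,0,0 \rangle & \langle 2,1,0,0 \rangle & \langle 2,0,1,0 \rangle & \langle 2,0,0,1 \rangle & \langle 1,2,0,0 \rangle & \langle 1,1,1,0 \rangle & \langle 1,1,0,1 \rangle & \langle 1,0,2,0 \rangle & \langle 1,0,1,1 \rangle & \langle 1,0,0,2 \rangle \\
\langle 0,1,0,0 \rangle & \langle 2,1,0,0 \rangle & \langle 1,2,0,0 \rangle & \langle 1,1,1,0 \rangle & \langle 1,1,0,1 \rangle & \langle 0,3,0,0 \rangle & \langle 0,2,1,0 \rangle & \langle 0,2,0,1 \rangle & \langle 0,1,2,0 \rangle & \langle 0,1,1,1 \rangle & \langle 0,1,0,2 \rangle \\
\langle 0,0,1,0 \rangle & \langle 2,0,1,0 \rangle & \langle 1,1,1,0 \rangle & \langle 1,0,2,0 \rangle & \langle 1,0,1,1 \rangle & \langle 0,2,1,0 \rangle & \langle 0,1,2,0 \rangle & \langle 0,1,1,1 \rangle & \langle 0,0,3,0 \rangle & \langle 0,0,2,1 \rangle & \langle 0,0,1,2 \rangle \\
\langle 0,0,0,1 \rangle & \langle 2,0,0,1 \rangle & \langle 1,1,0,1 \rangle & \langle 1,0,1,1 \rangle & \langle 1,0,0,2 \rangle & \langle 0,2,0,1 \rangle & \langle 0,1,1,1 \rangle & \langle 0,1,0,2 \rangle & \langle 0,0,2,1 \rangle & \langle 0,0,1,2 \rangle & \langle 0,0,0,3 \rangle
\end{array}
\]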

As we can see, there are 20 repetitions in the symbols formed. Hence, the total
number of symbols seen by Eve is 40 − 20 = 20. If we look through the columns
〈1, 1, 0, 0〉, 〈0, 1, 1, 0〉 and 〈1, 0, 1, 0〉, we can find the element 〈1, 1, 1, 0〉 common to
all three columns. There are three more such elements (〈1, 1, 0, 1〉, 〈1, 0, 1, 1〉 and
〈0, 1, 1, 1〉), each common to three columns. From this, we conclude that the message sets
with an even distribution of messages tend to appear in several columns, whereas those
with a skewed distribution tend to be unique. This is expected, as there are multiple ways
to distribute messages over several receivers, while there is only one way for all senders
to send to the same receiver.

The channel matrix (split into two) is given below.

\[
\begin{array}{c|cccccccccc}
  & \langle 3,0,0,0 \rangle & \langle 2,1,0,0 \rangle & \langle 2,0,1,0 \rangle & \langle 2,0,0,1 \rangle & \langle 1,2,0,0 \rangle & \langle 1,0,2,0 \rangle & \langle 1,0,0,2 \rangle & \langle 1,1,1,0 \rangle & \langle 1,1,0,1 \rangle & \langle 1,0,1,1 \rangle \\ \hline
0 & p^2 & 2pq/3 & 2pq/3 & 2pq/3 & q^2/9 & q^2/9 & q^2/9 & 2q^2/9 & 2q^2/9 & 2q^2/9 \\
1 & 0 & p^2 & 0 & 0 & 2pq/3 & 0 & 0 & 2pq/3 & 2pq/3 & 0 \\
2 & 0 & 0 & p^2 & 0 & 0 & 2pq/3 & 0 & 2pq/3 & 0 & 2pq/3 \\
3 & 0 & 0 & 0 & p^2 & 0 & 0 & 2pq/3 & 0 & 2pq/3 & 2pq/3
\end{array}
\]

Figure 5–11: Capacity for N = 2 clueless senders and M = 3 receivers (axes: q, capacity)

\[
\begin{array}{c|cccccccccc}
  & \langle 0,3,0,0 \rangle & \langle 0,2,1,0 \rangle & \langle 0,2,0,1 \rangle & \langle 0,1,2,0 \rangle & \langle 0,1,0,2 \rangle & \langle 0,1,1,1 \rangle & \langle 0,0,3,0 \rangle & \langle 0,0,2,1 \rangle & \langle 0,0,1,2 \rangle & \langle 0,0,0,3 \rangle \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & q^2/9 & 2q^2/9 & 2q^2/9 & q^2/9 & q^2/9 & 2q^2/9 & 0 & 0 & 0 & 0 \\
2 & 0 & q^2/9 & 0 & 2q^2/9 & 0 & 2q^2/9 & q^2/9 & 2q^2/9 & q^2/9 & 0 \\
3 & 0 & 0 & q^2/9 & 0 & 2q^2/9 & 2q^2/9 & 0 & q^2/9 & 2q^2/9 & q^2/9
\end{array}
\]

The 4 × 20 channel matrix M2.3[i, j] represents the conditional probability of Eve
receiving ej when Alice sends a message to receiver Ri. The generalized formula for the
matrix elements is given by
\[
m(0,j) = \begin{cases}
\dfrac{2}{(a^j_0-1)!\,a^j_1!\,a^j_2!\,a^j_3!}\; p^{(a^j_0-1)} (q/3)^{3-a^j_0} & \text{for } a^j_0 = 1, 2, 3 \\[1ex]
0 & \text{for } a^j_0 = 0
\end{cases}
\]
\[
m(1,j) = \begin{cases}
\dfrac{2}{a^j_0!\,(a^j_1-1)!\,a^j_2!\,a^j_3!}\; p^{a^j_0} (q/3)^{2-a^j_0} & \text{for } a^j_1 = 1, 2, 3 \\[1ex]
0 & \text{for } a^j_1 = 0
\end{cases}
\]
\[
m(2,j) = \begin{cases}
\dfrac{2}{a^j_0!\,a^j_1!\,(a^j_2-1)!\,a^j_3!}\; p^{a^j_0} (q/3)^{2-a^j_0} & \text{for } a^j_2 = 1, 2, 3 \\[1ex]
0 & \text{for } a^j_2 = 0
\end{cases}
\]
\[
m(3,j) = \begin{cases}
\dfrac{2}{a^j_0!\,a^j_1!\,a^j_2!\,(a^j_3-1)!}\; p^{a^j_0} (q/3)^{2-a^j_0} & \text{for } a^j_3 = 1, 2, 3 \\[1ex]
0 & \text{for } a^j_3 = 0
\end{cases}
\]

Figure 5–12: Case 7: System With N = 2 Clueless Senders and M = 3 Receivers

Figure 5–13: Case 8: System with N = 1 Clueless Sender and M Receivers

Figure 5–11 shows the capacity for this case in the curve when N = 2. The

minimum capacity is found at p = 1/4 = 1/(M + 1). From the plot the minimum

capacity is approximately 0.89, when p = 1/4, which is less than the lowest capacity for

the N = 1 case.

5.3.4 Some Generalized Cases of N and M

Case 8: N = 1 Clueless Sender and M Receivers.

We generalize the scenario to one clueless transmitter and M receivers. The probability
distribution describing the actions of the one clueless sender alone is given below.
\[
\begin{array}{cccccc}
\langle 1,0,0,0,\ldots,0 \rangle & \langle 0,1,0,0,\ldots,0 \rangle & \langle 0,0,1,0,\ldots,0 \rangle & \langle 0,0,0,1,\ldots,0 \rangle & \cdots & \langle 0,0,0,0,\ldots,1 \rangle \\ \hline
p & q/M & q/M & q/M & \cdots & q/M
\end{array}
\]

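The message set matrix is given below.
\[
\begin{array}{c|cccccc}
  & \langle 1,0,0,0,\ldots,0 \rangle & \langle 0,1,0,0,\ldots,0 \rangle & \langle 0,0,1,0,\ldots,0 \rangle & \langle 0,0,0,1,\ldots,0 \rangle & \cdots & \langle 0,0,0,0,\ldots,1 \rangle \\ \hline
\langle 1,0,0,0,\ldots,0 \rangle & \langle 2,0,0,0,\ldots,0 \rangle & \langle 1,1,0,0,\ldots,0 \rangle & \langle 1,0,1,0,\ldots,0 \rangle & \langle 1,0,0,1,\ldots,0 \rangle & \cdots & \langle 1,0,0,0,\ldots,1 \rangle \\
\langle 0,1,0,0,\ldots,0 \rangle & \langle 1,1,0,0,\ldots,0 \rangle & \langle 0,2,0,0,\ldots,0 \rangle & \langle 0,1,1,0,\ldots,0 \rangle & \langle 0,1,0,1,\ldots,0 \rangle & \cdots & \langle 0,1,0,0,\ldots,1 \rangle \\
\langle 0,0,1,0,\ldots,0 \rangle & \langle 1,0,1,0,\ldots,0 \rangle & \langle 0,1,1,0,\ldots,0 \rangle & \langle 0,0,2,0,\ldots,0 \rangle & \langle 0,0,1,1,\ldots,0 \rangle & \cdots & \langle 0,0,1,0,\ldots,1 \rangle \\
\langle 0,0,0,1,\ldots,0 \rangle & \langle 1,0,0,1,\ldots,0 \rangle & \langle 0,1,0,1,\ldots,0 \rangle & \langle 0,0,1,1,\ldots,0 \rangle & \langle 0,0,0,2,\ldots,0 \rangle & \cdots & \langle 0,0,0,1,\ldots,1 \rangle \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\langle 0,0,0,0,\ldots,1 \rangle & \langle 1,0,0,0,\ldots,1 \rangle & \langle 0,1,0,0,\ldots,1 \rangle & \langle 0,0,1,0,\ldots,1 \rangle & \langle 0,0,0,1,\ldots,1 \rangle & \cdots & \langle 0,0,0,0,\ldots,2 \rangle
\end{array}
\]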

The number of output symbols that may be seen by Eve is identical to the total

possible distinct pairs in the message-set matrix shown above. There are two indistin-

guishable transmissions (including null transmissions) and they are sent into M + 1

distinct receivers (urns) (this also includes the null transmission, which by convention

goes to R0, not shown in the figure). Combinatorics tells us then that there are $\binom{M+2}{2}$
distinct combinations (symbols) that Eve may receive.

The channel matrix is given below.

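\[
\begin{array}{c|ccccccc}
  & \langle 2,0,0,0,\ldots,0 \rangle & \langle 1,1,0,0,\ldots,0 \rangle & \langle 1,0,1,0,\ldots,0 \rangle & \cdots & \langle 1,0,0,0,\ldots,1 \rangle & \langle 0,2,0,0,\ldots,0 \rangle & \cdots \;\; \langle 0,0,0,0,\ldots,2 \rangle \\ \hline
0 & p & q/M & q/M & \cdots & q/M & 0 & \cdots \;\; 0 \\
1 & 0 & p & 0 & \cdots & 0 & q/M & \cdots \;\; 0 \\
2 & 0 & 0 & p & \cdots & 0 & 0 & \cdots \;\; 0 \\
3 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots \;\; 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
M & 0 & 0 & 0 & \cdots & p & 0 & \cdots \;\; q/M
\end{array}
\]

The $(M+1) \times \binom{M+2}{2}$ channel matrix M1.M[i, j] represents the conditional probability
of Eve receiving ej when Alice sends a message to receiver Ri.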

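The probability distribution among the elements of the channel matrix can be
calculated by the formulas below (with N = 1).
\[
m_{i,j} = \begin{cases}
p^{a^j_0} (q/M)^{N-a^j_0} & : a^j_i \neq 0 \\
0 & : a^j_i = 0
\end{cases}
\qquad \forall\, i = 1, 2, 3, \ldots, M \text{ and } j = 1, 2, 3, \ldots, \tbinom{M+2}{2}
\]
\[
m_{0,j} = \begin{cases}
p^{(a^j_0-1)} (q/M)^{N-a^j_0+1} & : a^j_0 \neq 0 \\
0 & : a^j_0 = 0
\end{cases}
\qquad \forall\, j = 1, 2, 3, \ldots, \tbinom{M+2}{2}
\]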

The conclusions and more generalizations related to this case are discussed in the

results section.


Case 9: N Clueless Senders and M = 2 Receivers. In this case, we generalize
the problem to N clueless transmitters for the two-receiver case. The total number
of output symbols seen by Eve can be calculated as the number of ways in which the
transmitters can send (or not send) a message, combined with the number of ways in
which the messages sent can be distributed between the two receivers.

If k of the transmitters send a message, then the k messages sent can be divided
between the two receivers in k + 1 possible ways ((k, 0), (k − 1, 1), . . . , (0, k)). With
Alice included there are N + 1 transmitters, so k ranges over 0, 1, . . . , N + 1 and
\[
\text{message set size} = 1 + 2 + 3 + 4 + \cdots + (N + 2) = \sum_{i=1}^{N+2} i = (N + 2)(N + 3)/2 .
\]

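The probability of each channel state with the clueless senders only is as follows.
\[
\begin{array}{ccccccc}
\langle N,0,0 \rangle & \langle N-1,1,0 \rangle & \langle N-1,0,1 \rangle & \langle N-2,2,0 \rangle & \langle N-2,1,1 \rangle & \langle N-2,0,2 \rangle & \cdots \;\; \langle 0,0,N \rangle \\ \hline
p^N & N p^{N-1} q/2 & N p^{N-1} q/2 & N(N-1) p^{N-2} q^2/8 & N(N-1) p^{N-2} q^2/4 & N(N-1) p^{N-2} q^2/8 & \cdots \;\; (q/2)^N
\end{array}
\]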

Now let us merge the individual message sets of Alice and the N clueless transmit-

ters to determine the number of symbols received by Eve.

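\[
\begin{array}{c|ccccccc}
  & \langle N,0,0 \rangle & \langle N-1,1,0 \rangle & \langle N-1,0,1 \rangle & \langle N-2,2,0 \rangle & \langle N-2,1,1 \rangle & \langle N-2,0,2 \rangle & \cdots \;\; \langle 0,0,N \rangle \\ \hline
\langle 1,0,0 \rangle & \langle N+1,0,0 \rangle & \langle N,1,0 \rangle & \langle N,0,1 \rangle & \langle N-1,2,0 \rangle & \langle N-1,1,1 \rangle & \langle N-1,0,2 \rangle & \cdots \;\; \langle 1,0,N \rangle \\
\langle 0,1,0 \rangle & \langle N,1,0 \rangle & \langle N-1,2,0 \rangle & \langle N-1,1,1 \rangle & \langle N-2,3,0 \rangle & \langle N-2,2,1 \rangle & \langle N-2,1,2 \rangle & \cdots \;\; \langle 0,1,N \rangle \\
\langle 0,0,1 \rangle & \langle N,0,1 \rangle & \langle N-1,1,1 \rangle & \langle N-1,0,2 \rangle & \langle N-2,2,1 \rangle & \langle N-2,1,2 \rangle & \langle N-2,0,3 \rangle & \cdots \;\; \langle 0,0,N+1 \rangle
\end{array}
\]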

As observed before, the message set 〈N/3 + 1, N/3, N/3〉 is the most uniform
message distribution. Hence, it has the maximum number of repetitions in the
message-set matrix and will have a greater probability of being observed than
〈N + 1, 0, 0〉 or 〈0, 1, N〉.

The channel matrix MN.2 is given below.
\[
M_{N.2} =
\begin{array}{c|ccccccc}
  & \langle N+1,0,0 \rangle & \langle N,1,0 \rangle & \langle N,0,1 \rangle & \langle N-1,2,0 \rangle & \langle N-1,1,1 \rangle & \langle N-1,0,2 \rangle & \cdots \;\; \langle 0,0,N+1 \rangle \\ \hline
0 & p^N & N p^{N-1} q/2 & N p^{N-1} q/2 & N(N-1) p^{N-2} q^2/8 & N(N-1) p^{N-2} q^2/4 & N(N-1) p^{N-2} q^2/8 & \cdots \;\; 0 \\
1 & 0 & p^N & 0 & N p^{N-1} q/2 & N p^{N-1} q/2 & 0 & \cdots \;\; 0 \\
2 & 0 & 0 & p^N & 0 & N p^{N-1} q/2 & N p^{N-1} q/2 & \cdots \;\; (q/2)^N
\end{array}
\]

Figure 5–14: Case 9: System with N Clueless Senders and M = 2 Receivers

The 3 × ((N + 2)(N + 3)/2) channel matrix MN.2[i, j] represents the conditional

probability of Eve receiving ej when Alice sends a message to receiver Ri.

The probability distribution in the channel matrix can be imagined as a nesting
of two binomial distributions: first, between messages sent and received; second, the
distribution of messages sent to the two receivers. So, given the vector
$\langle a^j_0, a^j_1, a^j_2 \rangle$, the element of the channel matrix can be generalized by the formulas below.
\begin{align*}
m_{0,j} &= \binom{N}{a^j_0-1} p^{(a^j_0-1)} \,\bigl(\text{prob. distribution of } (N-(a^j_0-1)) \text{ messages to } R_1 \text{ and } R_2\bigr) \\
        &= \binom{N}{a^j_0-1} p^{(a^j_0-1)} \binom{N-(a^j_0-1)}{a^j_1} (q/2)^{a^j_1} (q/2)^{a^j_2} \\
        &= \binom{N}{a^j_0-1} p^{(a^j_0-1)} \binom{N-(a^j_0-1)}{a^j_1} (q/2)^{N-(a^j_0-1)} \\
m_{1,j} &= \binom{N}{a^j_0} p^{a^j_0} \binom{N-a^j_0}{a^j_1-1} (q/2)^{N-a^j_0} \\
m_{2,j} &= \binom{N}{a^j_0} p^{a^j_0} \binom{N-a^j_0}{a^j_1} (q/2)^{N-a^j_0}
\end{align*}
Note that $a^j_2$ does not explicitly appear but is implicit in the above, since
$(a^j_0 + a^j_1 + a^j_2) - 1 = N$. This relationship will be seen to be important in the following general
case (where we use a generalized combinatorial formula). The conclusions and more
generalizations related to this case are discussed in the results section.
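The nested-binomial formulas translate directly into code. The sketch below is an illustration only: `entry_two_receivers` is a hypothetical helper of ours, and it assumes the output symbol a = (a0, a1, a2) sums to N + 1.

```python
from math import comb

def entry_two_receivers(i, a, n_clueless, p):
    """M_{N.2}[i, j] via nested binomials; a = (a0, a1, a2) sums to N + 1."""
    q = 1.0 - p
    c = list(a)
    c[i] -= 1                      # clueless-only counts after removing Alice
    if min(c) < 0:
        return 0.0                 # Alice cannot have produced this symbol
    idle, to_r1 = c[0], c[1]
    sending = n_clueless - idle    # clueless senders that transmit
    # First binomial: who stays silent; second: how the rest split over R1, R2.
    return (comb(n_clueless, idle) * p ** idle
            * comb(sending, to_r1) * (q / 2) ** sending)

p = 0.4
print(entry_two_receivers(0, (1, 1, 1), 2, p))  # q^2/2 with q = 0.6
print(entry_two_receivers(1, (0, 3, 0), 2, p))  # q^2/4 with q = 0.6
```

Removing Alice's contribution first lets one expression cover all three rows, which mirrors the way the three formulas differ only in which count is decremented.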


Case 10: N Clueless Senders and M Receivers. We now generalize the problem

to N clueless senders and M receivers (refer again to Figure 5–4). There are N + 1

indistinguishable transmissions (including null transmissions) and they are sent into

M + 1 distinct receivers (urns) (this also includes the null transmission, which by

convention goes to R0, not shown in the figure). Combinatorics tells us then that there
are $K = \binom{N+M+1}{N+1}$ possible symbols ej.
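The count K can be cross-checked against a brute-force enumeration. The following sketch uses hypothetical helper names of our own and relies only on the urn interpretation given above.

```python
from math import comb
from itertools import combinations_with_replacement

def num_output_symbols(n_clueless, n_receivers):
    """K = C(N + M + 1, N + 1): multisets of size N + 1 drawn from M + 1 urns."""
    return comb(n_clueless + n_receivers + 1, n_clueless + 1)

def enumerate_symbols(n_clueless, n_receivers):
    """Brute-force list of count vectors <a_0, ..., a_M> summing to N + 1."""
    symbols = set()
    urns = range(n_receivers + 1)
    for picks in combinations_with_replacement(urns, n_clueless + 1):
        symbols.add(tuple(picks.count(u) for u in urns))
    return sorted(symbols, reverse=True)

for n, m in [(1, 2), (2, 2), (1, 3), (2, 3)]:
    print(n, m, num_output_symbols(n, m), len(enumerate_symbols(n, m)))
```

For these four (N, M) pairs both methods give 6, 10, 10, and 20 symbols, which agrees with the counts found in Cases 4 to 7.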

The rows of our channel matrix correspond to the actions of Alice. The ith row of
MN.M describes the conditional probabilities p(ej|xi). (For simplicity we will not always
explicitly note that $j = 1, \ldots, \binom{N+M+1}{N+1}$.) By convention e1 always corresponds to
every sender not sending a message (which is equivalent to all senders sending to R0).
Therefore e1 is the (M + 1)-tuple 〈N + 1, 0, . . . , 0〉. Given our simplifying semi-uniformity
assumption for the clueless senders’ distribution, this term must be handled differently.

The first row of the channel matrix is made up of the terms MN.M[0, j]. Here, Alice
is not sending any message (i.e., she is “sending” to R0), so Alice contributes one to
the term $a^j_0$ in the (M + 1)-tuple $\langle a^j_0, a^j_1, a^j_2, \ldots, a^j_M \rangle$ associated with ej. In fact, this
tuple is the “long hand” representation of ej. Therefore the contributions to the (M + 1)-tuple
$\langle a^j_0 - 1, a^j_1, a^j_2, \ldots, a^j_M \rangle$ describe what the N clueless senders are doing. That is,
$a^j_0 - 1$ clueless senders are not sending a message, $a^j_1$ clueless senders are sending to
R1, etc. Hence, the multinomial coefficient $\binom{N}{a^j_0-1,\, a^j_1,\, \ldots,\, a^j_M}$ tells us how many ways this
may occur.⁵ For each such occurrence we see that the transmissions to R0 affect the
probability by $p^{a^j_0-1}$, and the transmissions to Ri, i > 0, due to the semi-uniformity
assumption, contribute $(q/M)^{a^j_i}$. Since the actions are independent, the probabilities
multiply, and since $a^j_0 - 1 + a^j_1 + \cdots + a^j_M = N$, we have a probability term of
$p^{a^j_0-1} (q/M)^{N+1-a^j_0}$. Multiplying that term by the total number of ways of arriving at
that arrangement we have that:
\[
M_{N.M}[0, j] = \binom{N}{a^j_0-1,\, a^j_1,\, \ldots,\, a^j_M} p^{a^j_0-1} (q/M)^{N+1-a^j_0} .
\]

⁵ The multinomial coefficient is taken to be zero if any of the “bottom” entries are negative.

The other rows of the channel matrix are MN.M[i, j], i > 0. For row i > 0, we have
a combinatorial term $\binom{N}{a^j_0,\, a^j_1,\, \ldots,\, a^j_{i-1},\, a^j_i-1,\, a^j_{i+1},\, \ldots,\, a^j_M}$ for the N clueless senders, $a^j_0$ of which
are sending to R0 and $N - a^j_0$ of which are sending to the Ri, i > 0. Therefore, we see
that under the uniformity assumption,
\[
M_{N.M}[i, j] = \binom{N}{a^j_0,\, a^j_1,\, \ldots,\, a^j_{i-1},\, a^j_i-1,\, a^j_{i+1},\, \ldots,\, a^j_M} p^{a^j_0} (q/M)^{N-a^j_0} , \quad i > 0 .
\]
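Both row formulas can be folded into a single routine by first subtracting Alice's own contribution from the output symbol. The sketch below is our own illustration (the names `multinomial` and `channel_entry` are hypothetical); it follows the convention that the multinomial coefficient is zero when any bottom entry is negative.

```python
from math import factorial

def multinomial(n, parts):
    """Multinomial coefficient n! / (k_0! ... k_M!); zero if any part is negative."""
    if any(k < 0 for k in parts) or sum(parts) != n:
        return 0
    result = factorial(n)
    for k in parts:
        result //= factorial(k)
    return result

def channel_entry(i, a, n_clueless, p):
    """M_{N.M}[i, j] for Alice input i and output symbol a = (a_0, ..., a_M)."""
    m = len(a) - 1
    q = 1.0 - p
    clueless = list(a)
    clueless[i] -= 1          # remove Alice's own contribution
    coeff = multinomial(n_clueless, clueless)
    if coeff == 0:
        return 0.0
    idle = clueless[0]        # clueless senders that "send" to R0
    return coeff * p ** idle * (q / m) ** (n_clueless - idle)

p = 0.4
print(channel_entry(0, (2, 1, 0), 2, p))  # pq for N = 2, M = 2
print(channel_entry(2, (0, 1, 2), 2, p))  # q^2/2 for N = 2, M = 2
```

Evaluating `channel_entry` over all output symbols for a fixed i reproduces one row of the channel matrix, and each such row should sum to one.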

We show the plots of the mutual information when the clueless senders act (as

assumed throughout the report) in a semi-uniform manner and when Alice also sends in

a semi-uniform manner (i.e., xi = (1 − x0)/M, i = 1, 2, ..., M). We conjecture based

upon our intuition, but do not prove, that Alice having a semi-uniform distribution

of destinations R1, ..., RM when the clueless senders act in a semi-uniform manner

maximizes mutual information (achieves capacity). This has been supported by all of

our numeric computations for capacity. With this conjecture, we can reduce the degrees

of freedom for Alice from M to 1 (her distribution A is described entirely by x0), which

allows greater experimental and analytical exploration.

The channel matrix greatly simplifies when both the clueless senders and Alice act

in a totally uniform manner. That is, when x0 = 1/(M + 1), then xi = (1 − x0)/M =

1/(M + 1) for all xi, and p = 1/(M + 1). We have

\[
M_{N.M}[0, j] = \binom{N}{a^j_0-1,\, a^j_1,\, \ldots,\, a^j_M} p^{a^j_0-1} (q/M)^{N+1-a^j_0} ,
\]
which simplifies to
\[
M_{N.M}[0, j] = \binom{N}{a^j_0-1,\, a^j_1,\, \ldots,\, a^j_M} \left(\frac{1}{M+1}\right)^{N} .
\]
(Note this form for i = 0 is due to the total uniformity of the Ci.) We also have
\[
M_{N.M}[i, j] = \binom{N}{a^j_0,\, a^j_1,\, \ldots,\, a^j_{i-1},\, a^j_i-1,\, a^j_{i+1},\, \ldots,\, a^j_M} p^{a^j_0} (q/M)^{N-a^j_0} , \quad i > 0 ,
\]
which simplifies to
\[
M_{N.M}[i, j] = \binom{N}{a^j_0,\, a^j_1,\, \ldots,\, a^j_{i-1},\, a^j_i-1,\, a^j_{i+1},\, \ldots,\, a^j_M} \left(\frac{1}{M+1}\right)^{N} , \quad i > 0 .
\]

Table 1. Lower capacity bounds for N = 0, . . . , 9, and M = 1, . . . , 10

N \ M       1       2       3       4       5       6       7       8       9      10
  0    0.3113  1.5849  2.0000  2.3219  2.5850  2.8074  3.0000  3.1699  3.2192  3.4594
  1    0.2193  0.9172  1.2500  1.5219  1.7515  1.9502  2.1250  2.2811  2.4219  2.5503
  2    0.1675  0.6204  0.8891  1.1204  1.3218  1.4996  1.6586  1.8021  1.9328  2.0529
  3    0.1351  0.4555  0.6760  0.8423  1.0515  1.2112  1.3560  1.4882  1.6097  1.7221
  4    0.1133  0.3537  0.5371  0.7080  0.8649  1.0090  1.1410  1.2630  1.3761  1.4813
  5    0.0976  0.2864  0.4408  0.5893  0.7288  0.8588  0.9798  1.0925  1.1978  1.2965
  6    0.0857  0.2392  0.3710  0.5010  0.6255  0.7434  0.8544  0.9587  1.0570  1.1496
  7    0.0765  0.2048  0.3187  0.4334  0.5450  0.6522  0.7542  0.8510  0.9428  1.0298
  8    0.0691  0.1789  0.2785  0.3803  0.4809  0.5786  0.6726  0.7626  0.8484  0.9303
  9    0.0630  0.1587  0.2467  0.3377  0.4288  0.5183  0.6051  0.6888  0.7692  0.8463

To determine the distribution E describing Eve we need to sum over the columns
of the channel matrix and use the total uniformity of A.
\[
P(E = e_j) = \sum_{i=0}^{M} P(E = e_j \mid A = i)\, P(A = i) .
\]
Since P(A = i) = 1/(M + 1) for every i, this gives us
\[
P(E = e_j) = \left(\frac{1}{M+1}\right)^{N+1} \sum_{i=0}^{M} \binom{N}{a^j_0,\, \ldots,\, a^j_{i-1},\, a^j_i-1,\, a^j_{i+1},\, \ldots,\, a^j_M}
           = \left(\frac{1}{M+1}\right)^{N+1} \binom{N+1}{a^j_0,\, \ldots,\, a^j_M} .
\]
From this we can compute the entropy H(E) without too much trouble:
\[
H(E) = \left(\frac{1}{M+1}\right)^{N+1} \sum_{j} \binom{N+1}{a^j_0,\, \ldots,\, a^j_M} \left( (N+1) \log(M+1) - \log \binom{N+1}{a^j_0,\, \ldots,\, a^j_M} \right) .
\]
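The totally uniform case is easy to evaluate numerically. The sketch below is our own Python illustration (the original computations were done in Matlab, and the helper names are hypothetical). It uses the fact that, given Alice's input, Eve's symbol is the clueless count vector shifted by one, so H(E|A) is the entropy of N uniform choices over M + 1 urns, while H(E) is the entropy of N + 1 such choices.

```python
from math import factorial, log2
from itertools import combinations_with_replacement

def multinomial(parts):
    """Multinomial coefficient (sum of parts)! / (k_0! ... k_M!)."""
    result = factorial(sum(parts))
    for k in parts:
        result //= factorial(k)
    return result

def count_vectors(n_messages, n_urns):
    """All count vectors for n_messages indistinguishable messages in n_urns."""
    vectors = set()
    for picks in combinations_with_replacement(range(n_urns), n_messages):
        vectors.add(tuple(picks.count(u) for u in range(n_urns)))
    return vectors

def uniform_count_entropy(n_messages, n_urns):
    """Entropy (bits) of the count vector when messages go uniformly to urns."""
    total = n_urns ** n_messages
    h = 0.0
    for a in count_vectors(n_messages, n_urns):
        prob = multinomial(a) / total
        h -= prob * log2(prob)
    return h

def uniform_mutual_information(n_clueless, n_receivers):
    """I(A, E) when p = x_i = 1/(M + 1) for all i."""
    urns = n_receivers + 1
    h_e = uniform_count_entropy(n_clueless + 1, urns)       # H(E)
    h_e_given_a = uniform_count_entropy(n_clueless, urns)   # H(E|A)
    return h_e - h_e_given_a

print(round(uniform_mutual_information(1, 3), 4))  # 1.25
```

For N = 1 and M = 3 this evaluates to 1.25, in agreement with the corresponding entry of Table 1.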

The conditional entropy is more complicated, but is expressible. We therefore
wrote Matlab code to calculate the mutual information, which is conjectured to achieve
capacity, when the clueless senders act in a semi-uniform manner and Alice acts
in a totally uniform manner. Local exploration of nearby points always yields lower mutual
information values.

Table 1 tabulates the results of numerical calculations of capacities for different
combinations of values of N and M using Matlab. We conjecture that when Alice acts
in a totally uniform manner (that is, every Alice probability is 1/(M + 1)), capacity
is achieved when p takes the same value, and that this capacity is the lower bound for all
capacities. The table gives capacity with p fixed at 1/(M + 1), which we determined
numerically to be less than the capacity for other values of p.

5.3.5 Non-Uniform Message Distributions

Each of the Senders (including Alice) can have different message distributions

among the receivers. We consider 80/20 and the more practical “Zipf” distributions and

explain each of them with respect to our scenario.

Zipf distribution. Zipf’s distribution refers to the distribution of the occurrence of
an event relative to its rank r. There are two Zipf’s laws: the rank-frequency law and the
frequency-count law. According to the rank-frequency law, the frequency of the rth
largest occurrence of the event is inversely proportional to a power of its rank:
\[ f_r \propto 1/r^{\theta} \]
This is typically referred to as Zipf’s law or the Zipf distribution. The rank-frequency
plot is a straight line with slope −θ on a log-log scale.

The second law expresses the count $c_f$ of events that have a frequency f in terms
of f. It is defined as
\[ c_f \propto 1/f^{\phi} \]
We can easily prove that the second law is a mathematical consequence of the first
one. It can also be shown that φ = 1 + 1/θ.
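A brief sketch of why the second law follows from the first is given below. It rests on an assumption of ours, namely that rank and frequency can be treated as continuous variables, so it is a heuristic argument rather than a rigorous proof.

```latex
f_r \propto r^{-\theta}
\;\Longrightarrow\;
r(f) \propto f^{-1/\theta}
\quad \text{(the number of events with frequency at least } f\text{)},
\qquad
c_f \propto \left| \frac{dr}{df} \right| \propto f^{-(1 + 1/\theta)}
\;\Longrightarrow\;
\phi = 1 + \frac{1}{\theta} .
```

Inverting the rank-frequency relation gives the number of events at or above a given frequency, and differentiating that count yields the density of events at frequency f.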

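We now calculate the message distribution probabilities under the Zipf distribution
(with θ = 1) for the case of one clueless transmitter (N = 1) and five receivers (M = 5).
In this subsection p denotes the probability that the clueless sender sends a message,
so q = 1 − p is the probability that it does not. The probability distribution is given by:

P(clueless sends to R1) = c · 1/1
P(clueless sends to R2) = c · 1/2
P(clueless sends to R3) = c · 1/3
P(clueless sends to R4) = c · 1/4
P(clueless sends to R5) = c · 1/5
P(clueless doesn’t send a message) = 1 − p = q

The constant c is given by 60p/137, and the resulting probabilities of sending to the
five receivers are 60p/137, 30p/137, 20p/137, 15p/137, and 12p/137.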
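The normalization can be reproduced with exact rational arithmetic. In the sketch below, `zipf_weights` is a hypothetical helper of ours that returns the per-receiver shares for a unit sending probability; multiplying each share by p gives the probabilities listed above.

```python
from fractions import Fraction

def zipf_weights(n_receivers, theta=1):
    """Normalized Zipf weights 1/r^theta for ranks r = 1..M (integer theta only)."""
    raw = [Fraction(1, r ** theta) for r in range(1, n_receivers + 1)]
    total = sum(raw)
    return [w / total for w in raw]

weights = zipf_weights(5)
print([str(w) for w in weights])
# ['60/137', '30/137', '20/137', '15/137', '12/137']
```

Using `Fraction` keeps the constant 60/137 exact instead of a rounded floating-point value.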


80/20 distribution. According to this distribution, 80% of the messages are sent to

20% of the recipients and the remaining 20% to 80% of the recipients. Let us assume,

without loss of generality, that the first M/5 receivers get 80% of the messages and the

remaining receivers get the other 20% of the messages. The probability distribution of a

Clueless transmitter is as follows:

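$$P(\text{Clueless sends to } R_i) = \frac{p \cdot 4/5}{M/5} = \frac{4p}{M}, \qquad i = 1, 2, \ldots, M/5$$

$$P(\text{Clueless sends to } R_i) = \frac{p \cdot 1/5}{4M/5} = \frac{p}{4M}, \qquad i = M/5 + 1, \ldots, M$$

$$P(\text{Clueless doesn't send a message}) = 1 - p = q$$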

For the probability distribution of Alice, there are three different probabilities: the first for not sending a message, the second for sending to each of the first M/5 receivers, and the third for sending to each of the remaining 4M/5 receivers.
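As a quick check of the 80/20 probabilities for the Clueless transmitter, the sketch below builds the distribution and confirms that the sending probabilities sum to p. The function name is hypothetical, and the code assumes that M is a multiple of 5, as the derivation above implicitly does.

```python
from fractions import Fraction


def eighty_twenty_send_probabilities(m, p):
    """Return [P(send to R_1), ..., P(send to R_m)] for the 80/20 rule.

    Assumption (for illustration): m is a multiple of 5, so that exactly
    m/5 receivers share 80% of the messages.
    """
    if m % 5 != 0:
        raise ValueError("this sketch assumes M is a multiple of 5")
    heavy = m // 5                           # receivers that get 80% of the traffic
    light = m - heavy                        # the remaining 4M/5 receivers
    p = Fraction(p)
    hot = p * Fraction(4, 5) / heavy         # equals 4p/M
    cold = p * Fraction(1, 5) / light        # equals p/(4M)
    return [hot] * heavy + [cold] * light


if __name__ == "__main__":
    probs = eighty_twenty_send_probabilities(10, Fraction(1, 2))
    print(probs, "sum =", sum(probs))
```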

5.4 Summary

This chapter presents the capacity analysis of the covert channel scenario. Since the mathematics involved in the analysis is very complex, many simple cases are analyzed. These include cases involving combinations of N = 1, 2, 3, 4 additional transmitters and M = 1, 2, 3 receivers. Based on the observations from the different cases, the channel matrix and the entropy for the generalized case are discussed.

Finally, Zipf and 80/20 message distributions are considered for Alice and the Clueless transmitters. The results of the calculations and their generalizations are presented in the next chapter.

CHAPTER 6
DISCUSSION OF RESULTS

6.1 Capacity vs. Clueless Transmitters

Figure 6–1 shows the capacity as a function of p with M = 2 receivers, for

N = 1, 2, 3, 4 clueless senders. In all cases, the minimum capacity is realized at p = 1/3,

and the capacity at p = 1 is log 3. As N increases, the capacity decreases, with the

most marked effects at p = 1/3.

In Figure 6–1, the capacity (under the semi-uniformity assumption for Ci, which is in force throughout the report) was determined numerically for any choice of A. However, for the remaining plots, we applied the semi-uniformity conjecture (that Alice is better off behaving semi-uniformly if that is what the clueless senders do). Thus, x0 is the only free variable for Alice’s distribution in what follows.
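A minimal sketch of how such a capacity curve could be computed numerically is given below, in Python rather than the Matlab used for the thesis calculations. It assumes a channel model in which, on each tick, the eavesdropper observes only how many messages are delivered to each receiver, and in which every clueless sender is semi-uniform (silent with probability p, otherwise uniform over the M receivers). The function names and the grid search over x0 are illustrative choices, not the author's implementation.

```python
import math
from collections import defaultdict
from itertools import product


def semi_uniform(silent, m):
    """Distribution over actions 0..m: 0 = stay silent, i = send to receiver R_i."""
    return [silent] + [(1.0 - silent) / m] * m


def clueless_counts(n, dist):
    """Distribution of per-receiver message counts from n i.i.d. clueless senders."""
    m = len(dist) - 1
    result = defaultdict(float)
    for actions in product(range(m + 1), repeat=n):
        prob = math.prod(dist[a] for a in actions)
        counts = tuple(actions.count(r) for r in range(1, m + 1))
        result[counts] += prob
    return result


def channel_matrix(clueless, n):
    """rows[a][y] = P(observed counts y | Alice takes action a)."""
    m = len(clueless) - 1
    noise = clueless_counts(n, clueless)
    rows = []
    for a in range(m + 1):
        row = defaultdict(float)
        for counts, prob in noise.items():
            out = list(counts)
            if a > 0:
                out[a - 1] += 1  # Alice's message adds one to receiver a's count
            row[tuple(out)] += prob
        rows.append(row)
    return rows


def mutual_information(alice, rows):
    """I(X;Y) in bits for Alice's input distribution and a channel matrix."""
    p_y = defaultdict(float)
    for a, row in enumerate(rows):
        for y, prob in row.items():
            p_y[y] += alice[a] * prob
    info = 0.0
    for a, row in enumerate(rows):
        for y, prob in row.items():
            if alice[a] > 0 and prob > 0:
                info += alice[a] * prob * math.log2(prob / p_y[y])
    return info


def capacity(p, n, m, steps=1000):
    """Grid-search estimate of capacity over semi-uniform Alice distributions."""
    rows = channel_matrix(semi_uniform(p, m), n)
    grid = [k / steps for k in range(steps + 1)]
    return max(mutual_information(semi_uniform(x0, m), rows) for x0 in grid)


if __name__ == "__main__":
    # Capacity estimates for M = 2 receivers at a few values of p.
    for n in (1, 2, 3, 4):
        print(n, [round(capacity(p, n, 2), 4) for p in (0.25, 1 / 3, 0.5, 1.0)])
```

The enumeration grows as (M + 1)^N, so this direct approach is only practical for the small values of N and M considered here.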

6.2 Capacity vs. Number of Receivers

Figure 6–2 shows the capacity as a function of p with M = 3 receivers, for N = 1, 2, 4 clueless senders. As expected, in all cases, the minimum capacity is realized at p = 1/4, and the capacity at p = 1 is log 4 = 2. As N increases, the capacity decreases, with the most marked effects at p = 1/4. The minimum capacity is greater than the corresponding value in the M = 2 case (see Figure 6–1).

The mutual information as a function of x0 is shown in Figure 6–3 for M = 2

receivers and N = 1 clueless sender for p = 0.25, 0.33, 0.5, 0.67. Here, note that the

curve with p = 0.33 has the smallest maximum value (capacity), and that the value

of x0 at which that maximum occurs is x0 = 0.33. The x0 value that maximizes the

mutual information (i.e., for which capacity is reached) for the other curves is not 0.33,

but the mutual information at x0 = 0.33 is not much less than the capacity for any of

the curves.

Figure 6–4 shows the mutual information curves for various values of x0 as a function of p, with N = 2 clueless senders and M = 2 receivers. Similarly, Figure 6–5 shows the mutual information curves for various values of x0 as a function of p, with N = 2 clueless senders and M = 3 receivers.

Figure 6–1: Capacity for N = 1 to 4 Clueless Senders and M = 2 Receivers
[Plot: lower bound of capacity vs. p = P(Clueless not sending a message); curves for N = 1, 2, 3, 4]

Figure 6–2: Capacity for N = 1, 2, 4 Clueless Senders and M = 3 Receivers
[Plot: lower bound of capacity vs. p; curves for N = 1, 2, 4]

Figure 6–3: Mutual Information vs. x0 for N = 1 Clueless Sender and M = 2 Receivers, for p = 0.25, 0.33, 0.5, 0.67
[Plot: mutual information vs. x0; curves for p = 0.25, 0.33, 0.5, 0.67]

Figure 6–4: Mutual Information vs. p for N = 2 Clueless Senders and M = 2 Receivers
[Plot: mutual information vs. p = (1−q); curves for x0 = 0.1, 0.25, 0.33, 0.5, 0.75]

Figure 6–5: Mutual Information vs. p for N = 2 Clueless Senders and M = 3 Receivers
[Plot: mutual information vs. p = (1−q); curves for x0 = 0.10, 0.20, 0.25, 0.33, 0.5, 0.75]

In Figure 6–4, note that the curve for x0 = 1/(M + 1) = 1/3 has the largest minimum mutual information, and also has the greatest mutual information at the point where p = 1, i.e., when there is no noise because the clueless senders are not sending any messages. The capacity for various values of p is, in essence, the curve that is the maximum at each p over all of the x0 curves, and the lower bound on capacity occurs at p = 1/3 = 1/(M + 1).

Also observe that the x0 = 0.33 curve has the highest value for p = 0.33, but for other values of p, other values of x0 have higher mutual information (i.e., Alice has a strategy better than using x0 = 0.33). However, the mutual information when x0 = 0.33 is never much less than the capacity at any value of p, so in the absence of information about the behavior of the clueless senders, a good strategy for Alice is to just use x0 = 1/(M + 1). These observations are illustrated and expanded in the next two figures. Note the differences in concavity between Figure 6–3 and Figure 6–4. We will discuss concavity again later in the report.

Figure 6–6 shows the optimal value for x0, i.e., the one that maximizes mutual information and hence achieves channel capacity, for N = 1, 2, 3, 4 clueless senders and M = 3 receivers as a function of p. A similar graph in [13] for M = 1 receiver is symmetric about x0 = 0.5, but for M > 1 the symmetry is multidimensional, and the graph projected to the (p, x0)-plane where the destinations are uniformly distributed is not symmetric. However, note that the optimum choice of x0 is 1/(M + 1) both at p = 1/(M + 1) and at p = 1, that is, when the clueless senders either create maximum noise or do not transmit at all (no noise). As N increases, the optimum x0 for other values of p moves further from 1/(M + 1). Also observe that Alice’s best strategy is to do the opposite of what the clueless senders do, up to a point. If they are less likely to send messages (p > 1/(M + 1)), then Alice should be more likely to send messages (x0 < 1/(M + 1)), whereas if Cluelessi is more likely to send messages (p < 1/(M + 1)), then Alice should be less likely to send messages (x0 > 1/(M + 1)).

Figure 6–6: Value of x0 that Maximizes Mutual Information for N = 1, 2, 3, 4 Clueless Senders and M = 3 Receivers as a Function of p
[Plot: optimal x0 vs. p = P(Clueless not sending a message); curves for N = 1, 2, 3, 4]

6.3 Capacity vs. Mutual Information at x0 = 1/(M + 1)

Figure 6–7 shows the degree to which the choice of x0 = 1/(M + 1) can be

suboptimal, for N = 1, 2, 3, 4 clueless senders and M = 3 receivers. The plot shows the

mutual information for the given p and x0 = 1/(M + 1), normalized by dividing by the

capacity (maximum mutual information) at that same p. Hence, it shows the degree to

which a choice of x0 = 1/(M + 1) fails to achieve the maximum mutual information.

For N = 2, it is never worse than 0.94 (numerically), but for N = 4, its minimum

is 0.88. The relationship of suboptimality for other choices of M and N , or for other

distributions, is not known.
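The normalized quantity plotted in Figure 6–7 could be computed along the following lines. This is a hedged Python sketch, not the thesis code: it assumes the observer sees only the per-receiver message counts on each tick, that Alice and all clueless senders are semi-uniform, and it approximates the capacity by a grid search over x0. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def mutual_information(x0, p, n, m):
    """I(X;Y) in bits when Alice and all n clueless senders are semi-uniform."""
    alice = [x0] + [(1.0 - x0) / m] * m   # Alice is silent with probability x0
    noise = [p] + [(1.0 - p) / m] * m     # each clueless sender is silent w.p. p
    joint = defaultdict(float)            # (Alice's action, observed counts) -> prob
    for actions in product(range(m + 1), repeat=n + 1):
        prob = alice[actions[0]] * math.prod(noise[c] for c in actions[1:])
        counts = tuple(actions.count(r) for r in range(1, m + 1))
        joint[(actions[0], counts)] += prob
    p_y = defaultdict(float)
    for (_, counts), prob in joint.items():
        p_y[counts] += prob
    return sum(prob * math.log2(prob / (alice[a] * p_y[counts]))
               for (a, counts), prob in joint.items() if prob > 0)


def normalized_information(p, n, m, steps=200):
    """Mutual information at x0 = 1/(m+1), divided by the grid-search capacity."""
    uniform = 1.0 / (m + 1)
    at_uniform = mutual_information(uniform, p, n, m)
    candidates = [k / steps for k in range(steps + 1)] + [uniform]
    best = max(mutual_information(x0, p, n, m) for x0 in candidates)
    return at_uniform / best


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5, 0.75, 1.0):
        print(p, round(normalized_information(p, 2, 3), 4))
```

Including x0 = 1/(M + 1) among the candidates guarantees that the ratio never exceeds 1, whatever grid resolution is chosen.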

Figure 6–7: Normalized Mutual Information when x0 = 1/4 for N = 1, 2, 3, 4 Clueless Senders and M = 3 Receivers
[Plot: normalized mutual information at x0 = 1/4 vs. p = P(Clueless not sending a message); curves for N = 1, 2, 3, 4]

Figure 6–8: Capacity for N = 1 Clueless Sender and M = 1 to 5 Receivers
[Plot: lower bound of capacity vs. p; curves for M = 1, 2, 3, 4, 5; annotations at p = 0.33, p = 0.2, and p = 0.167]

Figure 6–9: Capacity for N = 0 to 9 Clueless Senders and M = 1 to 10.
[Surface plot: capacity vs. number of clueless transmitters N and number of receivers M]

In Figure 6–8, we show the lower bound on capacity of the channel as a function of

p for N = 1 clueless sender and various values of M receivers. Numerical results show

that this lower bound increases for all p as M increases, and the lower bound on the

capacity for a given M occurs at p = 1/(M + 1), which is indicated by the dotted lines

in the figure.

For Figure 6–9, we take the capacity at p = 1/(M + 1), which we found numerically to minimize the capacity of the covert channel, and plot this lower bound for capacity for many values of N and M. We retain the assumption that xi = (1 − x0)/M for i = 1, 2, ..., M, so that Alice’s probabilities sum to one; that is, given the semi-uniform distribution of transmissions to the receivers by the clueless senders, it is best for Alice to do likewise. Along the surface where N = 0, we have the noiseless channel, and the capacity is log(M + 1), which is also the upper bound for capacity for all N and M. The values along the surface when M = 1 give us the same values we derived in [13].

6.4 Capacity vs. Message Distributions

In Figure 6–10, we show the lower bound on capacity of the channel for different message distributions of the Clueless transmitter, with Alice following the uniform distribution. The 80/20 distribution has the highest value of the lower bound on capacity, followed by the Zipf and the uniform distributions. Notice that the uniform distribution has the lowest capacity bound of the three distributions, indicating that the capacity of the covert channel increases as the message distribution becomes less uniform.

Figure 6–10: Capacity for Uniform, Zipf, and 80/20 Distributions for Clueless Transmitter and Uniform Distribution for Alice
[Plot: lower bound of capacity vs. p = P(Clueless not sending any message); curves for 80/20, Zipf, Uniform]

Figure 6–11 shows the mutual information curves for various message distributions followed by Alice, with N = 1 clueless sender and M = 4 receivers, where the clueless sender follows the uniform distribution. From the curves, we deduce that Alice achieves better channel capacity by maintaining the uniform message distribution when the clueless transmitter follows the uniform distribution.

Figure 6–12 confirms this observation for the case where the Clueless sender follows the Zipf distribution. Calculating capacity for different message distributions gets more and more complicated because of the increase in the number of variables, and more work needs to be carried out in this area.

6.5 Comments and Generalizations

We first note that the maximum capacity of this (covert) quasi-anonymous channel

is log(M + 1) for M distinguishable receivers, and is achievable only if there are no

other senders (N = 0), or equivalently, if none of them ever send (p = 1), i.e., when the

channel is noiseless.
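The value log(M + 1) can be checked directly. As a brief sketch, assuming base-2 logarithms (consistent with log 4 = 2 in Section 6.2) and writing x_i for the probability of Alice's action i (with x_0 the probability of silence): with no noise the observer sees exactly Alice's action, so Y = X and

$$I(X;Y) = H(X) = -\sum_{i=0}^{M} x_i \log_2 x_i \;\le\; \log_2(M+1),$$

with equality exactly when x_i = 1/(M + 1) for all i. For M = 2 this gives log_2 3 ≈ 1.585 bits, and for M = 3 it gives log_2 4 = 2 bits, matching the values at p = 1 quoted for Figures 6–1 and 6–2.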

Figure 6–11: Capacity for Uniform, Zipf, and 80/20 Distributions for Alice and Uniform Distribution for Clueless Transmitter
[Plot: mutual information vs. x0 = P(Alice not sending any message); curves for 80/20, Zipf, Uniform]

Figure 6–12: Capacity for Uniform, Zipf, and 80/20 Distributions for Alice and Zipf Distribution for Clueless Transmitter
[Plot: mutual information vs. x0 = P(Alice not sending any message); curves for 80/20, Zipf, Uniform]

Here are some of the observations from the different cases considered, under the semi-uniform assumption for the clueless senders and the semi-uniform conjecture for Alice, followed by some generalizations.

• The capacity C(p, N, M), as a function of the probability p that a clueless sender remains silent, with N clueless senders and M receivers, is bounded below by C(1/(M + 1), N, M), and this bound is achieved with x0 = 1/(M + 1).

• The lower bound for capacity for a given number M of receivers decreases as the number N of clueless senders increases: C(1/(M + 1), N, M) > C(1/(M + 1), N + 1, M).

• The lower bound for capacity for a given number N of clueless senders increases as the number M of distinguishable receivers increases: C(1/(M + 2), N, M + 1) > C(1/(M + 1), N, M).

These observations are intuitive, but we have not shown them to be true numeri-

cally in the general case (we did for the case that M = 1 in our initial publication [13]).

It is interesting to note that increasing the number of distinguishable receivers increases

the covert channel capacity, which in some sense decreases the (sender) anonymity in

the system (Alice has more room in which to express herself). This is a bit contrary to

the intuitive view of anonymity in Mix networks, where more receivers tends to provide

“greater anonymity.” In this light, we note that Danezis and Serjantov investigated the

effects of multiple receivers in statistical attacks on anonymity networks [?]. They found

that Alice having multiple receivers greatly lowered a statistical attacker’s certainty of

Alice’s receiver set.

While the graphs and numerical tests support that the “worst” thing the clueless

senders can do is to send (or not) with uniform probability distribution over the Ri,

i = 0, 1, 2, ..., M , we have not proven this mathematically. Nor have we proven that,

under these conditions, the best Alice can do is to send (or not) to each receiver Ri

with uniform probability, xi = 1/(M + 1) for i = 0, 1, 2, ..., M , although the numerical

computations support this. The proof in [13] of these conjectures for the case where

M = 1 relied, in part, on the symmetry about x0 = 0.5, which is not the case when

M > 1, so another approach must be used. However, we should still be able to use

the concavity/convexity results from [13]. Note that our conjecture that the best that

Alice can do is to send in a semi-uniform manner, and the results illustrated in Figure

8, seem to be an extension of the interesting results of [10].

6.6 Summary

The capacity C(p, N, M), as a function of the probability p that a clueless sender remains silent, with N clueless senders and M receivers, is bounded below by C(1/(M + 1), N, M), and this bound is achieved with x0 = 1/(M + 1). For a given number of receivers, the lower bound of capacity decreases as the number of Clueless senders increases; for a given number of Clueless senders, it increases as the number of distinguishable receivers increases.

CHAPTER 7
CONCLUSIONS AND FUTURE WORK

This thesis has taken a step towards tying the notion of capacity of a quasi-

anonymous channel associated with an anonymity network to the amount of anonymity

that the network provides. It explores the particular situation of a simple type of

timed Mix (it fires every tick) that also acts as an exit firewall. Cases for varying

numbers of distinguishable receivers and varying numbers of senders were considered,

resulting in the observations that more senders (not surprisingly) decreases the covert

channel capacity, while more receivers increases it. The latter observation is intuitive

to communication engineers, but may not have occurred to many in the anonymity

community, since the focus there is often on sender anonymity.

As the entropy H of the probability distribution associated with a message output from a Mix gives the effective size, 2^H, of the anonymity set, we wonder if the capacity of the residual quasi-anonymous channel in an anonymity system provides some measure of the effective size of the anonymity set for the system as a whole. That is, using the covert channel capacity as a standard yardstick, can we take the capacity of the covert channel for the observed transmission characteristics of clueless senders, equate it with the capacity for a (possibly smaller) set of clueless senders with maximum entropy (i.e., who introduce the maximum amount of noise into the channel for Alice), and use the size of this latter set as the effective number of clueless senders in the system? This is illustrated in Figure 6–1, with the vertical dashed line showing that N = 4 clueless senders that remain silent with probability p = 0.87 are in some sense equivalent to one clueless sender with p = 0.33.
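The effective anonymity-set size 2^H mentioned above can be illustrated with a short Python sketch. The function names are hypothetical, and the example distributions are made up purely for illustration; they are not data from this work.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H, in bits, of a discrete probability distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def effective_set_size(probs):
    """Effective size 2**H of the anonymity set described by probs."""
    return 2 ** entropy_bits(probs)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # four equally likely senders
    skewed = [0.7, 0.1, 0.1, 0.1]        # one sender is far more likely
    print("uniform:", effective_set_size(uniform))
    print("skewed: ", effective_set_size(skewed))
```

A uniform distribution over four senders gives an effective size of 4, while any skew in the distribution reduces the effective size below the nominal number of senders.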

The case in which the Mix itself injects dummy messages into the stream randomly

is not distinguishable from having an additional clueless sender. However, if the Mix

predicates its injection of dummy messages upon the activity of the senders, then it can

affect the channel matrix greatly, to the point of eliminating the covert channel entirely.


We are also interested in the degree to which the Mix can reduce the covert channel

capacity (increase anonymity) with a limited ability to inject dummy messages.


REFERENCES

[1] Adam Back, Ulf Moller, and Anton Stiglic. Traffic analysis attacks and trade-offs in anonymity providing systems. In Ira S. Moskowitz, editor, Information Hiding, 4th International Workshop (IH 2001), pages 245–257. Springer-Verlag, LNCS 2137, 2001.

[2] P. Boucher, I. Goldberg, and A. Shostack. Freedom system 2.0 architecture. http://www.freedom.net/info/whitepapers/, December 2000. Zero-Knowledge Systems, Inc.

[3] David Chaum. Untraceable electronic mail, return addresses and digital pseudonyms. Communications of the ACM, 24(2):84–88, 1981.

[4] David Chaum. The dining cryptographers problem: Unconditional sender and recipient untraceability. Journal of Cryptology: the Journal of the International Association for Cryptologic Research, 1(1):65–75, 1988.

[5] L. Cottrell. Mixmaster and remailer attacks, August 1994. http://www.obscura.com/~loki/remailer/remailer-essay.html, August 2004.

[6] Claudia Diaz, Stefaan Seys, Joris Claessens, and Bart Preneel. Towards measuring anonymity. In Paul Syverson and Roger Dingledine, editors, Privacy Enhancing Technologies (PET 2002). Springer-Verlag, LNCS 2482, April 2002.

[7] D. Goldschlag, M. Reed, and P. Syverson. Onion routing for anonymous and private internet connections. Communications of the ACM (USA), 42(2):39–41, 1999.

[8] C. Gulcu and G. Tsudik. Mixing Email with Babel. In Internet Society Symposium on Network and Distributed System Security (NDSS’96), pages 2–16, San Diego, CA, Feb 1996.

[9] D. Kesdogan, J. Egner, and R. Buschkes. Stop-and-go-MIXes providing probabilistic anonymity in an open system. In Proceedings of the International Information Hiding Workshop, April 1998.

[10] E.E. Majani and H. Rumsey. Two results on binary input discrete memoryless channels. In IEEE International Symposium on Information Theory, page 104, June 1991.

[11] Ulf Moeller and Lance Cottrell. Mixmaster Protocol Version 3, 2000. http://www.eskimo.com/~rowdenw/crypt/Mix/draft-moeller-v3-01.txt, August 2004.

[12] Ira S. Moskowitz and Myong H. Kang. Covert channels - here to stay? In Proc. COMPASS’94, pages 235–243, Gaithersburg, MD, June 27–July 1 1994. IEEE Press.


[13] Ira S. Moskowitz, Richard E. Newman, Daniel P. Crepeau, and Allen R. Miller. Covert channels and anonymizing networks. In ACM WPES, pages 79–88, Washington, October 2003.

[14] Ira S. Moskowitz, Richard E. Newman, and Paul F. Syverson. Quasi-anonymous channels. In IASTED CNIS, pages 126–131, New York, December 2003.

[15] R. E. Newman-Wolfe and B. R. Venkatraman. High level prevention of traffic analysis. In Proc. IEEE/ACM Seventh Annual Computer Security Applications Conference, pages 102–109, San Antonio, TX, Dec 2-6 1991. IEEE CS Press.

[16] R. E. Newman-Wolfe and B. R. Venkatraman. Performance analysis of a method for high level prevention of traffic analysis. In Proc. IEEE/ACM Eighth Annual Computer Security Applications Conference, pages 123–130, San Antonio, TX, Nov 30-Dec 4 1992. IEEE CS Press.

[17] Onion routing home page. http://www.onion-router.net, August 2004.

[18] J. Raymond. Traffic analysis: Protocols, attacks, design issues, and open problems. In Hannes Federrath, editor, Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Observability, pages 10–29. Springer-Verlag, LNCS 2009, July 2000.

[19] Michael K. Reiter and Aviel D. Rubin. Crowds: anonymity for web transactions. ACM Transactions on Information and System Security, 1(1):66–92, 1998.

[20] Andrei Serjantov and George Danezis. Towards an information theoretic metric for anonymity. In Paul Syverson and Roger Dingledine, editors, Privacy Enhancing Technologies (PET 2002). Springer-Verlag, LNCS 2482, April 2002.

[21] Andrei Serjantov, Roger Dingledine, and Paul Syverson. From a trickle to a flood: Active attacks on several mix types. In IH 2002, pages 36–52, Noordwijkerhout, the Netherlands, October 2002.

[22] Claude E. Shannon. The mathematical theory of communication. Bell Systems Technical Journal, 30:50–64, 1948.

[23] Claude E. Shannon. The zero error capacity of a noisy channel. IRE Trans. on Information Theory, Vol. IT-2:S8–S19, September 1956.

[24] P F Syverson, D M Goldschlag, and M G Reed. Anonymous connections and onion routing. In IEEE Symposium on Security and Privacy, pages 44–54, Oakland, California, 4–7 1997.

[25] Paul F. Syverson, Gene Tsudik, Michael G. Reed, and Carl E. Landwehr. Towards an analysis of onion routing security. In Hannes Federrath, editor, Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Observability, pages 96–114. Springer-Verlag, LNCS 2009, July 2000.

[26] B. R. Venkatraman and R. E. Newman-Wolfe. Transmission schedules to prevent traffic analysis. In Proc. IEEE/ACM Ninth Annual Computer Security Applications Conference, pages 108–115, Orlando, FL, December 6-10 1993. IEEE CS Press.


[27] B. R. Venkatraman and R. E. Newman-Wolfe. Performance analysis of a method for high level prevention of traffic analysis using measurements from a campus network. In Proc. IEEE/ACM Tenth Annual Computer Security Applications Conference, pages 288–297, Orlando, FL, December 5-9 1994. IEEE CS Press.

BIOGRAPHICAL SKETCH

Vipan Reddy Nalla was born on August 1st, 1981, in Nizamabad, Andhra Pradesh,

India. He received his undergraduate degree, Bachelor of Technology, civil engineering,

from Indian Institute of Technology, Chennai (Madras), India, in August 2001.

He joined the University of Florida in Spring 2003 to pursue his master’s degree.

His research interests include Network Security and Cryptography with an emphasis on

anonymity and covert channels.
