
Page 1: Aaron Johnson*, Rob Jansen, Nicholas Hopper, Aaron Segal, and … · 2017. 9. 12. · Aaron Johnson*, Rob Jansen, Nicholas Hopper, Aaron Segal, and Paul Syverson PeerFlow: Secure Load Balancing in Tor

Proceedings on Privacy Enhancing Technologies 2017; 2017 (2):1–21

Aaron Johnson*, Rob Jansen, Nicholas Hopper, Aaron Segal, and Paul Syverson

PeerFlow: Secure Load Balancing in Tor

Abstract: We present PeerFlow, a system to securely load balance client traffic in Tor. Security in Tor requires that no adversary handle too much traffic. However, Tor relays are run by volunteers who cannot be trusted to report the relay bandwidths, which Tor clients use for load balancing. We show that existing methods to determine the bandwidths of Tor relays allow an adversary with little bandwidth to attack large amounts of client traffic. These methods include Tor’s current bandwidth-scanning system, TorFlow, and the peer-measurement system EigenSpeed. We present an improved design called PeerFlow that uses a peer-measurement process both to limit an adversary’s ability to increase his measured bandwidth and to improve accuracy. We show our system to be secure, fast, and efficient. We implement PeerFlow in Tor and demonstrate its speed and accuracy in large-scale network simulations.

Keywords: Tor, distributed systems, security

1 Introduction

Tor [12] is a popular anonymous-communication network. It consists of over 7000 volunteer relays carrying the traffic of over 2 million users daily at over 60 Gib/s on average [4]. To balance this large traffic load over a diverse relay population, Tor estimates relay bandwidth using both self measurements and external measurements and then directs clients to use relays in proportion to the relays’ bandwidth estimates. This load-balancing system is an attractive target for attack because the relays that carry a client’s connection are able to learn sensitive properties about that connection, such as the client’s identity. If an adversary controls enough of those relays, he can deanonymize the connection [24].

*Corresponding Author: Aaron Johnson: U.S. Naval Research Laboratory, E-mail: [email protected]
Rob Jansen: U.S. Naval Research Laboratory, E-mail: [email protected]
Nicholas Hopper: University of Minnesota, E-mail: [email protected]
Aaron Segal: Yale University, E-mail: [email protected]
Paul Syverson: U.S. Naval Research Laboratory, E-mail: [email protected]

Tor designed its current relay-measurement system, TorFlow [5, 26], to avoid relying entirely on self-reported measurements [7] and thereby improve performance and security. TorFlow implements “bandwidth scanning”, in which measurement authorities create connections through relays to measure their bandwidth. Researchers have observed [8, 31] that in this system a malicious relay can increase its apparent bandwidth. We confirm this observation by implementing and experimentally testing the attacks. Our results show that an adversary can obtain 177 times more client traffic than he should. An adversary with 1% of the network bandwidth could have 64% of client connections directed to him, giving him the opportunity to attack a large majority of Tor users. Moreover, the adversary can effectively attack this large number of connections even with his small amount of bandwidth, for example attempting to deanonymize connections using a correlation attack that completes before application data is sent [7] and then immediately shutting the TCP window.

The main alternative to TorFlow is the EigenSpeed system of Snader and Borisov [27–29]. In EigenSpeed, each Tor relay measures the speeds of its connections to other Tor relays and reports them to an authority, who applies Principal Component Analysis (PCA) to produce bandwidth estimates. We show that EigenSpeed too is highly vulnerable to attack. We identify basic flaws in its measurement method and initialization process, and we describe and experimentally demonstrate fundamental flaws in using PCA to aggregate measurements. These flaws allow an adversary to either get most honest relays kicked out of the network or receive up to 420 times more client traffic than he should.

We present PeerFlow, a load-balancing scheme for Tor that prevents an adversary from being directed a share of client traffic that is much larger than his share of the network capacity. PeerFlow uses a bandwidth-weighted voting process that resists manipulation by low-bandwidth adversaries, and it can use measurements from trusted relays for improved security. PeerFlow solves many of the additional challenges to creating a complete replacement for TorFlow, including a secure process to bootstrap new relays and techniques to ensure user privacy in reported traffic statistics. We also prototype PeerFlow in the actual Tor software and use large-scale network simulations to show that its load balancing maintains Tor’s current performance.


2 Background and Related Work

Tor: Tor [12] anonymizes a client TCP connection by randomly choosing three relays from its network, creating a multiply-encrypted circuit over those relays, and creating a stream through that circuit, which causes the endpoint to create an associated TCP connection to the destination. Streams can be multiplexed over a circuit.

Clients have a small, longstanding set of relays (guards) from which they select the first hop on their circuits. Tor clients use one guard rotated every 2–3 months. To become guards, relays must be old enough, provide at least 250 KB/s of bandwidth, and have adequate uptime. Relays meeting these criteria receive the Guard flag, which makes them eligible to be selected as guards. Circuits also have a second (middle) hop and a third (exit) hop. Relays must allow connection to the client’s desired destination port and IP to be selected as the exit. Most exits receive the Exit flag, given when a relay allows exit to the most useful ports. Currently relays must also receive the Fast flag (which requires a bandwidth of 100 KB/s) to be selected at all.

A system of Directory Authorities maintains flag and other relay information and publishes an hourly consensus, which clients use to select relays for circuits. Clients choose relays for a circuit position randomly with probability roughly proportional to the product of each relay’s consensus weight, which is proportional to the relay’s bandwidth and is for load balancing in a given position, and a position weight, which is based on the relay’s flags and is used to improve load balancing across the different relay positions (namely, guard, middle, and exit). Relays also provide the Directory Authorities with a self-determined advertised bandwidth to aid in setting their consensus weights.

Tor security requires that most of the network by consensus weight is not malicious. An adversary that controls much of the network will be selected often by clients. When selected as a guard, he can apply website fingerprinting [32, 33] to identify the client’s destination. When selected as guard and exit, he can deanonymize the connection using a first-last correlation attack [24]. And, of course, when selected for all three relays on a circuit, the connection can be trivially deanonymized.

Bandwidth Measurement: TorFlow [5, 26] is a bandwidth-scanning system currently used by Tor in which measuring authorities measure the bandwidths of Tor relays for the purpose of determining consensus weights and flags. Past work has shown that TorFlow is vulnerable to multiple attacks [7, 31], which we will explore in more detail in §3. In contrast, PeerFlow uses relays’ observations of each other in order to determine relay bandwidths. EigenSpeed [27–29] is a proposed scheme for secure bandwidth estimation that — like PeerFlow — uses peer measurements. Presented as a scheme for use in the Tor network, it is also proposed more generally for use in peer-to-peer distribution networks. EigenSpeed is the state of the art for this problem as far as we are aware, and it has been considered for adoption into Tor [footnote 1]. We show in §4 that EigenSpeed can be manipulated through several attacks and does not achieve its security goals.

Other: Karame et al. [25] describe attacks on link capacity estimation techniques and suggest using trusted network hardware to secure these measurements. Suselbeck et al. [30] propose a system for estimating peer bandwidth in a P2P system. It includes both passive measurements and active traffic injection, as PeerFlow does, but it assumes all peers are trustworthy, which PeerFlow does not. Haeberlen et al. propose the PeerReview system [17], which uses cryptographic logs of node activity and witness audits of a node’s actions to detect misbehavior in a distributed system. While these methods might further enhance the security of PeerFlow, they are unable to solve major challenges, including (i) exposing falsely-claimed transfers between malicious relays and (ii) identifying trustworthy witnesses in a system where Sybil attacks are possible. Jansen et al. [22, 23] describe how secure bandwidth measurements in Tor could be used to build a system to incentivize Tor relay operators by rewarding them for transferring traffic.

3 Attacks on TorFlow

TorFlow: TorFlow [5, 26] is a tool used by bandwidth-measuring Directory Authorities to directly measure the bandwidths of Tor relays. TorFlow works by constructing a series of measurement circuits, using them to perform test downloads, and then computing a weight for each relay based on the speeds of the test downloads.

TorFlow selects which relays to measure by dividing the list of all relays into slices of 50 relays of similar bandwidth (according to the most recent consensus). It measures each slice by constructing two-hop measurement circuits using only relays from that slice. When each circuit is built, TorFlow uses it to download a file from torproject.org. TorFlow continues building new circuits, choosing unmeasured relays from the current slice at random, to measure two relays at a time until every relay in the slice under examination has been measured several times. Then, for each relay, TorFlow takes the average bandwidth measured on circuits involving that relay, and stores this measurement to disk.

1 https://trac.torproject.org/projects/tor/ticket/5464

Every hour, TorFlow aggregates these measurements and produces a weight for each relay. A relay’s weight is calculated by multiplying the relay’s self-advertised bandwidth by the ratio between its measured bandwidth and the averaged measured bandwidth over the entire network. The Directory Authorities use these weights to produce the consensus weights.

Attacks: TorFlow allows a number of attacks that make it easy to manipulate a relay’s apparent bandwidth. First, malicious relays can perform a liar attack [7], wherein they dishonestly report a higher bandwidth than they have available to increase their chances of being chosen during path selection while expending very few resources. TorFlow attempts to address this problem by adjusting self-reported bandwidths by a multiplier representing relative performance, but by continuing to use the reported bandwidth as a baseline, it remains vulnerable to the same attack. For example, a relay providing only 100 KB/s of bandwidth (the minimum required to obtain the Fast flag) could advertise a bandwidth so high that even after being adjusted down by TorFlow (as explained above), it will only be capped by 10 MB/s—the upper bound the Directory Authorities will assign to any relay. This attack is very effective, but gross exaggeration could be detected.

A more subtle attack takes advantage of TorFlow’s two-hop measurement circuits, which are built with one of a small number of authorities on one end and a fixed URL on the other. Because the IP addresses of these measurement nodes are known, relays can easily recognize measurement circuits and treat them differently from ordinary ones, a technique demonstrated by Thill [31]. In a selective denial-of-service (DoS) attack, a relay can provide service only to measurement circuits while dropping all others, thereby giving the authorities the impression of excess capacity at very low cost. An adversarial exit can further reduce resource consumption by spoofing short responses from the destination instead of downloading and serving real ones, since TorFlow does not verify certificates or check the correctness or length of downloads.

Results: To demonstrate the efficacy of both the liar and selective DoS attacks, we developed a new TorFlow plug-in for the Shadow [2, 21] discrete-event network simulator. The plug-in mimics the functionality of the python scripts [5] that are used to run TorFlow in the public Tor network. Using Shadow and our new plug-in, we constructed a private Tor network with 498 relays, 4 directory authorities, 7500 clients, and 1 bandwidth authority that runs TorFlow. More details about our Tor model and simulations can be found in §7.

            Goodput (MiB/s)         Consensus Weight (%)
            Median    Std. Dev.     Median    Std. Dev.
Baseline    22.5      5.9           7         1
Attack      0.2       0.1           11        5

Table 1. A relay can inflate its consensus weight at little cost by lying about its capacity and denying service to all but measurement circuits. Our experiments led to a bandwidth inflation factor of 177.

We implemented both attacks outlined above in Tor, and compiled the Shadow Tor plug-in with our modified Tor code. We arbitrarily chose one exit relay that was generated by Shadow’s network generator to act as our test relay; this relay had an access link of 300 Mbits/s. We ran three baseline experiments in which our test relay acted completely honestly, and we ran six attack experiments in which our test relay ran both the liar and selective DoS attacks by: (i) only forwarding traffic for the TorFlow measurement circuits; and (ii) falsely reporting a bandwidth of 125000 KB to the directory authorities. We monitored the test relay’s bandwidth usage and the fraction of the total consensus weight that our test relay achieved over time.

The results shown in Table 1 indicate that, using the attacks, the exit node was successfully able to inflate its consensus weight relative to other relays while at the same time consuming significantly less bandwidth (only what was required for the measurement circuits). Our attacks reduced the median bandwidth consumed by our test relay from 22.5 MiB/s to 0.2 MiB/s, while the median consensus weight obtained increased from 7% to 11%; our attack enabled the test relay to obtain more units of consensus weight fraction per bandwidth unit cost with a bandwidth inflation factor of 177. While we used only one relay for demonstration purposes, an adversary could use the same techniques with several relays to gain an even larger total fraction of the consensus weight.
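The following added example (not code from the paper or from TorFlow) applies the weight rule of §3 to a constructed ten-relay network and recomputes the inflation factor from the Table 1 medians. The relay names, the measured speeds, the 1 MB = 1000 KB conversion and the reading of "inflation factor" as consensus weight per unit of goodput, attack over baseline, are assumptions.

```python
# Added illustration; all relay data below is constructed.

CAP_KBPS = 10_000  # 10 MB/s cap named in the text (assumes 1 MB = 1000 KB)


def torflow_weights(relays, cap_kbps=CAP_KBPS):
    """relays maps name -> (advertised KB/s, measured KB/s).

    Weight = advertised * (measured / network-average measured),
    capped at the maximum the Directory Authorities assign.
    """
    avg_measured = sum(m for _, m in relays.values()) / len(relays)
    return {
        name: min(advertised * (measured / avg_measured), cap_kbps)
        for name, (advertised, measured) in relays.items()
    }


def share(weights, name):
    """Fraction of the total weight held by one relay."""
    return weights[name] / sum(weights.values())


def network(r10_advertised_kbps):
    """Nine honest relays plus r10, whose real capacity is 100 KB/s.

    Only r10's self-advertised figure changes between scenarios; its
    measured speed is the same in both.
    """
    relays = {f"r{i}": (2000, 500) for i in range(1, 10)}
    relays["r10"] = (r10_advertised_kbps, 400)
    return relays


def inflation_factor(base_goodput, base_weight_pct, atk_goodput, atk_weight_pct):
    """Consensus weight obtained per unit of goodput, attack vs. baseline."""
    return (atk_weight_pct / atk_goodput) / (base_weight_pct / base_goodput)


if __name__ == "__main__":
    truthful = torflow_weights(network(100))
    inflated = torflow_weights(network(125_000))  # figure reported in the experiments
    print(f"r10 share, truthful advertisement: {share(truthful, 'r10'):.2%}")
    print(f"r10 share, inflated advertisement: {share(inflated, 'r10'):.2%}")
    # Table 1 medians: baseline 22.5 MiB/s at 7%, attack 0.2 MiB/s at 11%.
    print(f"Table 1 inflation factor: {inflation_factor(22.5, 7, 0.2, 11):.1f}")
```

Because the measured ratio only rescales the advertised figure, the cap is the only bound on the resulting weight, which is the weakness the text describes.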

We note that while TorFlow’s current design makes these attacks easy to carry out, any bandwidth-scanning approach will need to solve the problem of relays detecting and favoring measurement probes, as observed by Biryukov et al. [8]. A major obstacle is the fact that some relays are unlikely to be chosen by normal clients in the middle circuit position (e.g. guards and exits), and measurement traffic must behave the same, exposing measurement sources or destinations to malicious relays and helping them identify the measurement probes.

4 Attacks on EigenSpeed

EigenSpeed: EigenSpeed uses as consensus weights the eigenvector of a matrix derived from a bandwidth matrix $T$, where $T_{ij}$ is the bandwidth estimate of relay $j$ by relay $i$. This process can be viewed as finding weights that are consistent with using themselves in a weighted average of the bandwidth estimates. Tor’s Directory Authorities are the recipients of each node’s estimates of the others and are responsible for determining the weights. Snader’s thesis [27] is the final and most complete description of EigenSpeed, and so we use it as the authoritative version.

The bandwidth estimates are obtained from periodic measurements of the current bandwidth of an individual stream with a relay, rather than that relay’s entire bandwidth (see [27] §3.3.1). Each new measurement is combined with a running bandwidth estimate using an exponentially-weighted moving average (EWMA). At the end of the measurement period, the estimates are put into matrix $T$, which the Directory Authorities use to produce $\overline{T}$ by first setting $\overline{T}_{ii} = 0$ to ignore self-measurements, next setting $\overline{T}_{ij} = \min(T_{ij}, T_{ji})$ to enforce symmetric measurements, and finally normalizing rows of $\overline{T}$ to sum to one. The EigenSpeed consensus weights are the left principal eigenvector $v^*$ of $\overline{T}$, which is produced by iteratively computing $v^i = v^{i-1}\overline{T}$, where $v^0$ has a $1/t$ entry in the positions of a set of $t$ trusted relays and a 0 entry elsewhere. Any relay $j$ with measurements sufficiently different from the eigenvector, that is, with $\|v^* - \overline{T}_j\|_4 > L$ for $L = 10^{-5}$, is considered a liar and is removed and added to a set of unevaluated relays. Let $\|v^* - \overline{T}_j\|_4$ be the liar metric for $j$. Similarly, any relay $j$ whose weight increased too fast during the first two iterations, that is, with $(v^2_j - v^1_j)/v^1_j > \Delta$ for $\Delta = 0.1$, is judged to be malicious and is removed and considered unevaluated. Let $(v^2_j - v^1_j)/v^1_j$ be the increase metric for $j$. The unevaluated set also includes relays that are not in the largest component of the measurement graph (e.g. new relays), where an edge $(i, j)$ exists in the graph if $\overline{T}_{ij} > 0$. In EigenSpeed, unevaluated relays will each get $1/n$ of the total consensus weight, where $n$ is the total number of relays.

Attacks: An obvious vulnerability of EigenSpeed is that it selects each unevaluated relay with probability $1/n$. An adversary can flood the network with a large number of new relays contributing little or no bandwidth and thereby obtain a large total selection probability. Each new relay need only have a unique IP address. This could, for example, allow a botnet of just 20,000 computers to have a collective consensus weight of over 74% in a Tor network of its current size of about 7,000 honest relays. We therefore assume that unevaluated relays are selected with very low probability (e.g. if there are $u$ unevaluated relays, then the probability of selecting a given one is $0.01/u$). This limits the effect of a Sybil attack to just taking over the unevaluated set, but it also means that a relay that is unevaluated can be essentially shut out of the Tor network.
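The EigenSpeed aggregation described above (zeroed diagonal, pairwise minimum, row normalization, iteration from the trusted relays, increase metric), as an added example that is not code from the paper or from EigenSpeed's authors. The six relay capacities are constructed, the stopping rule is read as the Euclidean norm of the change between iterates, and each relay is assumed to report the other's total bandwidth.

```python
# Added illustration; the relay capacities below are constructed sample data.

CAPS = [8, 6, 4, 2, 2, 1]  # constructed per-relay bandwidths


def honest_reports(caps):
    """T[i][j] = relay i's estimate of relay j, here simply j's bandwidth."""
    n = len(caps)
    return [[0 if i == j else caps[j] for j in range(n)] for i in range(n)]


def normalize(T):
    """Zero the diagonal, keep min(T_ij, T_ji), scale each row to sum to one."""
    n = len(T)
    sym = [[0.0 if i == j else float(min(T[i][j], T[j][i])) for j in range(n)]
           for i in range(n)]
    out = []
    for row in sym:
        total = sum(row)
        # A relay with no usable measurements keeps an all-zero row.
        out.append([x / total for x in row] if total > 0 else row)
    return out


def left_multiply(v, M):
    """Return the row vector v * M."""
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def eigenspeed_weights(T, trusted, tol=1e-10, max_iter=10_000):
    """Iterate v^i = v^(i-1) * Tbar from a start vector that puts 1/t on
    each trusted relay. Returns (final weights, list of all iterates)."""
    Tbar = normalize(T)
    v = [1.0 / len(trusted) if i in trusted else 0.0 for i in range(len(T))]
    iterates = [v]
    for _ in range(max_iter):
        nxt = left_multiply(v, Tbar)
        iterates.append(nxt)
        # Assumption: "change in vector norm" = Euclidean norm of the change.
        change = sum((a - b) ** 2 for a, b in zip(nxt, v)) ** 0.5
        v = nxt
        if change < tol and len(iterates) > 2:
            break
    return v, iterates


def increase_metrics(iterates):
    """(v2_j - v1_j) / v1_j for every relay j; infinite if v1_j is zero."""
    v1, v2 = iterates[1], iterates[2]
    metrics = []
    for a, b in zip(v1, v2):
        if a > 0:
            metrics.append((b - a) / a)
        else:
            metrics.append(float("inf") if b > 0 else 0.0)
    return metrics


if __name__ == "__main__":
    T = honest_reports(CAPS)
    weights, iterates = eigenspeed_weights(T, trusted={0, 1})
    print("weights:  ", [round(w, 4) for w in weights])
    print("increase: ", [round(m, 4) for m in increase_metrics(iterates)])
    # One relay over-reporting a peer changes nothing: the pairwise minimum
    # keeps the lower of the two reports.
    T_skewed = [row[:] for row in T]
    T_skewed[5][4] = 1000
    print("one-sided report clipped:", normalize(T_skewed) == normalize(T))
```

On a connected measurement graph the iteration converges to the row sums of the symmetrized matrix whatever the trusted set is, so the trusted relays shape only the early iterates and therefore the increase metric. The pairwise minimum clips a report only when one of the two endpoints is honest.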

We show how the two mechanisms that EigenSpeed uses to detect malicious nodes can be used to frame honest relays and have them put into the unevaluated set. The liar threshold $L = 10^{-5}$ and increase threshold $\Delta = 0.1$ are designed for an attack in which a clique of malicious relays report high bandwidth with each other and low bandwidth with others. However, modifications to this attack can confuse the trusted relays (which otherwise help distinguish between the honest and dishonest relays) by making the liar or increase metrics for honest framed relays appear large. This will imply that either (i) the adversary can get the honest non-trusted relays effectively kicked out of the network by moving them to the unevaluated set or (ii) $L$ and $\Delta$ are large enough that the adversary can greatly inflate the inferred bandwidth of his relays.

Analysis: We demonstrate framing attacks using the Tor 2015-04-31 23:00 network consensus [1], which contains 5589 relays with positive selection probability (over 1000 relays have zero selection probability). EigenSpeed takes measurements of the per-stream bandwidth at a given relay rather than the total bandwidth at that relay. Thus we consider the ideal load-balanced case, in which all streams have the same bandwidth. We note that our attacks are even more effective when the network is not load balanced, in which case there is disagreement among the relays’ observations, forcing the thresholds $L$ and $\Delta$ to be large. For example, suppose that the relay observations are the total relay bandwidths, which we take to be the “observed bandwidths” in the relays’ descriptors [3]. Then, with 10% of relays trusted (starting with the largest by observed bandwidth) and no malicious relays, honest relays have a liar metric as large as 0.00165, much larger than the suggested threshold of $L = 10^{-5}$, and honest relays have an increase metric as large as 0.86, much larger than the suggested $\Delta = 0.1$. We thus consider all our attacks in the load-balanced setting, in which case these metrics are much smaller and well below the suggested thresholds. In all of our experiments, we follow Snader [27] and stop the iterative eigenvector calculation when the change in vector norm is less than $10^{-10}$.

# trusted relays (% of honest)    # adv relays    # framed relays    Adv bw %
280 (5%)                          447             1118               1.92
559 (10%)                         558             1118               2.83
1118 (20%)                        558             559                2.83
1677 (30%)                        558             112                3.00

Table 2. Cases with minimum bandwidth in which all framed relays had increase metrics above 0.2.

The increase framing attack causes the increase metric of honest relays in a targeted set to be large. In this attack, a set of malicious relays uses sufficient bandwidth to obtain the average true bandwidth of honest relays with all trusted relays and with a subset of “framed” honest non-trusted relays. The malicious relays also falsely claim that bandwidth measurement among themselves (i.e. no data is actually sent among them). All other measurements with malicious relays are zero. All measurements among the honest relays are the same load-balanced flow rate (say, 1). Note that each malicious relay can easily obtain any bandwidth measurement with any honest relay, as long as the measurement does not exceed the relay’s true bandwidth, by creating spurious connections to the honest relay with the necessary amounts of traffic. The malicious relay may furthermore drop all connections from honest clients for measurements of zero.

We consider adding relays to the Tor network in order to frame a subset of honest relays by causing their increase metric to reach above 0.2. Not only is this amount greater than the $\Delta = 0.1$ recommended, it will be enough to allow significant bandwidth inflation and thereby demonstrate that no setting for $\Delta$ can provide good protection. For various numbers of trusted relays, we searched for a minimum adversarial BW needed to frame at least 2% of the relays. We count the bandwidth of a relay as the sum of the estimates obtained with each other relay, ignoring the false estimates among malicious relays. Table 2 presents our results. It shows that with at most 3% of the total bandwidth, the adversary can frame between 112 (2%) and 1118 (20%) of the honest relays, with the number decreasing as the number of honest relays that are trusted increases from 280 (5%) to 1677 (30%). The number of relays that the adversary must add is not large, from 447 to 558. Moreover, the attack can easily be repeated (with a different set of malicious relays) in order to move even more honest relays into the unevaluated set. As we noted, the selection probability for relays in the unevaluated set must be quite low. Therefore, they will be rarely observed by other relays, and so they must either wait many measurement periods to be evaluated or will have very low inferred bandwidths. In addition, if the adversary performs another Sybil attack on the unevaluated pool, which requires IP addresses but no bandwidth, then relays in the unevaluated pool are effectively removed entirely. Even worse, the relay bandwidths are highly skewed, and so even for the smallest number of framed relays in Table 2 (112), the adversary can quickly cause most of the network capacity to be unused. For example, in the consensus used in our experiments, 50% of the total observed bandwidth is provided by the largest 464 relays and 75% by the largest 1172.

# trusted relays (% of honest)    # adv relays    False bw factor    Adv bw %    Adv weight %
280 (5%)                          4191            100                3.49        98.2
559 (10%)                         2235            100                3.70        93.9
1118 (20%)                        1117            100                3.70        79.5
1677 (30%)                        1397            15                 6.52        48.5

Table 3. Cases with maximum weight in which all malicious relays had increase metrics below 0.2 and liar metrics below the honest non-trusted relays.

We next show the targeted liar attack, in which just the trusted relays are provided a high-bandwidth flow. This targeting of trusted relays keeps the liar metrics of the malicious relays low while inflating the malicious relays’ weight. In these experiments, the malicious relays obtain bandwidth estimates with the trusted relays that are the average bandwidth of the honest non-trusted relays, the malicious relays create measurements of zero with the honest non-trusted relays, and the malicious relays report a large false bandwidth among themselves. All honest relays again have the same bandwidth measurement among themselves.

Thus the trusted relays make the same measurements with the other honest relays as with the malicious relays, but the measurements of the honest non-trusted relays and those of the malicious relays disagree. This makes the attack use less bandwidth while keeping the liar metrics of malicious relays close to those of honest non-trusted relays. Table 3 lists the scenarios in which the final adversary weight was maximized among those we tested subject to (i) preferring adversary bandwidth of less than 4% when possible, (ii) producing increase metrics for malicious relays of no larger than 0.2, and (iii) producing liar metrics of malicious relays less than those of honest non-trusted relays. We can assume that $\Delta > 0.2$, or the previously-described increase framing attack would be possible. Therefore, no malicious relays are unevaluated due to an increase metric above $\Delta$. Furthermore, no setting of the liar threshold $L$ is able to discriminate between malicious and honest relays. Either it would allow the large adversary weight inflations shown in Table 3 or it would put all honest non-trusted relays in the unevaluated set. We can see that by reporting a false bandwidth of 15-100 times the load-balanced rate reported by all honest relays, malicious relays can obtain weight inflated 7.4-28.1 times.

5 PeerFlow

PeerFlow uses two relay-measurement techniques: (i) each relay reports on the number of bytes it sent to or received from other relays, and (ii) each relay reports its available but unused bandwidth. Measurements from the first technique are combined to estimate the total bytes transferred after trimming a weight fraction λ of the largest and smallest values so that an adversary without λ of the network capacity cannot manipulate the outcome. Measurements from trusted relays (if available) are used to ensure that the estimates of bytes transferred are not unreasonably high or low. The second measurement technique allows the network to discover any unused bandwidth. It is vulnerable to lying by an adversary, though, and therefore PeerFlow will only consider increasing the consensus weight for a relay after consulting the secure outputs of the first measurement technique and verifying that the relay carried the expected amount of traffic. Further methods are used to increase the privacy, accuracy, and speed of these measurements and to securely bootstrap new relays into the system. We now explain each of the PeerFlow components in detail. For convenience, the key variables and parameters are listed in Table 4.

5.1 Measuring total traffic of a relay

For each circuit position, a subset of relays keeps track of the amount of traffic it sends to and receives from all of the other public relays while in that position. Let $M_g$, $M_m$, and $M_e$ be the set of measuring guards, measuring middles, and measuring exits, respectively.

Fig. 1. Measuring positions in a circuit: relay G in position $g$ measures relay M, relay M in position $m_c$ measures relay E, relay M in position $m_d$ measures relay G, and relay E in position $e$ measures relay M. A dashed arrow indicates hosts between which traffic is not measured.

| Name | Description |
|---|---|
| $\beta^{SR}_p$ | bytes measured by S to and from R in position p |
| $\rho^{(S)R}_{(p)}$ | total traffic relayed by R (inferred by S, in position p) |
| $\eta^{(S)R}$ | estimated traffic relayed by R (with S) |
| $\kappa^R$ | capacity of relay R computed by Directory Authorities |
| $\sigma^R$ | self-estimated capacity of relay R |
| $\omega^R$ | consensus weight of router R |
| $v^R_p$ | voting weight of relay R in position p |
| $t^R$ | measurement time of relay R |
| $\lambda$ | fraction of measuring relay inferences to trim |
| $\tau$ | trusted relay weight fraction in each position |
| $\epsilon_{\mathrm{dec}}$ | max fraction a relay’s weight can decrease |
| $\mu$ | weight of measuring relays: 0.75 |
| $\epsilon_{\mathrm{inc}}$ | max fraction a relay’s weight can increase: 0.25 |
| $\delta_{\mathrm{noise}}$ | traffic amount covered by differential privacy: 1 MiB |
| $\epsilon_{\mathrm{noise}}$ | differential privacy guarantee: 0.1 |
| $\epsilon_{\mathrm{err}}$ | targeted fraction of traffic added as noise: 0.1 |
| $q_{\mathrm{err}}$ | probability noise greater than targeted fraction: 0.1 |
| $\epsilon_{\mathrm{loss}}$ | accuracy loss from maximum of noisy values: 0.01 |

Table 4. Key variables (top), parameters (bottom) in PeerFlow

$M_p$ is defined to be the set containing each relay R with positive voting weight $v^R_p$ in position p, that is, $M_p = \{R \mid v^R_p > 0\}$. Voting weights approximate the relays’ relative capacity in each position, with only the largest µ fraction of relays by capacity assigned non-zero voting weights to speed up measurement (see §5.6).

Each measuring relay $r \in M_p$ counts the number of bytes exchanged with each other relay while r is in position p. Bytes are counted at the application layer, and both sent and received bytes are included in the count. A measuring relay detects itself in the guard position if the circuit-creation messages are not sent by a public Tor relay, it detects itself in the exit position if the circuit is not extended past the measuring relay, and otherwise it assumes the middle position. A measuring middle further divides its observations into client-side and destination-side measurements according to whether the measuring relay extended the circuit to the measured relay during circuit creation or vice versa, respectively. A measuring relay $R_0$ counts traffic sent to and received from each relay $R_1$ for a measurement period of length $t^{R_1}$ that depends on the bandwidth of $R_1$ (see §5.4). As in TorFlow, the measurement periods generally span multiple consensuses and need not be aligned with them. Let $\beta^{R_0R_1}_p$ be the number of bytes that measuring relay $R_0$ sent to or received from $R_1$ during the measurement period while $R_0$ was in directional position p. The possible directional positions are guard, client-side middle, destination-side middle, and exit, denoted $g$, $m_c$, $m_d$, and $e$, respectively (see Figure 1).


At the end of the measurement period for relay $R_1$, measuring relay $R_0$ will process the traffic count for each position that it measures for and send it to the Directory Authorities. The first step in processing count $\beta^{R_0R_1}_p$ protects the privacy of individual traffic flows by adding noise. The noise is a random value selected from the Laplace distribution $\mathrm{Lap}(b)$, which has mean 0 and variance $2b^2$, where b is set to provide differential privacy for a certain amount of traffic on the link (see §5.3). Let $N^{R_0R_1}_p \sim \mathrm{Lap}(b)$ be the noise value, and let $\tilde{\beta}^{R_0R_1}_p = \beta^{R_0R_1}_p + N^{R_0R_1}_p$.

The second processing step is to infer the total amount of traffic to or from $R_1$ seen in directional position p. This is accomplished by adjusting the amount $R_0$ sees by the probability of making that observation. Let $q^{R_0}_g$ be the probability of selecting $R_0$ as a guard, as determined from the consensuses of the measurement period. Let $q^{R_0}_m$, $q^{R_0}_{m_c}$, and $q^{R_0}_{m_d}$ all be set to the probability of selecting $R_0$ as a middle. Let $q^{R_0}_e$ be the probability that $R_0$ is chosen as an exit (using the presence of the Exit flag as an approximate way to identify possible exits). Then let the estimate for the total traffic relayed by $R_1$ and seen in directional position $p \in \{g, m_c, m_d, e\}$ be $\rho^{R_0R_1}_p = \tilde{\beta}^{R_0R_1}_p / q^{R_0}_p$.

At the end of the measurement period for $R_1$, the Directory Authorities will receive the ρ statistics about $R_1$ from the measuring relays. The Directory Authorities remove the largest and smallest ρ values for each directional position and aggregate the remaining values to obtain an estimate $\bar{\rho}^{R_1}_p$. When removing the extreme values, a voting weight $v^R_p$ that is proportional to R’s capacity is attached to each $\rho^{RR_1}_p$, and then a fraction λ of the voting weight is trimmed from both the top and bottom of the sorted ρ statistics. These voting weights help ensure that a low-capacity adversary has little effect on the measurement outcomes. To be more precise, let $i_j$ be the index of the relay $R_{i_j}$ with the jth largest value $\rho^{R_{i_j}R_1}_p$, let $j_1$ be the largest value such that $\sum_{j<j_1} v^{R_{i_j}}_p < \lambda$, and let $j_2$ be the smallest value such that $\sum_{j>j_2} v^{R_{i_j}}_p < \lambda$. The trimmed ρ statistics are aggregated by adding their noisy byte values and dividing by the total selection probability of the untrimmed relays:

$$\bar{\rho}^{R_1}_p = \sum_{j_1 \le j \le j_2} \rho^{R_{i_j}R_1}_p q^{R_{i_j}}_p \Big/ \sum_{j_1 \le j \le j_2} q^{R_{i_j}}_p.$$

Observe that $j_1$ and $j_2$ are defined such that, for all λ ≤ 0.5, some ρ value is untrimmed (i.e. $j_1 \le j_2$).

When there are no trusted relays, we set λ = 0.256 to maximize the size of the adversary that is prevented from arbitrarily increasing his weight (see §6). Note that using the median (i.e. λ = 0.5) does not provide optimal security in this case. When some positive fraction of the network τ is trusted in each position, we set λ to maximize the size of the adversary such that using µ of the network for measurement results in a smaller bound on an adversary’s capacity increase compared to simply using measurements from the smaller trusted fraction τ. For τ = 0.05, 0.1, 0.2, and 0.3, the respective values of λ are 0.34, 0.348, 0.497, and 0.498.

Given the aggregate values $\bar{\rho}^{R_1}_p$, the Directory Authorities calculate two estimates for the total number of bytes relayed by $R_1$. The first is the sum of the client-side estimates, $\rho^{R_1}_c = \bar{\rho}^{R_1}_g + \bar{\rho}^{R_1}_{m_c}$, and the second is the sum of the destination-side estimates, $\rho^{R_1}_d = \bar{\rho}^{R_1}_{m_d} + \bar{\rho}^{R_1}_e$. If $R_1$ can act as a guard, then client-side observations for it will be missing for any circuits on which it is a guard, and similarly for destination-side observations if $R_1$ can act as an exit. On the other hand, when a relay acts as a middle it is observed both on the client and destination side but should get credit for that traffic only once. Therefore, the Directory Authorities use $\rho^{R_1}_{\max} = \max(\rho^{R_1}_c, \rho^{R_1}_d)$ as an estimate for the total amount of traffic relayed by $R_1$. To avoid excluding client-side or destination-side observations made on separate circuits, PeerFlow requires that during a measurement period a Tor relay will not operate both as a guard and as an exit (Tor effectively already enforces this currently via its bandwidth weights, as exit bandwidth is relatively scarce and is thus reserved for exiting).

PeerFlow takes advantage of measurements from any trusted relays by using them to limit the range in which a relay’s traffic will be inferred. PeerFlow uses trusted relays in this way instead of simply using only the trusted measurements because doing so provides better security and accuracy when relatively little of the network is trusted. Because many of the highest-bandwidth Tor relays are managed by organizations closely aligned with the Tor Project, and (as of 15 May 2015) the top 60 relays constitute over 20% of the total weight, it seems reasonable to imagine a set of trusted relays that carry 15-25% of Tor traffic.

Let $T_p$ be the set of trusted relays in position p, and let τ be the minimum fraction of relay capacity that they are assumed to provide in each of the guard, middle, and exit positions. At the end of a measurement period for $R_1$, the Directory Authorities simply combine the measurements from the trusted relays of bytes exchanged with $R_1$ to determine the following trusted estimate for the total bytes relayed by $R_1$ and observed in position p:

$$\hat{\rho}^{R_1}_p = \sum_{R \in T_p} \rho^{RR_1}_p q^R_p \Big/ \sum_{R \in T_p} q^R_p.$$

PeerFlow then combines these positional trusted estimates as it does with the analogous estimates from all relay measurements to produce $\hat{\rho}^{R_1}_c = \hat{\rho}^{R_1}_g + \hat{\rho}^{R_1}_{m_c}$, $\hat{\rho}^{R_1}_d = \hat{\rho}^{R_1}_{m_d} + \hat{\rho}^{R_1}_e$, and $\hat{\rho}^{R_1}_{\max} = \max(\hat{\rho}^{R_1}_c, \hat{\rho}^{R_1}_d)$.

The trusted estimate $\hat{\rho}^{R_1}_{\max}$ is used to adjust the all-relay estimate $\rho^{R_1}_{\max}$ by enforcing a ceiling and floor on its value. The ceiling is simply $\hat{\rho}^{R_1}_{\mathrm{ceil}} = \hat{\rho}^{R_1}_{\max}$, which means that the inferred traffic relayed by $R_1$ will be limited by the number of bytes it can exchange with trusted relays. The floor is $\hat{\rho}^{R_1}_{\mathrm{floor}} = \hat{\rho}^{R_1}_{\max}\,\tau/(\mu(1-\lambda))$, which both limits the amount that the adversary can use falsely low measurements from its relays to reduce honest relays’ inferred traffic relayed and ensures that the adversary gains no advantage in targeting its bandwidth on the trusted relays instead of the top 1 − λ fraction of measuring relays in a given position. The final estimate for the traffic relayed by $R_1$ is thus $\rho^{R_1} = \max(\min(\hat{\rho}^{R_1}_{\mathrm{ceil}}, \rho^{R_1}_{\max}), \hat{\rho}^{R_1}_{\mathrm{floor}})$. If there are no trusted relays, then $\rho^{R_1} = \rho^{R_1}_{\max}$.
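The following Python sketch is an added example, not the authors' implementation: it runs the §5.1 aggregation (voting-weight trimming, client-side/destination-side combination, and the trusted ceiling and floor) on constructed reports about one measured middle relay. The relay names, the ρ values, and the assumption that selection probability is µ times voting weight are illustrative choices, not data from the paper.

```python
from dataclasses import dataclass

POSITIONS = ("g", "mc", "md", "e")  # guard, client-side middle, destination-side middle, exit
MU = 0.75                           # capacity fraction holding voting weight (Table 4)


@dataclass
class Report:
    relay: str              # measuring relay R0 (made-up name)
    v: float                # voting weight v_p^{R0}; sums to 1 within a position
    q: float                # selection probability q_p^{R0} (assumed MU * v here)
    rho: float              # rho_p^{R0 R1}: noisy byte count divided by q, in MiB
    trusted: bool = False


def q_weighted_mean(reports):
    """sum(rho * q) / sum(q): the combination used for both rho-bar and rho-hat."""
    total_q = sum(r.q for r in reports)
    return sum(r.rho * r.q for r in reports) / total_q if total_q else 0.0


def trimmed_aggregate(reports, lam):
    """rho-bar_p: drop lam of the voting weight from each end of the sorted rho values."""
    ranked = sorted(reports, key=lambda r: r.rho, reverse=True)  # j = 1 is the largest
    if not ranked:
        return 0.0
    lo, top = 0, 0.0                    # ranked[lo] ends up as relay j1
    while lo < len(ranked) - 1 and top + ranked[lo].v < lam:
        top += ranked[lo].v
        lo += 1
    hi, bottom = len(ranked) - 1, 0.0   # ranked[hi] ends up as relay j2
    while hi > lo and bottom + ranked[hi].v < lam:
        bottom += ranked[hi].v
        hi -= 1
    return q_weighted_mean(ranked[lo:hi + 1])


def infer_traffic(by_pos, lam, tau=0.0, mu=MU):
    """Final rho^{R1}: all-relay estimate, clamped by trusted relays when tau > 0."""
    bar = {p: trimmed_aggregate(by_pos[p], lam) for p in POSITIONS}
    rho_max = max(bar["g"] + bar["mc"], bar["md"] + bar["e"])
    if tau <= 0:
        return rho_max
    hat = {p: q_weighted_mean([r for r in by_pos[p] if r.trusted]) for p in POSITIONS}
    hat_max = max(hat["g"] + hat["mc"], hat["md"] + hat["e"])
    ceil, floor = hat_max, hat_max * tau / (mu * (1 - lam))
    return max(min(ceil, rho_max), floor)


# Constructed sample: honest measuring relays infer about 1000 MiB for R1.
HONEST_RHO = {"T1": 1000.0, "H1": 980.0, "H2": 1020.0, "H3": 1010.0, "H4": 990.0}
SMALL_ADV = {"T1": 0.20, "H1": 0.25, "H2": 0.20, "H3": 0.15, "H4": 0.10, "A1": 0.10}
LARGE_ADV = {"T1": 0.05, "H1": 0.20, "H2": 0.15, "H3": 0.12, "H4": 0.08, "A1": 0.40}


def sample(weights, adv_rho):
    """Reports about a middle relay R1; measuring relay A1 colludes with R1.

    Guards (position g) and exits (position e) both observe a middle relay,
    so the same constructed rows are reused for those two positions.
    """
    rows = [Report(n, weights[n], MU * weights[n], rho, trusted=(n == "T1"))
            for n, rho in HONEST_RHO.items()]
    rows.append(Report("A1", weights["A1"], MU * weights["A1"], adv_rho))
    return {"g": rows, "mc": [], "md": [], "e": rows}


if __name__ == "__main__":
    # A1 holds 0.10 < lambda of the voting weight: its inflated report is trimmed.
    print(infer_traffic(sample(SMALL_ADV, 1e6), lam=0.256))
    # A1 holds 0.40 > lambda and there are no trusted relays: the report survives.
    print(infer_traffic(sample(LARGE_ADV, 1e6), lam=0.256))
    # Same adversary with tau = 0.05 trusted (lambda = 0.34): the ceiling applies.
    print(infer_traffic(sample(LARGE_ADV, 1e6), lam=0.34, tau=0.05))
```

The second case shows why the bound in §6 is stated for an adversary with voting weight below λ, and the third shows the trusted ceiling taking over once that condition fails.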

5.2 Measuring available bandwidth

Each relay R monitors its network activity during a measurement period and estimates its total capacity for relaying traffic. That is, R estimates the maximum rate at which it could have relayed traffic during the measurement period if Tor clients had asked it to. This could be measured in the same way that Tor relays currently determine their self-advertised bandwidths. R sends this self-measured value $\sigma^R$ to the Directory Authorities at the end of the measurement period.

5.3 Preserving link privacy with noise

The traffic statistics that measuring relays report to Directory Authorities reveal how traffic flows through the Tor network. Traffic statistics per relay are already collected and reported by Tor [1], but the PeerFlow statistics describe traffic between each pair of relays. These statistics should not be assumed to be kept secret by the Directory Authorities, because some Directory Authorities may be compromised and because it allows auditing of PeerFlow to identify errors and relay misbehavior.

The risk of releasing the PeerFlow measurements is that they could be used to identify the routes through the Tor network taken by a target set of connections. For example, a malicious destination might target an incoming connection and use its observation of the exit node and traffic volume to identify as the middle node that relay measured by PeerFlow to have sent the same amount of traffic as the target connection. The guard relay could then be identified similarly. Although there exist other ways of identifying the relays used on a connection (e.g. the congestion attack [14], latency attack [18], and predecessor attack [34]), this information should still have some protection.

Therefore, measuring relays in PeerFlow add a random value to the observed byte totals. The goal of this added noise is to limit the certainty with which an adversary can conclude that a given client stream was carried between a given pair of relays. We accomplish this by choosing the noise value randomly from the Laplace distribution with mean 0, which has the probability density function $\mathrm{Lap}(b; x) = e^{-|x|/b}/(2b)$. Adding noise according to the Laplace distribution provides differential privacy [13], where the privacy notion applies to a given amount of traffic. We set the Laplace parameter to $b = \delta_{\mathrm{noise}}/\epsilon_{\mathrm{noise}}$, where $\delta_{\mathrm{noise}}$ is the maximum amount of traffic for which $\epsilon_{\mathrm{noise}}$-differential privacy is provided (we use $\delta_{\mathrm{noise}}$ = 1 MiB and $\epsilon_{\mathrm{noise}}$ = 0.1). Providing differential privacy to Tor traffic statistics follows the suggestion of Goulet et al. [16]. Appendix A describes a cryptographic measurement-aggregation scheme that further limits the amount of traffic data revealed.

5.4 Measurement periods

The length of a measurement period is determined for each relay individually and is updated at the end of each measurement period. We would like this length to be low in order to enable quick response to changes in relay bandwidth and load. The speed of measurement is limited by how quickly a relay can exchange an amount of client traffic with each measuring relay such that the added noise is relatively small.

Let $t^{R_1}$ be the length of the next measurement period for relay $R_1$. The Directory Authorities determine $t^{R_1}$ after inferring $\rho^{R_1}$ by using $\rho^{R_1}$ and the consensus weights to estimate the traffic rates with each measuring relay $R_0$ in each position p. $t^{R_1}$ is set large enough such that with probability $1 - q_{\mathrm{err}}$ for at least 1 − 2λ of the measuring relays by voting weight the amount of noise added is less than a fraction $\epsilon_{\mathrm{err}}$ of the traffic exchanged with $R_1$ during the measurement period (we use $q_{\mathrm{err}}$ = 0.1 and $\epsilon_{\mathrm{err}}$ = 0.1). 1 − 2λ of the voting weight ensures accurate estimates remain after trimming the largest and smallest fraction λ of the ρ statistics. In addition, $t^{R_1}$ must be set large enough to limit to $\epsilon_{\mathrm{loss}}$ the loss in accuracy due to taking the maximum of noisy values $\hat{\rho}^{R_1}_c$ and $\hat{\rho}^{R_1}_d$ (see App. B for details; we use $\epsilon_{\mathrm{loss}}$ = 0.01).

5.5 Load balancing using measurements

The Directory Authorities use the aggregate peer-measurement $\rho^R$ and self-measurement $\sigma^R$ to produce the consensus weight $\omega^R$ for relay R. Tor clients select relay R for a given position in a new circuit with probability proportional to $\omega^R$ (approximately, see §2). Thus, in order to balance the load across the available relays, the Directory Authorities attempt to determine consensus weights that are proportional to the bandwidth of each relay. They must do this even as relay capacities and network traffic change, and they must do it in a way that is secure against manipulation. They will accomplish this by computing and comparing the amounts of traffic a relay was expected to transfer, did transfer, and claims it can transfer.

Let $\eta^R$ be the amount of traffic that R is expected to have relayed in the measurement period that has just completed. To determine $\eta^R$, let $P^R$ be the set of relays (including R) that could be chosen for the same positions as R in their most-recently completed measurement period based on the Guard, Exit, and Fast consensus flags, where for this purpose a relay is considered to possess each of these flags if they exist for the majority of consensuses during the measurement period. Then let $\omega^{R'}_0$ be the weight used by $R' \in P^R$ during its most-recently completed measurement period, let $t^{R'}_0$ be the length of that period, and let $\rho^{R'}_0$ be the inferred number of bytes relayed during that period. Finally, set $\eta^R$ to be the voting-weight median of the set $\{\omega^R_0 t^R_0 \rho^{R'}_0 / (\omega^{R'}_0 t^{R'}_0)\}_{R' \in P^R}$, which is the set of inferred bytes transferred adjusted for differences in consensus weight and measurement-period length.

Let $\kappa^R$ be the current estimate for the capacity of R, that is, the amount of traffic that R is capable of relaying. $\kappa^R$ is set initially during the bootstrapping process. Let $S^R$ be the measurement state of R. After the bootstrapping period has finished, $S^R$ will only take values normal and probation (its use during bootstrapping is described in §5.7).

For $S^R$ = normal, if R is non-trusted, Figure 2 describes how the Directory Authorities update the consensus weight $\omega^R$, the relay state $S^R$, and the capacity estimate $\kappa^R$. Observe that $\kappa^R$ is never set to be larger than the relay’s self-estimate $\sigma^R$ and that $\kappa^R$ is increased to the observed rate $\rho^R/t^R$ if $\kappa^R$ is less than it. $\kappa^R$ will only be decreased if R transfers a certain fraction $\epsilon_{\mathrm{dec}}$ less than both the expected amount $\eta^R$ and R’s estimated limit $\kappa^R t^R$. This fraction is set to at most $\epsilon_{\mathrm{dec}} \le 1 - \tau/(\mu(1-\lambda))$ to protect honest relays from being forced into probation by an adversary. Setting $\epsilon_{\mathrm{dec}}$ this way protects an honest R because R will send the expected amount to the trusted relays, and so $(1-\epsilon_{\mathrm{dec}})\eta^R = \hat{\rho}^R_{\mathrm{floor}} \le \rho^R$. Without trusted relays, $\epsilon_{\mathrm{dec}}$ should be set to a small value to allow some natural variation in traffic amounts without allowing too much underperformance without penalty (e.g. $\epsilon_{\mathrm{dec}}$ = 0.25). If $\rho^R$ falls below the required threshold but R indicates with $\sigma^R$ that it believes it has a higher capacity than the amount measured, then R maintains its capacity but enters the probation state. When the relay stays in the normal state, the final weight produced $\omega^R$ is only increased either to a newly-demonstrated capacity $\kappa^R$ or, if R believes it has additional unused capacity, to a fraction $\epsilon_{\mathrm{inc}}$ over the old capacity (we use $\epsilon_{\mathrm{inc}}$ = 0.25).

    $\kappa'^R = \min(\max(\rho^R/t^R, \kappa^R), \sigma^R)$
    $\omega^R = \min(\max((1+\epsilon_{\mathrm{inc}})\kappa^R, \kappa'^R), \sigma^R)$
    $\kappa^R = \kappa'^R$
    if $\rho^R < (1-\epsilon_{\mathrm{dec}})\min(\kappa^R t^R, \eta^R) \wedge \rho^R/t^R < \sigma^R$ then
        $S^R$ = probation
        $\omega^R = \kappa^R$
    end if

Fig. 2. Non-trusted relay weight algorithm if $S^R$ = normal

    $\kappa^R = \min(\rho^R/t^R, \sigma^R)$
    $S^R$ = normal
    $\omega^R = \min((1+\epsilon_{\mathrm{inc}})\kappa^R, \sigma^R)$

Fig. 3. Non-trusted relay weight algorithm if $S^R$ = probation

Probation allows a relay to avoid a weight decrease due to a random or malicious lack of client traffic. When non-trusted relay R enters the probation state, it attempts to prove over the next measurement period that it is capable of transferring at a rate that is the minimum of its current capacity $\kappa^R$ and its self-assessed capacity $\sigma^R$. To do so, R monitors the number of bytes that it transfers to other relays, and it exchanges (identified) dummy traffic as needed to convince all measuring relays in $M_p$ for some p that it has relayed $\min(\kappa^R, \sigma^R)t^R$ bytes total. When $R' \in M_p$ receives β dummy traffic from R, it echoes it back and adds µβ to $\beta^{R'R}_p$. The µ factor is needed because dummy traffic is only sent to measuring relays, and, because dummy traffic is otherwise treated the same as other traffic, it provides no additional opportunity for the adversary to cheat the system. As shown in Figure 3, at the end of the measurement period the relay leaves probation status and is assigned the capacity that is measured $\rho^R/t^R$, unless that exceeds its updated self-assessment.

Trusted relays also update their weights as shown in Figures 2 and 3, with the additional requirement that trusted relays maintain τ fraction of the weight (i.e. selection probability) in each position. After the update of $\omega^R$ for any relay R, an inflation factor is computed that is multiplied by the trusted-relay weights when computing selection probabilities $q^R_p$ for $R \in T_p$. The inflation factor is taken to be the minimum value greater than one such that $\sum_{R \in T_p} q^R_p \ge \tau$ for $p \in \{g, m, e\}$.
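The update rules of Figures 2 and 3 can be traced as a small state machine. The Python below is an added example, not code from the PeerFlow implementation; the rates, the one-hour period, and the three relay scenarios are constructed, and $\epsilon_{\mathrm{dec}}$ = 0.25 is the example value the text gives for a network without trusted relays.

```python
from dataclasses import dataclass

EPS_INC = 0.25  # max fractional weight increase (Table 4)
EPS_DEC = 0.25  # example value from the text for the no-trusted-relay case


@dataclass
class Relay:
    kappa: float            # capacity estimate kappa^R, bytes/s
    omega: float            # consensus weight omega^R
    state: str = "normal"   # measurement state S^R


def update(relay, rho, t, sigma, eta):
    """Apply Fig. 2 (normal) or Fig. 3 (probation) at the end of one period.

    rho: inferred bytes relayed, t: period length in seconds,
    sigma: self-reported capacity in bytes/s, eta: expected bytes for the period.
    """
    if relay.state == "normal":
        new_kappa = min(max(rho / t, relay.kappa), sigma)
        relay.omega = min(max((1 + EPS_INC) * relay.kappa, new_kappa), sigma)
        relay.kappa = new_kappa
        if rho < (1 - EPS_DEC) * min(relay.kappa * t, eta) and rho / t < sigma:
            relay.state = "probation"
            relay.omega = relay.kappa
    else:  # probation: capacity becomes whatever was demonstrated this period
        relay.kappa = min(rho / t, sigma)
        relay.state = "normal"
        relay.omega = min((1 + EPS_INC) * relay.kappa, sigma)
    return relay


def run(relay, periods):
    """Feed (measured_rate, sigma, expected_rate) tuples, in bytes/s, through update()."""
    t = 3600.0  # constructed one-hour measurement period
    trace = []
    for rate, sigma, expected in periods:
        update(relay, rate * t, t, sigma, expected * t)
        trace.append((relay.state, relay.kappa, relay.omega))
    return trace


def liar():
    """Measured at 100 B/s every period but self-reports 100 times that."""
    return run(Relay(100.0, 100.0), [(100.0, 10000.0, 100.0)] * 3)


def starved():
    """Honest relay that saw little client traffic, then pads with dummy traffic."""
    return run(Relay(100.0, 100.0), [(50.0, 100.0, 100.0), (100.0, 100.0, 100.0)])


def degraded():
    """Relay that claims 100 B/s but can only demonstrate 50 B/s in probation."""
    return run(Relay(100.0, 100.0), [(50.0, 100.0, 100.0), (50.0, 100.0, 100.0)])


if __name__ == "__main__":
    for name, scenario in (("liar", liar), ("starved", starved), ("degraded", degraded)):
        print(name, scenario())
```

In the liar trace the weight never rises above $(1+\epsilon_{\mathrm{inc}})$ times the demonstrated capacity, however large the self-report is.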

5.6 Updating voting weights

For accuracy and security, the voting weight of a relay should reflect the amount of bandwidth it has provided in a given position. Giving relays with high contributed bandwidth high voting weights improves accuracy because those relays have the most observations about other relays’ activity. It improves security because it requires the use of costly bandwidth both for an adversary to obtain large weights for its relays and to affect the weight of other relays. Thus PeerFlow bases voting weights on previous estimates for the amount of relayed traffic. However, voting weights are updated more slowly than consensus weights in order to increase the upfront bandwidth cost of obtaining influence over consensus weights. In addition, a fraction of the smallest relays is excluded from voting to speed up measurement.

We need to maintain that the measuring relays constitute a significant fraction of the network. However, we would also like to update voting weights infrequently to force an adversary to relay traffic for a while before increasing his voting weight. To satisfy the former concern, we select a fraction µ of the network capacity in each position to receive a positive voting weight (we use µ = 0.75). To satisfy the latter, we only update the voting weights when the cumulative selection probability of any 1 − λ of the voting weight in some position falls below (1 − λ)µ (i.e. when $\sum_{R \in S} q^R_p < (1-\lambda)\mu$ for some relay set S with voting weight at least 1 − λ in position p). This will ensure that a small adversary (i.e. one with voting weight α < λ) cannot inflate his weight by targeting the 1 − λ fraction of measuring relays that have had their selection probability reduced the most.

The Directory Authorities initiate any update of the voting weights after determining the next consensus weights. During the update, each Directory Authority gives a positive voting weight to the relays that have the largest capacity and constitute a fraction µ of the network capacity. Let $\kappa^R_p$ be the current estimated capacity $\kappa^R$ of relay R multiplied by the position weight for R and $p \in \{g, m, e\}$. Let $i_{j,p}$ be the index of the relay with the jth largest value $\kappa^R_p$. Let $j^*_p$ be the number of relays needed to reach a fraction µ of $\sum_R \kappa^R_p$. Relay $R_{i_{j,p}}$, $j \le j^*_p$, is assigned a voting weight for position p of $v^{R_{i_{j,p}}}_p = \kappa^{R_{i_{j,p}}}_p / \sum_{k \le j^*_p} \kappa^{R_{i_{k,p}}}_p$. Relay $R_{i_{j,p}}$, $j > j^*_p$, is assigned a voting weight for position p of $v^{R_{i_{j,p}}}_p = 0$.

5.7 Bootstrapping new relays

PeerFlow bootstraps new relays into the system using the following staged process:

Initialization: A new relay that Tor would currently just add to the next consensus has its measurement state $S^R$ initialized to unknown.

Unknown: In any consensus period a set of Bandwidth Authorities, such as those currently used by TorFlow, estimate the capacity of a relay R with $S^R$ = unknown by downloading a set of test files of increasing size through a one-hop circuit consisting of R. The test file is obtained from a Bandwidth Authority itself, and the Bandwidth Authority should have enough capacity to measure a reasonable lower-bound on capacity for the largest relays. Tor’s current restrictions on one-hop circuits can be avoided by having the Bandwidth Authority act both as the client and as a spurious second hop. To reduce the ability of the adversary to slow down the testing process, Bandwidth Authorities test relays in the order that they joined, and they only allow the download to take as long as it would take a relay with the minimum amount of bandwidth needed for the Fast flag (currently 100KB/s [3]).

Regardless of whether all downloads finished, at the end of the tests for relay R, the Bandwidth Authority estimates a sustainable rate for the relay $\kappa^R$ as the minimum of the observed rate (i.e. the test bytes transferred over the time needed to transfer them) and the initial self-measured capacity $\sigma^R$. The weight of R is initialized to $\omega^R = \kappa^R$. The measurement time $t^R$ is set as described in §5.4, except the traffic amounts $\rho^R_p$ in each measurement position p (which do not exist initially) are estimated by using those amounts $\rho^{R'}_p$ from the other relays $R'$ with the same flags as R after their most-recently completed measurement period. To estimate $\rho^R_p$ from these, each $\rho^{R'}_p$ is normalized to $\omega^R \rho^{R'}_p / (t^{R'} \omega^{R'})$, and $\rho^R_p$ is set to the vote-weighted median of these values. Finally, the state of R is updated to $S^R$ = estimated.

Note that this stage is for performance reasons only. The amount of data downloaded during the tests is not high and is not intended as a security barrier. In addition, the measured relay is aware that the requested downloads are measurement tests performed by Bandwidth Authorities.

Estimated: Relay R with $S^R$ = estimated starts being selected only for the middle position when doing so will not cause the selection probability of estimated relays to exceed some small amount (e.g. 5%). The middle position cannot observe the client or destination directly, and so PeerFlow allows relays to occupy the middle position based only on an insecure capacity estimate. The limit on the probability of estimated relays limits the extent of observations they can make. Note that estimated relays are not considered when determining the measuring relays, which, among other things, prevents an adversary from triggering a voting-weight update by flooding new relays.

Measuring relays measure R just as they do for relays in the normal state. After $t^R$ time, an estimate $\rho^R$ is produced for the amount relayed by R using the same trimmed vote as for relays in the normal state. Then the capacity estimate is updated to $\kappa^R = \min(\rho^R/t^R, \sigma^R)$, and the consensus weight is set to $\omega^R = \kappa^R$. Finally, the relay’s state is updated to $S^R$ = normal.

6 Security analysis

The main security goal of PeerFlow is to ensure that an adversary that relays a fraction φ of Tor’s traffic in a given position only obtains a total relative consensus weight of γφ, where γ is a small advantage multiple. We first examine consensus weights in a given voting-weight period (voting-weight periods are described in §5.6). We show bounds on γ, which, in addition to the limits imposed by trusted relays, show that if the adversary is small (i.e., has a maximum voting-weight fraction of α < λ), the advantage is bounded even without any trusted relays. We then examine the consensus weights across voting-weight periods and show that PeerFlow provides bounded advantage there as well.

6.1 Weights in a single voting-weight period

Let $\alpha_p$, $p \in \{g, m, e\}$, be the voting weight fraction that the adversary’s relays cumulatively possess in position p, and let $\alpha = \max_p \alpha_p$. Let A be an adversarial relay. Let $\beta^A$ be the total number of bytes sent or received by A during its measurement period. Let $n_{\mathrm{msr}} = \max(|M_g| + |M_m|, |M_e| + |M_m|)$, and let $S_j = \sum_{1 \le i \le n_{\mathrm{msr}}} N^j_i$, $j \in \{1, 2\}$, where each $N^j_i$ is an independent random variable distributed in the same way as the PeerFlow noise values (i.e. with distribution $\mathrm{Lap}(\delta_{\mathrm{noise}}/\epsilon_{\mathrm{noise}})$). We use the variables in Table 4 to refer to those values in the current measurement period, and we denote by $x_0$ the value of variable x in the last measurement period (e.g. $\kappa^R_0$ indicates the capacity inferred for R for its previous measurement period). Note that the proofs of Lemmas 1 and 2 and of Theorem 1 appear in Appendix B.

We focus on α < λ because α = 0 at the beginning of an attack, small adversaries will maintain small α (as shown in §6.2), and with α > λ the adversary’s inferred capacity is bounded only by the trusted ceiling (i.e. the limit imposed by $\hat{\rho}^A_{\mathrm{ceil}}$). Lemma 1 shows that the expected bytes that PeerFlow infers were transferred by A (i.e. $\rho^A$) are upper-bounded in some way by the number of bytes that A actually exchanged (i.e. $\beta^A$). As we will show, assuming A exchanges close to the amount necessary to stay out of probation, then the noise sums $S_1$ and $S_2$ will be small relative to $\beta^A$, and the upper bound in the lemma is roughly $\beta^A/((1-\lambda-\alpha)\mu)$. For λ = 0.256, α < λ, and µ = 0.75, the adversary is thus limited to increasing $\rho^A$ by a factor of roughly 2.73.

Fig. 4. Max adversarial relative inferred capacity across voting periods using trusted relays and using trusted relays only. (Plot not reproduced; x-axis: adversary relative capacity, 0.01 to 0.25; y-axis: max relative inferred capacity, 0.0 to 1.0; curves for trusted fractions 0.00, 0.05, 0.10, 0.20, 0.30 and trusted-only fractions 0.05, 0.10, 0.20, 0.30.)

Lemma 1. If α < λ, then over the random noise values

$$E[\rho^A] \le ((1-\lambda-\alpha)\mu)^{-1}\, E[\max(S_1, \beta^A + S_2)].$$

Lemma 1 limits the amount an adversary can increase his relays’ inferred bytes transferred. His total consensus weight is relative to the honest relays, however, and it may occur that many of these relays transfer most of their traffic with a small number of measuring relays, whose values get trimmed. However, it is unlikely that traffic from honest clients is skewed much in this way. Let γ′ ≤ 1 be the smallest factor by which some set of honest measuring relays with voting weight 1 − 2λ underestimates an honest relay. Let $\mathcal{A}$ be the set of adversarial relays. Lemma 2 states that the inferred bytes of each honest relay is at most γ′ of the true value $\beta^R/2$.

Lemma 2. If α < λ, then, for all $R \notin \mathcal{A}$, $\rho^R \ge \gamma'\beta^R/2$.

These lemmas combine to prove Theorem 1, which states that either the malicious relay sends less than expected, putting it into probation, or it obtains a relative consensus weight at most a constant factor above the relative amount of traffic it transfers. With α = 0, γ′ = 1, µ = 0.75, $\epsilon_{\mathrm{inc}}$ = 0.25, λ = 0.256, and $\epsilon_{\mathrm{loss}}$ = 0.01, for example, the advantage factor is γ = 4.52.

Theorem 1. For α < λ and over the random noise values, if $\beta^A < (1-\lambda-\alpha)(1-\epsilon_{\mathrm{dec}})(1-\epsilon_{\mathrm{loss}})\mu\,\rho^A_0\, t^A/t^A_0$, then $E[\rho^A]$ puts A into probation, and otherwise

$$\frac{E[\omega^A]}{\sum_R \omega^R} \le \left[\frac{2(1+\epsilon_{\mathrm{inc}})}{\gamma'(1-\epsilon_{\mathrm{loss}})(1-\lambda-\alpha)\mu}\right] \frac{\beta^A/t^A}{\sum_R (\beta^R/t^R)}.$$

While in theory the adversary can obtain the advantage factor γ in Theorem 1 (the value in brackets), doing so requires certain attacks that have costs and limitations. To obtain the factor of 2 in γ, the adversary must only send or receive “one-sided” traffic to measuring relays, i.e. only from the client side or destination side. In particular, he must not actually relay any Tor traffic. Acting this way is highly observable and is likely to be noticed by clients. To obtain the $((1-\lambda-\alpha)\mu)^{-1}$ factor in γ, the adversary must exchange traffic only with a set of measuring relays with voting weight 1 − λ − α. This is observable by Directory Authorities (although hard to distinguish from a possible framing of an honest relay) and also noticeable by clients.

6.2 Weights across voting periods

We now consider the ability of an adversary to increase his inferred capacities $\kappa^A$ (and thus his weights $\omega^A$) over time, $A \in \mathcal{A}$. For this analysis, we ignore the possible inflation effect of the added noise (i.e. $\varepsilon_{loss} = 0.01$), assume that honest relays send their share of the traffic volume to the measuring relays (i.e. that $\gamma' = 1$), and assume that voting weights and selection probabilities are maintained to be proportional (e.g. as when there is no network churn).

The adversary can apply a feedback attack to the bounds of Theorem 1 by repeatedly exchanging all traffic with a target set of relays with voting weight $1-\lambda-\alpha$, including the trusted relays, as $\alpha$ grows. He can accomplish this, for example, by severely rate limiting honest client connections and creating his own dummy connections through the targeted measuring relays. Initially, when $\alpha = 0$, this inflates his total capacity by a factor of at most $2/((1-\lambda)\mu)$. Then, once the adversary receives an inflated voting weight $\alpha$, he can target $1-\lambda-\alpha$ of the measuring relays to exchange all traffic with, further increasing his voting weight by a factor of $(1-\lambda)/(1-\lambda-\alpha)$. The adversary can repeat this process until either he reaches a fixpoint or he receives voting weight $\alpha > \lambda$, at which point the capacities of his relays are either limited by the trusted-relay ceiling (i.e. $\hat{\rho}^A_{ceil}$) or are unbounded without trusted relays, and the adversary can start to deflate the weights of the honest relays, which are ultimately limited by the trusted-relay floor (i.e. $\hat{\rho}^R_{floor}$).

We simulate this process across voting periods until the adversary’s weight no longer increases. We also consider a simpler variant of PeerFlow in which only trusted-relay measurements are used. In this variant, the trusted-relay measurements are used directly (i.e. $\rho^R = \hat{\rho}_{max}$), there are no voting weights, and the other PeerFlow components operate in the same way.

The results are shown in Figure 4. It shows that PeerFlow can provide bounded weight inflation to small adversaries even when there are no trusted relays. Specifically, the inflation factor is $\gamma < 4.6$ when $\tau = 0$ and the adversary is at most 4% of the network. This compares favorably to the measured TorFlow and EigenSpeed inflation factors of up to 177 and 28.1, respectively. It also shows that using measurements from all relays protects against small adversaries at the expense of worse security against larger adversaries compared to using measurements from only trusted relays. Making this tradeoff is consistent with Tor’s security in general [24]. We can also see that as the trusted fraction of the network grows, the security from using all relays and from using only trusted relays become the same. In both cases, as the trusted fraction grows, an adversary with $\phi$ of the network capacity has his inferred relative capacity limited to $(2\phi/\tau)/(1 + 2\phi/\tau)$, which implies a bounded inflation factor of $\gamma \le 2/\tau$.
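The Theorem 1 factor and its iteration across voting periods can be checked numerically when choosing $\lambda$ and $\mu$. The following Python code is an added example, not the authors' simulation: it assumes no trusted relays ($\tau = 0$), no noise, $\gamma' = 1$, and that the adversary's voting weight $\alpha$ in the next period equals his inferred relative capacity $\gamma\phi/(\gamma\phi + 1 - \phi)$, where $\phi$ is his true share of capacity. All function and class names are hypothetical.

```python
from dataclasses import dataclass

LAMBDA = 0.256   # trimmed voting weight, value used in the page's example
MU = 0.75        # measuring-relay fraction, value used in the page's example


def advantage_factor(alpha, lam=LAMBDA, mu=MU, gamma_prime=1.0,
                     eps_inc=0.0, eps_loss=0.0):
    """Bracketed factor of Theorem 1; the theorem only holds for alpha < lam."""
    if not 0 <= alpha < lam:
        raise ValueError("Theorem 1 requires 0 <= alpha < lambda")
    return 2 * (1 + eps_inc) / (
        gamma_prime * (1 - eps_loss) * (1 - lam - alpha) * mu)


@dataclass
class FeedbackResult:
    bounded: bool    # False once the voting weight reaches lambda
    periods: int     # voting periods simulated
    alpha: float     # adversary voting weight at the end
    gamma: float     # capacity inflation factor (inf when unbounded)


def feedback_bound(phi, lam=LAMBDA, mu=MU, tol=1e-9, max_periods=1000):
    """Iterate the section 6.2 feedback process for an adversary of true share phi.

    Assumed model: in each period the adversary reaches the Theorem 1 bound,
    and his voting weight then becomes gamma*phi / (gamma*phi + 1 - phi).
    """
    alpha, gamma = 0.0, 1.0
    for period in range(1, max_periods + 1):
        gamma = advantage_factor(alpha, lam, mu)
        new_alpha = gamma * phi / (gamma * phi + (1 - phi))
        if new_alpha >= lam:
            # Trimming no longer removes all adversarial votes: no bound at tau = 0.
            return FeedbackResult(False, period, new_alpha, float("inf"))
        if new_alpha - alpha < tol:
            return FeedbackResult(True, period, new_alpha,
                                  advantage_factor(new_alpha, lam, mu))
        alpha = new_alpha
    return FeedbackResult(True, max_periods, alpha, gamma)


if __name__ == "__main__":
    # Single-period factor with the noise parameters from the page's example.
    print("one period:", round(advantage_factor(0.0, eps_inc=0.25, eps_loss=0.01), 3))
    for share in (0.01, 0.02, 0.04, 0.05, 0.10):
        print(share, feedback_bound(share))
```

Under these assumptions a 4% adversary settles at a fixpoint below 4.6, while a 10% adversary exceeds $\lambda$ after one period, which is the regime where the page says trusted relays are needed to bound the result.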

7 Load-Balancing Analysis

In this section, we experimentally demonstrate PeerFlow’s ability to effectively distribute network traffic to relays in proportion to their bandwidth capacities.

7.1 Experimentation Setup

We evaluate PeerFlow and its effect on the consensus path selection weights, network goodput, and client performance using Shadow [21]. Shadow is an open-source [2] parallel discrete-event network simulator/emulator hybrid that has been extensively validated and utilized for Tor network experimentation [19, 20]. Shadow experiments are completely isolated from the network, so they are safe and contain no privacy risk to the public Tor network or its users. Because Shadow runs real applications as plug-ins, it is ideal for analyzing application-layer effects, e.g. those that result from modifying Tor’s load-balancing protocols.

Our Private Tor Network: We generated a new Tor network configuration for Shadow using the tools in the Shadow distribution and Tor Metrics data [4] collected during 2014-09. The resulting Tor network contains 4 Tor directory authorities, 498 Tor relays, 7,500 Tor clients, and 1,000 servers. Our private network represents the public Tor network from 2014-09 at scale of approximately 1/12, and we ran our private network for 5 virtual hours for each experiment (using the first two for network bootstrapping). We repeated each experiment 3 times to measure variability across runs, which we quantify for a given performance metric by comparing the distribution $F(x)$ across all 3 experiments to the distribution $F_i(x)$ for the $i$th experiment, $i \in \{1, 2, 3\}$, using the KS statistic: $\max_x |F(x) - F_i(x)|$.

We experiment with and compare three models to determine the effect of load-balancing weights on client performance and the distribution of network load. In the Ideal model, each relay’s consensus weight is initialized to the capacity of its network interface as configured in Shadow, and these weights do not change throughout the experiment. In the TorFlow model, we run our new Shadow TorFlow plug-in that we developed² to mimic the behavior of TorFlow as it operates in the public Tor network. In the PeerFlow model, we run a PeerFlow prototype³ that we implemented in Tor (forked at version 0.2.5.10). We did not implement probation in our prototype. However, we do measure when probation would be triggered and how much traffic would be needed to avoid probation, and those measurements are presented in §8.2. In both the TorFlow and PeerFlow models, the bandwidth information used to produce the consensus weights is dynamically updated throughout the experiment. We run three experiments for each model using random seeds, and for each model we present the results aggregated across all three experiments. Note that we do not compare EigenSpeed because (i) it is not currently used in practice; and (ii) we believe that Tor will not adopt it at least partially due to the vulnerabilities discussed in §4.

7.2 Network Performance

A primary goal of Tor’s network load-balancing algorithm is to distribute traffic to best utilize existing resources, and we use goodput as a metric for determining effective use of such resources. For our purposes, relay goodput, or application throughput, is the number of application bytes that a relay is able to forward over time. More formally, if a relay $R_i$ receives $B^r_i(T)$ application bytes from the network and sends $B^s_i(T)$ application bytes to the network at time $T$ over time interval $t$, then we define its goodput as $G_i(T) = \min(B^r_i(T), B^s_i(T))/t$. We normalize $t$ to 1 second for ease of exposition.

² Our Shadow TorFlow plug-in contains 2128 lines of code
³ Our PeerFlow prototype contains 1222 lines of code

To determine how load is distributed over the relays, we compute each relay’s per-second goodput $G_i(T)$ for every second of the final 3 virtual hours of simulation. Figure 5a shows the cumulative distribution of the goodput for each relay-second, over all relays, seconds, and experiments. The goodput achieved under all three models was 199.0 KiB/s for Ideal, 0.1 KiB/s for TorFlow, and 30.1 KiB/s for PeerFlow in the median, and 960.3 KiB/s for Ideal, 729.5 KiB/s for TorFlow, and 683.0 KiB/s for PeerFlow in the third quartile. Each per-experiment distribution was similar to the combined distribution with a maximum KS statistic of 0.0604.

Given our definition of relay goodput, we define aggregate load $L$ at time $T$ as the sum of all relay goodputs, i.e., $L(T) = \sum_i G_i(T)$. We compute $L$ for all $t \in T$ (every second), and plot the cumulative distribution over all seconds in all experiments for each model. The results are shown in Figure 5b. The median and standard deviation of aggregate goodput are 483.7 and 10.2 MiB/s for the Ideal model, 437.9 and 11.3 MiB/s for the TorFlow model, and 427.4 and 11.8 MiB/s for the PeerFlow model. Some per-experiment distributions differed significantly from the combined distribution with a maximum KS statistic of 0.393. Given these results, we conclude that PeerFlow does not dramatically reduce relay goodput compared to the TorFlow model, and that optimizing PeerFlow may be able to increase aggregate relay goodput to be closer to Ideal.

We also consider network utilization as a load-balancing metric, which demonstrates how well the algorithms in our load-balancing models use the available network resources. If relay $R_i$ has goodput $G_i(T)$ and capacity $C_i$, we define its utilization $U_i(T)$ at time $T$ as $U_i(T) = G_i(T)/C_i$. We compute utilization for each relay each second, and show the distribution of utilization rates over all experiments in Figure 5c. Our results show that 75% of relays had a utilization of 43% or less in TorFlow, 49% or less in PeerFlow, and 61% or less in Ideal. Each per-experiment distribution was similar to the combined distribution with a maximum KS statistic of 0.0628. We thus conclude that while PeerFlow improves utilization over TorFlow, further improvements may still be possible.

7.3 Client Performance

Another goal of load-balancing algorithms is to minimize client performance bottlenecks. We use file download times as a metric for determining how clients will perceive performance changes resulting from using PeerFlow. We consider the time to download the first byte of all files as a general measure of latency (Figure 5d), and the time to download the last byte of 320KiB files (Figure 5e) and 5MiB files (Figure 5f) as measures of performance for web or bulk type downloads, where the distributions considered are across all 3 experiments. As the distribution of download times is nearly identical for all measurements below the median, we focus on the upper half of the distributions.

[Figure 5: cumulative-fraction plots for Ideal, TorFlow, and PeerFlow. Panels: (a) Relay goodput per second; (b) Aggregate relay goodput per second; (c) Relay utilization per second; (d) Time to first byte of download; (e) Time to last byte of 320KiB download; (f) Time to last byte of 5MiB download.]

Fig. 5. Network utilization and client performance for the Ideal, TorFlow, and PeerFlow load-balancing models over all experiments.

With regards to the 320KiB file downloads in Figure 5e, the 3rd quartile download time is highest for Ideal at 2.00 seconds, followed by TorFlow at 1.51 seconds, and lowest for PeerFlow at 1.44 seconds. The maximum download time is just over 60 seconds for all models due to the default 60 second timeout set by the traffic generation tool used in Shadow. These trends are similar for both 320KiB and 5MiB downloads, as well as for the time to first byte metric. Each per-experiment distribution was similar to the combined distribution with a maximum KS statistic of 0.0461.

Given our results, we conclude that the security benefits of PeerFlow may come with a slight reduction in download times across all file sizes when compared to both TorFlow and Ideal, and a small increase in download time for the smaller first byte and 320KiB metrics for less than 10% of the downloads. We also note that while ideal measurement does result in the highest overall network utilization, as expected, it doesn’t yield faster individual downloads, and this unintuitive phenomenon may merit further investigation. Furthermore, probation is not fully implemented in our prototype, and so it is possible that PeerFlow will underestimate relays’ weights. We now analyze the extent of such errors in weight calculations in our experiments.

            Inaccuracy             Imprecision
            Avg.      Std. Dev.    Avg.      Std. Dev.
TorFlow     1.02      0.244        0.288     0.213
PeerFlow    0.428     0.0381       0.217     0.0814

Table 5. Weight errors in TorFlow and PeerFlow.

7.4 Consensus Weight Errors

TorFlow and PeerFlow produce relay weights that get added to the Tor network consensus. These weights bias the relays chosen by clients when constructing circuits, thus affecting the distribution of client load. Ignoring additional position weights that are applied based on the ability to serve as a guard or exit relay, the ratio of a given relay weight to the sum of all of the relays’ weights roughly approximates the fraction of total network load that will be directed to that relay.

A good load-balancing scheme should distribute load proportional to the real capacities of the relays; in other words, a relay’s relative weight should match its relative capacity. We can thus consider the inaccuracy of load balancing in Tor by comparing the relays’ relative consensus weights to their true relative bandwidths. Let $C_i$ be the bandwidth capacity of relay $R_i$ and its relative capacity be $\mathcal{C}_i = C_i/\sum_j C_j$, and let $W^j_i$ be the weight of $R_i$ in the $j$th consensus and its relative weight be $\mathcal{W}^j_i = W^j_i/\sum_k W^j_k$. We measure the inaccuracy of the weight of $R_i$ as $|\log_{10}(\mathcal{C}_i/\mathcal{W}^j_i)|$. We also measure the imprecision of the weights of $R_i$ across consensuses as their sample standard deviation divided by their sample mean.

Table 5 compares the inaccuracy and imprecision of TorFlow and PeerFlow. For each experiment, the inaccuracy is averaged across all relays and consensuses, and the average and standard deviation across experiments is shown. Similarly, the imprecision results are averaged over all relays for each experiment, and the average and standard deviation across experiments is shown. The results show that our PeerFlow prototype was both more accurate and more precise than TorFlow.

8 Speed and Efficiency Analysis

PeerFlow schedules measurement rounds individually for each relay depending on the relay’s expected client traffic. With more client traffic, a relay’s measurements will more quickly be large relative to the added noise, and so the measurement time can be smaller. These measurement times also affect the bandwidth efficiency of the protocol because they determine how often measurements are sent to the Directory Authorities.

8.1 Speed

In order to estimate relay measurement times when running PeerFlow on the Tor network, we use historical network data from CollecTor [1]. These data include past network consensuses, from which we can determine the types of relays that existed, and extra-info descriptors, from which we can determine the amount of traffic each relay transferred.

We use the consensus from January 31, 2015, at 00:00 and the extra-info descriptors from January 2015 as the basis for our analysis. We estimate the average rate of client traffic transferred by each relay using the byte histories over the hour before the consensus. After excluding relays with empty or incomplete byte histories or with selection probabilities of zero, there are 5917 relays in the network.

We estimate PeerFlow’s determination $\bar{\rho}^R_p$ of the bytes relayed by $R$ and observed in position $p$ by taking the minimum of read and written bytes in the byte history and multiplying it by $R$’s relative probability of being selected in each circuit position. We take the measuring guards, measuring middles, and measuring exits to be the largest $\mu = 0.75$ fraction of relays by position-weighted bytes relayed (i.e., for relay $R$, $\beta^R$ is multiplied by the position weight for $R$ in the given measuring position). The result is 400 measuring guards, 749 measuring middles, and 145 measuring exits.

We apply the measurement-time procedure given in §5.4 to estimate what the measurement times of existing Tor relays would be under PeerFlow. Figure 6 shows the resulting distribution of measurement times over the observed bytes transferred. Measurement times below 14 days are shown, which constitute 96.8% of the times by relay capacity. The 25th-percentile measurement time is 11.5 hours, the 50th is 27.7 hours, and the 75th is 70.7 hours. The relays with measurement times above 14 days have low observed capacity, many below Tor’s stated requirements of 100 KB/s for Fast relays and 250 KB/s for guards. By raising these requirements by 150%, to 250 KB/s for Fast relays and 625 KB/s for guards, and excluding relays with observed capacities below these requirements, the maximum measurement time can be lowered to 316.4 hours (i.e. 13.2 days).

[Figure 6: cumulative fraction of total capacity versus measurement time (hours).]

Fig. 6. Measurement times weighted by observed relay capacity

                        Number of Relays    Dummy Traffic (KiB/s)
$\varepsilon_{dec}$     Median    Max       Avg. Total    Std. Dev.
0.05                    14        75        2.38          0.677
0.25                    7         55        1.49          0.573
0.50                    4         57        2.05          1.33
0.75                    1         17        0.039         0.0142
0.95                    1         10        0.00328       0.00385

Table 6. Median and maximum number of relays in probation and the amount of dummy traffic needed to avoid probation

8.2 Efficiency

PeerFlow has bandwidth overhead from sending statistics to the Directory Authorities and from dummy traffic due to relays in probation. Each measuring relay sends data to the Directory Authorities at the end of a measured relay’s measurement period. Thus the total measurement traffic to them is $O(mn)$, where $m$ is the number of measuring relays and $n$ is the number of relays. However, each measuring relay sends only the four $\rho$ statistics for each measured relay, and our evaluation shows that measurement periods are generally at least a few hours long. Again using the consensus from January 31, 2015, at 00:00, and assuming that each statistic is 4 bytes, we find that the average rate of upload to each Bandwidth Authority is only 119.6 B/s.

We investigate the dummy traffic overhead by running Shadow experiments over a range of probation-trigger values $\varepsilon_{dec}$. Using the PeerFlow model and network described in §7, we ran 3 experiments for each $\varepsilon_{dec}$. For $\mu = 0.75$, $\varepsilon_{dec}$ ranges from 0.90 for a trusted fraction of $\tau = 0.05$ to 0.2 for $\tau = 0.3$. The results in Table 6 show that, out of the 498 total relays, the median number of relays in probation in a given second across all experiments was at most only 14 and that the maximum was at most only 75. Moreover, the results show the total dummy traffic needed to meet expectations and thus avoid probation was at most only 2.38 KiB/s on average over time and across all experiments, with comparable standard deviation across experiments. This amount is very small relative to the median network-wide PeerFlow goodput of 428.0 MiB/s (see §7).

9 Conclusion

Tor’s security is vulnerable to an adversary running large relays, and we show that under Tor’s current TorFlow bandwidth-measurement system and the proposed EigenSpeed system an adversary can make small relays appear large, drastically reducing the cost of attack. We present PeerFlow, show how it limits the ability of an adversary to fool Tor about the bandwidth of his relays, and demonstrate that its performance is comparable to Tor’s current performance. Possible future improvements to PeerFlow include improving scalability and further improving security to reduce even more the adversary’s ability to inflate his relays’ weights.

Acknowledgments: We thank the anonymous reviewers for their suggestions and feedback. This work has been partially supported by DARPA and ONR. The views expressed in this work are strictly those of the authors and do not necessarily reflect the official policy or position of DARPA, ONR, or NRL.

References

[1] Collector. https://collector.torproject.org/.
[2] Shadow simulator. https://shadow.github.io.
[3] Tor directory protocol, version 3. https://gitweb.torproject.org/torspec.git?a=blob_plain;hb=HEAD;f=dir-spec.txt.
[4] Tor metrics. https://metrics.torproject.org/.
[5] Bandwidth scanner spec. https://gitweb.torproject.org/torflow.git/blob_plain/HEAD:/NetworkScanners/BwAuthority/README.spec.txt.
[6] Olivier Baudron, Pierre-Alain Fouque, David Pointcheval, Jacques Stern, and Guillaume Poupard. Practical multi-candidate election system. In Ajay D. Kshemkalyani and Nir Shavit, editors, PODC, 2001.
[7] Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. Low-resource routing attacks against Tor. In ACM WPES, 2007.
[8] Alex Biryukov, Ivan Pustogarov, and Ralf-Philipp Weinmann. Trawling for Tor hidden services: Detection, measurement, deanonymization. In IEEE S&P, 2013.
[9] Ronald Cramer, Ivan Damgård, and Berry Schoenmakers. Proofs of partial knowledge and simplified design of witness hiding protocols. In Yvo Desmedt, editor, CRYPTO ’94, 1994.
[10] Ivan Damgård and Mads Jurik. A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In PKC, 2001.
[11] Ivan Damgård and Mads Jurik. A length-flexible threshold cryptosystem with applications. In Reihaneh Safavi-Naini and Jennifer Seberry, editors, ACISP, 2003.
[12] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The second-generation onion router. In USENIX Security, 2004.
[13] Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages and Programming, 2006.
[14] Nathan Evans, Roger Dingledine, and Christian Grothoff. A practical congestion attack on Tor using long paths. In USENIX Security, 2009.
[15] Pierre-Alain Fouque, Guillaume Poupard, and Jacques Stern. Sharing decryption in the context of voting or lotteries. In FC, 2000.
[16] David Goulet, Aaron Johnson, George Kadianakis, and Karsten Loesing. Hidden-service statistics reported by relays. Technical Report 2015-04-001, The Tor Project, Inc., April 2015.
[17] Andreas Haeberlen, Petr Kouznetsov, and Peter Druschel. Peerreview: Practical accountability for distributed systems. In SOSP, 2007.
[18] Nicholas Hopper, Eugene Y. Vasserman, and Eric Chan-Tin. How much anonymity does network latency leak? TISSEC, 13(2), February 2010.
[19] Rob Jansen, Kevin Bauer, Nicholas Hopper, and Roger Dingledine. Methodically modeling the Tor network. In CSET, 2012.
[20] Rob Jansen, John Geddes, Chris Wacek, Micah Sherr, and Paul Syverson. Never been KIST: Tor’s congestion management blossoms with kernel-informed socket transport. In USENIX Security, 2014.
[21] Rob Jansen and Nicholas Hopper. Shadow: Running Tor in a box for accurate and efficient experimentation. In NDSS, 2012.
[22] Rob Jansen, Aaron Johnson, and Paul Syverson. LIRA: Lightweight incentivized routing for anonymity. In NDSS, 2013.
[23] Rob Jansen, Andrew Miller, Paul Syverson, and Bryan Ford. From onions to shallots: Rewarding Tor relays with TEARS. In HotPETs, 2014.
[24] Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson. Users get routed: Traffic correlation on Tor by realistic adversaries. In ACM CCS, 2013.
[25] Ghassan Karame, David Gubler, and Srdjan Capkun. On the security of bottleneck bandwidth estimation techniques. In SecureComm, 2009.
[26] Mike Perry. TorFlow: Tor network analysis. In HotPETs, 2009.
[27] Robin Snader. Path Selection for Performance- and Security-Improved Onion Routing. PhD thesis, UIUC, 2009.
[28] Robin Snader and Nikita Borisov. Eigenspeed: Secure peer-to-peer bandwidth evaluation. In IPTPS, 2009.
[29] Robin Snader and Nikita Borisov. Improving security and performance in the Tor network through tunable path selection. TDSC, 8(5):728–741, September 2011.
[30] R. Suselbeck, G. Schiele, P. Komarnicki, and C. Becker. Efficient bandwidth estimation for peer-to-peer systems. In IEEE P2P, 2011.
[31] Fabrice Thill. Hidden Service Tracking Detection and Bandwidth Cheating in Tor Anonymity Network. PhD thesis, Univ. Luxembourg, 2014.
[32] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. Effective attacks and provable defenses for website fingerprinting. In USENIX Security, 2014.
[33] Tao Wang and Ian Goldberg. Improved website fingerprinting on Tor. In ACM WPES, 2013.
[34] Matthew Wright, Micah Adler, Brian Neil Levine, and Clay Shields. The predecessor attack: An analysis of a threat to anonymous communications systems. TISSEC, 4(7):489–522, November 2004.

A Enhanced PeerFlow

While our simulations and analysis show that PeerFlow would improve the security and performance of Tor’s load balancing at relatively low cost, we note that some future concerns are not addressed by our basic design. First, revealing the approximate amount of traffic sent between specific pairs of relays increases the information available to an adversary for traffic analysis. In addition, adding noise to traffic statistics makes them less accurate and forces PeerFlow to wait to report on a given relay until enough client traffic is likely to have been relayed. This delays measurement of low-bandwidth relays, and as the network grows the relative weight of each measuring relay will decrease, thus increasing the length of reporting intervals.

Here we describe modifications to PeerFlow that address these concerns, by cryptographically protecting the privacy of individual measurements. We present these modifications as an enhancement to PeerFlow to leave a core design that is more straightforward for the Tor Project to implement and adopt.

A.1 Encrypted Measurement Aggregation

In order to protect the privacy of individual measuring relay observations, we introduce a new role for the Bandwidth Authorities (§5.7). These authorities will have the role of jointly decrypting bandwidth reports after aggregation; if a majority of the Bandwidth Authorities collude then individual observations can be exposed, but if a majority are honest, then individual observation reports will retain their privacy.

The Bandwidth Authorities cooperate to publish a public key for a threshold homomorphic tally system, while separately maintaining shares of the decryption key. In addition to key generation, a threshold homomorphic tally system supports the following operations:

– Encryption: Given a public key and $t$ tallies $b_1, b_2, \ldots, b_t$, produce a ciphertext that encodes these counts, providing semantic security.

– Proving: Given a public key $PK$, randomness $r$, and a ciphertext that encrypts a tally, produce a publicly-verifiable proof that the ciphertext is well-formed, i.e. was produced by computing $E_{PK}(0, \ldots, 1, \ldots, 0; r)$.

– Aggregation: Given ciphertexts $c$ and $c'$ encoding tallies $b_1, \ldots, b_t$ and $b'_1, \ldots, b'_t$ and weights $w$ and $w'$, produce a ciphertext $c$ that encodes $wc_1 + w'c'_1, \ldots, wc_t + w'c'_t$.

– Decryption: Given ciphertext $c$ and decryption key share $s$, produce a decryption share $d$ along with a publicly-verifiable proof of correctness. The shares should be publicly combinable to produce the joint decryption. This verifiability allows PeerFlow to maintain auditability while increasing privacy.

We describe two possible implementations of these operations in §A.2.1 and A.2.2.

Given the encrypted tally system, we modify how measuring relays report their observations. For each relay $R$, we partition the range $((1-\delta)\rho^R_p, (1+\delta)\rho^R_p)$ into $t$ equally-sized buckets $[b_1, b_2), [b_2, b_3), \ldots, [b_{t-1}, b_t)$, where $\delta$ is a maximum allowed relative capacity change. Each measuring relay in position $p$ then computes the observed value $\rho^R_p$ (without noise) as in §5.1, and chooses the bucket $b \in \{1, \ldots, t\}$ containing $\rho^R_p$. The measuring relay $R'$ then encrypts a “vote” for this bucket, and submits this encrypted tally $c_{R'}$ to the Bandwidth Authorities along with a proof of well-formedness.

Finally, the Bandwidth Authorities use the homomorphic properties of submitted ballots to combine the tallies, weighted by measuring relays’ voting weights, obtaining a final tally for each relay observation. The Bandwidth Authorities decrypt this aggregate, providing proofs of correct decryption. The Bandwidth Authorities then compute the observation for relay $R$, $\bar{\rho}^R_p$, as the mean value of the votes after trimming $\lambda$ voting weight from the smallest and largest buckets. They send this value to the Directory Authorities.


A.2 Example Threshold Homomorphic Tally Schemes

A.2.1 Paillier-based scheme

Baudron et al. [6] describe a homomorphic tally scheme based on the Paillier cryptosystem; the scheme can be modified for PeerFlow in a straightforward way. The Bandwidth Authorities engage in a distributed threshold Paillier key generation algorithm [10, 11, 15], resulting in a Paillier public key $N$ with generator $g$. Using the network consensus, all measuring relays can agree on a value $M$ which is greater than the sum $S$ of all voting weights, for example $M = 2^{\lceil \log_2 S \rceil}$. Then to encrypt a vote for bucket $i$, a measuring relay chooses $r \in_R \mathbb{Z}^*_N$ and computes $E_N(b_i) = g^{M^i} r^N \bmod N^2$.⁴

Note that in this scheme, if the sum of voting weight given to bucket $i$ is $v_i$, we have $E_N(b_i)^{v_i} = E_N(v_i M^i)$ and $\prod_i E_N(v_i M^i) = E_N(\sum_i v_i M^i)$, so as long as we have all $v_i < M$, we can recover weights $v_1, \ldots, v_t$ by representing the result of decryption in base $M$ (and this will be particularly efficient if $M = 2^k$ for some $k$).

Proving in zero-knowledge that a ciphertext $c = E_N(M^i)$ correctly encrypts a particular bucket value $M^i$ can be accomplished by proving knowledge of an $N$-th root of $c/g^{M^i}$ using the Guillou-Quisquater scheme, which requires the prover to send two elements of $\mathbb{Z}_{N^2}$ and a $2\ell$-bit challenge for soundness level $2^{-\ell}$; this can be extended using the standard techniques of Cramer et al. [9] to prove that $\bigvee_{i=1}^t c = E_N(M^i)$ and converted into a noninteractive proof using the Fiat-Shamir heuristic. All together, this method of proving that a ciphertext is correct produces a proof of length $2t(2\log_2 N + \ell)$ bits, and can be verified with $t$ modular exponentiations in $\mathbb{Z}_{N^2}$.

Proofs of correct decryption are provided by each Bandwidth Authority individually, following the protocol described in [10, 11, 15].
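The vote encoding of §A.2.1 and the trimmed tally of §A.1 can be traced end to end on small numbers. The following Python code is an added example, not the authors' implementation: it assumes a single-party toy Paillier key with $g = N + 1$ in place of the threshold key, omits the zero-knowledge proofs, uses integer voting weights, and takes bucket midpoints as the value of a vote. The sample weights, votes, and all function names are constructed for illustration.

```python
import math
import secrets

# Toy key from two Mersenne primes: insecure, and held by one party here,
# whereas the page has the Bandwidth Authorities share the decryption key.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
G = N + 1                                           # assumed generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p-1, q-1)
LAM_INV = pow(LAM, -1, N)                           # decryption constant for g = N+1


def base_for(weights):
    """A power of two M strictly greater than the total voting weight S."""
    return 1 << sum(weights).bit_length()


def encrypt_vote(bucket, M):
    """E_N(b_i) = g^(M^i) * r^N mod N^2 for a random r in Z*_N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(G, M ** bucket, N2) * pow(r, N, N2) % N2


def aggregate(ciphertexts, weights):
    """Weighted homomorphic sum: the product of c_k^(w_k) mod N^2."""
    acc = 1
    for c, w in zip(ciphertexts, weights):
        acc = acc * pow(c, w, N2) % N2
    return acc


def decrypt(c):
    """Single-key Paillier decryption, standing in for the joint decryption."""
    return (pow(c, LAM, N2) - 1) // N * LAM_INV % N


def decode_tally(plain, M, t):
    """v_1..v_t are the base-M digits 1..t of the decrypted sum."""
    return [plain // M ** i % M for i in range(1, t + 1)]


def bucket_midpoints(rho_prev, delta, t):
    """Midpoints of t equal buckets over ((1-delta)*rho, (1+delta)*rho)."""
    width = 2 * delta * rho_prev / t
    return [(1 - delta) * rho_prev + (i + 0.5) * width for i in range(t)]


def trimmed_mean(tally, values, lam):
    """Weighted mean of bucket values after trimming lam of the weight per end."""
    kept = [float(v) for v in tally]
    n = len(kept)
    for order in (range(n), range(n - 1, -1, -1)):
        remaining = lam * sum(tally)
        for i in order:
            take = min(kept[i], remaining)
            kept[i] -= take
            remaining -= take
    return sum(k * v for k, v in zip(kept, values)) / sum(kept)


def demo():
    # Constructed sample: five measuring relays, integer voting weights, t = 4.
    weights = [30, 25, 20, 15, 10]
    votes = [2, 2, 3, 2, 4]           # bucket index chosen by each relay
    t = 4
    M = base_for(weights)
    assert M ** (t + 1) < N           # the plaintext must not wrap modulo N
    ballots = [encrypt_vote(b, M) for b in votes]
    tally = decode_tally(decrypt(aggregate(ballots, weights)), M, t)
    estimate = trimmed_mean(tally, bucket_midpoints(1000.0, 0.2, t), 0.256)
    return tally, estimate


if __name__ == "__main__":
    print(demo())
```

Only the weighted per-bucket totals appear after decryption; the single high vote for bucket 4 falls inside the trimmed weight and does not move the estimate.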

A.2.2 Additive Elgamal-based scheme

A tally scheme can also be implemented using additive Elgamal encryption over any group where the Decisional Diffie-Hellman assumption is hard. In this case, the bandwidth authorities engage in a distributed key generation protocol to produce $t$ public keys $\{h_i = g^{x_i}\}_{i=1}^{t}$ with generators $g_0, g_1, \ldots, g_t$ of prime order $p$ whose respective discrete logarithms are unknown. To encrypt a vote for bucket $b$, a measuring relay chooses $r \in_R \mathbb{Z}_p$, and computes $c_0 = g_0^r$, $c_b = h_b^r g_0$, and $c_j = h_j^r$ for $j \neq b$; the ciphertext becomes the list $c_0, c_1, \ldots, c_t$. Note that in this scheme, raising all elements of a ciphertext to a scalar power $v$ scales the tally for bucket $b$ by $v$, and elementwise multiplication adds the tallies for all buckets. After decrypting the elements of a ciphertext using standard threshold Elgamal, the individual tallies are recovered in $O(\sqrt{\sum_i v_i})$ time using a standard short discrete logarithm algorithm.

⁴ If $M^t > N$, the scheme can use the Damgård-Jurik extension to work modulo $N^{d+1}$, where $d$ is the minimal integer such that $M^t < N^d$.

Proving that a ciphertext $c$ encrypts a vote for a single bucket $b$ in honest-verifier zero knowledge could be accomplished as follows: the verifier chooses a random vector $\langle x_1, \ldots, x_t \rangle \in_R \mathbb{Z}_p^t$; then both sides compute $X = \sum_i x_i$, $H = \prod_i h_i^{x_i}$, and $C_b = g_0^{-x_b} \prod_i c_i^{x_i}$. Finally, the verifier and prover engage in a standard proof of knowledge of equality of discrete logarithms, $\log_{g_0} c_0 = \log_H C_b$, which requires the prover to send a group element and an element of $\mathbb{Z}_p$ and the verifier to send an $\ell$-bit challenge. Applying the standard disjunction technique to the $t$ possible buckets allows constructing a proof that $c$ encrypts a single vote for one bucket $b \in \{1, \ldots, t\}$, and applying the Fiat-Shamir heuristic results in a noninteractive proof of length $t$ group elements plus $t$ elements of $\mathbb{Z}_p$ and $t$ short exponents.

Privacy Analysis: Under the assumption that the majority of Bandwidth Authorities are honest, this scheme ensures that no information is leaked about the individual measurements reported by each measuring relay beyond what is leaked by the voting-weighted sum released for a given measurement period; this follows from the semantic security of the tally encryption schemes. Here we analyze the amount of information that could be leaked by this aggregate result.
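The aggregate in question is the per-bucket voting-weighted sum that the A.2.2 tally decrypts to. The sketch below is an added example, not code from the paper or from Tor, and runs that tally end to end; it assumes $h_i = g_0^{x_i}$, a single key holder standing in for distributed key generation and threshold decryption, a toy 11-bit group, and hypothetical function names.

```python
import math
import secrets

# Toy group, constructed for this example and far too small for real use:
# Q = 2P + 1 is a safe prime and G0 = 4 generates the order-P subgroup.
Q, P, G0 = 2039, 1019, 4

def keygen(t):
    """One key pair per bucket (assumption: h_i = g0^x_i, one holder, no threshold)."""
    xs = [secrets.randbelow(P - 1) + 1 for _ in range(t)]
    return xs, [pow(G0, x, Q) for x in xs]

def encrypt_vote(hs, b):
    """c0 = g0^r, c_b = h_b^r * g0, c_j = h_j^r for j != b (buckets are 1..t)."""
    r = secrets.randbelow(P - 1) + 1
    c = [pow(G0, r, Q)] + [pow(h, r, Q) for h in hs]
    c[b] = c[b] * G0 % Q
    return c

def scale(c, v):
    """Raising every element to v multiplies the encrypted tally by v."""
    return [pow(x, v, Q) for x in c]

def combine(c1, c2):
    """Elementwise product adds the tallies of every bucket."""
    return [a * b % Q for a, b in zip(c1, c2)]

def short_dlog(y, bound):
    """Baby-step giant-step: the k in [0, bound] with g0^k = y, in O(sqrt(bound))."""
    m = math.isqrt(bound) + 1
    baby = {pow(G0, j, Q): j for j in range(m)}
    giant = pow(G0, -m, Q)  # modular inverse power, Python 3.8+
    for i in range(m + 1):
        if y in baby:
            return i * m + baby[y]
        y = y * giant % Q
    raise ValueError("tally outside the searched range")

def decrypt(xs, c, bound):
    """Strip h_i^r = c0^x_i from each element, then solve the short discrete log."""
    return [short_dlog(c[i + 1] * pow(c[0], -x, Q) % Q, bound)
            for i, x in enumerate(xs)]

def weighted_tally(votes, t):
    """votes: (bucket, voting weight) pairs; returns the per-bucket weighted sums."""
    xs, hs = keygen(t)
    total = [1] * (t + 1)
    for bucket, weight in votes:
        total = combine(total, scale(encrypt_vote(hs, bucket), weight))
    return decrypt(xs, total, sum(w for _, w in votes))

# Constructed votes; the weights 8, 64 and 334 are voting weights from the text.
VOTES = [(1, 8), (3, 64), (3, 8), (4, 334)]
```

Only the four bucket totals come out of `weighted_tally`; which relay voted for which bucket is not recoverable from the combined ciphertext, only from whatever the sums themselves reveal.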

The residual leakage stems from the fact that few relays have identical weights in the current Tor network, so many small subsets of relays could have distinctive weight sums. As an example, we consider the bandwidth weights of the top 750 relays in the Tor network as of February 9, 2015 (750 is just over the number of measuring relays we used in our analysis §8.1). Among these relays, there are 138 possible sums that can only arise from a single subset of the weights, and 2456 possible sums that can arise from fewer than 256 different relay subsets.⁵

[Figure 7: panel (a), "Anonymity of Bin Tallies", plots the cumulative fraction of weight sums against subset entropy for the ideal, full, rounded-1k and rounded-pow2 schemes; panel (b) plots the count of weight sums against subset entropy for full, 1k and pow2.]

Fig. 7. The subset entropy of four voting-weight rounding schemes applied to the top 750 bandwidth weights from a recent Tor consensus: full is no rounding, ideal assigns all weights to 1, rounded-1k rounds to the nearest 1000, rounded-pow2 rounds up to the nearest power of 2. 7a Cumulative frequencies of subset anonymity; 7b counts of low-anonymity subsets (capped at 20 to show detail).

However, this leakage can be mitigated to a large extent by one of two schemes for rounding of voting weights. In the Rounded-1K scheme, each measuring relay's weight is divided by 1000 and rounded. Continuing with the example above, we find that the 750th highest-weighted relay on February 9, 2015 had a weight of 8160, which under this scheme would be assigned voting weight 8, while the highest-weighted relay had a weight of 334000, and would be assigned voting weight 334. In the Rounded-Pow2 scheme, each measuring relay's weight is rounded to the next power of 2 and scaled by the voting weight of the lowest-weighted measuring relay. Thus the lowest-weighted measuring relay in the previous example would have its weight rounded to 8192 and then scaled to 1, while the highest-weighted relay's voting weight would be rounded to $2^{19} = 524288$ and scaled to 64. Figure 7 summarizes the impact of these rounding schemes on the anonymity of weight sums: overall, most relay subsets have high subset entropy (defined as $SE(R) = \log_2 |\{S \subseteq [n] : \sum_{i \in S} w_i = \sum_{i \in R} w_i\}|$), as seen in Figure 7a. As Figure 7b shows, however, both rounding schemes significantly reduce the number of subsets with low subset entropy, with Rounded-Pow2 producing the fewest low-entropy subsets.
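The two rounding schemes and the subset-entropy count can be reproduced on a small input. The following is an added example, not the authors' code: it uses a sparse, dictionary-based form of the counting subset-sum dynamic program, rounds half up in Rounded-1K (an assumption), and runs on a constructed six-relay sample in which only the weights 334000 and 8160 come from the text.

```python
from math import log2

def rounded_1k(weights):
    """Rounded-1K: divide by 1000 and round (half up here, an assumption)."""
    return [(w + 500) // 1000 for w in weights]

def rounded_pow2(weights):
    """Rounded-Pow2: round up to a power of 2, scale by the lowest rounded weight."""
    up = [1 << (w - 1).bit_length() for w in weights]
    return [u // min(up) for u in up]

def subset_counts(weights):
    """Counting form of the subset-sum DP: maps each reachable sum to the
    number of subsets that produce it (kept sparse in a dict)."""
    counts = {0: 1}
    for w in weights:
        nxt = dict(counts)
        for s, c in counts.items():
            nxt[s + w] = nxt.get(s + w, 0) + c
        counts = nxt
    return counts

def subset_entropy(weights, subset):
    """SE(R) = log2 of the number of index sets S with the same weight sum as R."""
    target = sum(weights[i] for i in subset)
    return log2(subset_counts(weights)[target])

def low_anonymity_sums(weights, k):
    """Number of reachable sums (empty subset included) hit by fewer than k subsets."""
    return sum(1 for c in subset_counts(weights).values() if c < k)

# Constructed sample: only 334000 and 8160 come from the text above.
SAMPLE = [334000, 65000, 33000, 17000, 9000, 8160]

def summary(k=2):
    """Sums that identify a single subset (k=2) under each weighting."""
    schemes = {"full": SAMPLE, "rounded-1k": rounded_1k(SAMPLE),
               "rounded-pow2": rounded_pow2(SAMPLE)}
    return {name: low_anonymity_sums(w, k) for name, w in schemes.items()}

if __name__ == "__main__":
    print(summary())
```

On this sample every one of the 64 subsets has a unique sum under full weights, while rounding makes some subsets collide and so lowers the number of sums that pinpoint a single subset.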

We note that in the worst case, the Rounded-Pow2 scheme may inflate the fraction of voting weight controlled by an adversary by nearly a factor of 2, whereas the worst case for the Rounded-1K scheme is $1 + \frac{1}{2v_{min}+1}$, where $v_{min} \ge 1$ is the lowest post-rounding voting weight among measuring relays; in our example $v_{min} = 8$, so the worst-case inflation factor as a result of rounding was 1.059. If we instead consider an adversary that compromises existing relays in our example, the highest relative inflation encountered under the Rounded-Pow2 scheme was 1.366, while the highest relative inflation under the Rounded-1K scheme was 1.014; the respective 90th percentiles were 1.287 and 1.014.

⁵ To count possible subset sums, we modified the standard pseudopolynomial-time dynamic programming algorithm for subset sum, which runs in time $O(mS)$ for $m$ items with total weight $S$

B Proofs of Theorems

Lemma 1. If $\alpha < \lambda$, then over the random noise values

$$E[\rho^A] \le ((1-\lambda-\alpha)\mu)^{-1}\, E[\max(S_1, \beta^A + S_2)].$$

Proof. The Directory Authorities infer the number of bytes transferred by $A$ as the maximum of inferred bytes observed on the client and destination sides: $\rho^A = \max(\rho^A_c, \rho^A_d)$. The inferred number of client-side bytes is the sum of the inferred number of guard-observed and middle-client-side bytes: $\rho^A_c = \bar\rho^A_g + \bar\rho^A_{mc}$. The inferred number of destination-side bytes is defined similarly: $\rho^A_d = \bar\rho^A_e + \bar\rho^A_{md}$. Let $\beta^A_p$ be the number of bytes sent or received by $A$ with a measuring relay in position $p \in \{g, mc, md, e\}$. We will show a bound on each $\bar\rho^A_p$ in terms of $\beta^A_p$ that will combine to bound both $\rho^A_c$ and $\rho^A_d$ and then $\rho^A$ as well.

$\bar\rho^A_p$ is determined from the values $\beta^{R_iA}_p$, which are the number of bytes $A$ sent or received from each measuring relay $R_i \in \mathcal{M}_p$ while $R_i$ was in position $p$ in the circuit. Each $R_i$ uses $\beta^{R_iA}_p$ to produce an independent estimate of the total bytes $\rho^{R_iA}_p$ by first adding random noise $N^{R_iA}_p \sim \mathrm{Lap}(\delta_{noise}/\epsilon_{noise})$ and then dividing by its selection probability $q^{R_i}_p$, that is, $\rho^{R_iA}_p = (\beta^{R_iA}_p + N^{R_iA}_p)/q^{R_i}_p$. $\bar\rho^A_p$ is the aggregate of these estimates after trimming the largest and smallest fraction $\lambda$ by voting weight. Let $i_{p,j}$ be the index of the $j$th largest $\rho$ statistic. Let $i^1_p$ and $i^2_p$ be the indices of the first and last untrimmed statistics, respectively. Let $\mathcal{A}$ be the set of indices of the adversary's relays. To extend the measuring relay notation to directional positions, let $\mathcal{M}_{mc} = \mathcal{M}_{md} = \mathcal{M}_m$. We bound the value of $\bar\rho^A_p$ as follows:

$$
\begin{aligned}
\bar\rho^A_p &= \frac{\sum_{i^1_p \le j \le i^2_p} \rho^{R_{i_{p,j}}A}_p\, q^{R_{i_{p,j}}}_p}{\sum_{i^1_p \le j \le i^2_p} q^{R_{i_{p,j}}}_p} && (1)\\
&= \frac{\sum_{i^1_p \le j \le i^2_p} \beta^{R_{i_{p,j}}A}_p + N^{R_{i_{p,j}}A}_p}{\sum_{i^1_p \le j \le i^2_p} q^{R_{i_{p,j}}}_p} && (2)\\
&\le \frac{\sum_{i^1_p \le j \,\wedge\, j \notin \mathcal{A}} \beta^{R_{i_{p,j}}A}_p + N^{R_{i_{p,j}}A}_p}{\sum_{i^1_p \le j \,\wedge\, j \notin \mathcal{A}} q^{R_{i_{p,j}}}_p} && (3)\\
&\le \frac{\beta^A_p + \sum_{i^1_p \le j \,\wedge\, j \notin \mathcal{A}} N^{R_{i_{p,j}}A}_p}{(1-\lambda-\alpha)\mu} && (4)
\end{aligned}
$$

The inequality in line 3 holds because $j > j'$ implies that $\rho^{R_{i_{p,j}}A}_p \ge \rho^{R_{i_{p,j'}}A}_p$ and also any untrimmed voting weight from adversarial relays is replaced by the same amount of voting weight from honest relays with larger $\rho$ statistics. The inequality in line 4 holds because the voting weight of honest relays with $\rho$ statistics above the lower trimmed fraction is at least $1-\lambda-\alpha$, and voting weights are updated to maintain that the selection probability of any subset with voting weight at least $1-2\lambda$ has total selection probability of at least $\mu$ of its voting weight.

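Let $S_p = \sum_{i^1_p \le j \,\wedge\, j \notin \mathcal{A}} N^{R_{i_{p,j}}A}_p$. Then, using our bound on $\bar\rho^A_p$, we can see that

$$
\begin{aligned}
\rho^A_c &= \bar\rho^A_g + \bar\rho^A_{mc} && (5)\\
&\le \frac{\beta^A_g + \beta^A_{mc} + S_g + S_{mc}}{(1-\lambda-\alpha)\mu}. && (6)
\end{aligned}
$$

By similar arguments, an analogous bound holds for $\rho^A_d$.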

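$\rho^A$ is the maximum of $\rho^A_c$ and $\rho^A_d$. Thus, by the preceding inequalities on those values (e.g. Inequality 6),

$$
\begin{aligned}
E[\rho^A] &= E[\max(\rho^A_c, \rho^A_d)]\\
&\le E\left[\frac{1}{(1-\lambda-\alpha)\mu}\max\left(\beta^A_g + \beta^A_{mc} + S_g + S_{mc},\ \beta^A_e + \beta^A_{md} + S_e + S_{md}\right)\right]. && (7)
\end{aligned}
$$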

This expectation is over all possible values of the noise-variable sums (i.e. the $S_p$). Moreover, each noise variable is symmetric. Therefore, if we add an additional independent noise variable into one of those sums, for every value of the added variable that decreases the maximum there is an equally probable value of the added variable that increases it by the same amount (the reverse is not true because we are taking the maximum with another value). Therefore, adding in additional independent noise variables only increases the expectation on the right-hand side of Inequality 7. Let $n_{msr} = \max(|\mathcal{M}_g| + |\mathcal{M}_m|, |\mathcal{M}_e| + |\mathcal{M}_m|)$, and let $S_j = \sum_{1 \le i \le n_{msr}} N^j_i$, $j \in \{1, 2\}$, where each $N^j_i$ is an independent random variable with distribution $\mathrm{Lap}(\delta_{noise}/\epsilon_{noise})$. Then, by the foregoing argument,

$$E[\rho^A] \le E\left[((1-\lambda-\alpha)\mu)^{-1}\max\left(\beta^A_g + \beta^A_{mc} + S_1,\ \beta^A_e + \beta^A_{md} + S_2\right)\right]. \qquad (8)$$

For a fixed bandwidth budget $\beta^A$, the right-hand side of Inequality 8 can only be increased by increasing the larger of $\beta^A_g + \beta^A_{mc}$ and $\beta^A_e + \beta^A_{md}$ and decreasing the smaller. This follows from the fact that $S_1$ and $S_2$ are independently and identically distributed, and so every case in which shifting the $\beta^A_p$ in this way reduces the maximum corresponds to a unique and equally probable case in which the maximum is increased by an equal or larger amount (simply swap the values of $S_1$ and $S_2$). Therefore, for fixed $\beta^A$, the right-hand side of Inequality 8 is maximized by setting $\beta^A_g + \beta^A_{mc}$ to $\beta^A$ and $\beta^A_e + \beta^A_{md}$ to 0 (vice versa will maximize it as well). Therefore,

$$E[\rho^A] \le \frac{1}{(1-\lambda-\alpha)\mu}\, E[\max(S_1, \beta^A + S_2)],$$

which was the statement to be proved.

Lemma 2. If $\alpha < \lambda$, then, for all $R \notin \mathcal{A}$, $\rho^R \ge \gamma'\beta^R/2$.

Proof. We can assume that the adversarial relays $A \in \mathcal{A}$ send values $\rho^{RA}_p$ that are lower than all other values $\rho^{RR'}_p$, because doing so can only reduce $\rho^R$. Moreover, $\alpha < \lambda$, and so we can assume the set of measurements in the trimmed sum are from some set of honest relays of voting weight $1-2\lambda$. The aggregate of these measurements in a given position is at least $\gamma'\beta^R_p$. By the requirement that a relay will not act in both the guard and exit positions during a measurement period, $R$ sends all bytes from either a client-side or destination-side position. Furthermore, $R$ sends at least as many bytes as it receives as it never drops data. Therefore, either $\beta^R_g + \beta^R_{mc}$ or $\beta^R_e + \beta^R_{md}$ is at least $\beta^R/2$, and so

$$
\begin{aligned}
\rho^R &= \max(\bar\rho^R_g + \bar\rho^R_{mc},\ \bar\rho^R_e + \bar\rho^R_{md})\\
&\ge \gamma' \max(\beta^R_g + \beta^R_{mc},\ \beta^R_e + \beta^R_{md})\\
&\ge \gamma'\beta^R/2.
\end{aligned}
$$

To prove Theorem 1, we first quantify as $\nu_{loss}$ the loss in accuracy of $\rho^R$ due to taking the maximum of noisy values, which is limited to $\epsilon_{loss}$ (see § 5.4). $\nu_{loss}$ is defined as the smallest value such that for all $\beta$ large enough that $E[\max(S_1, \beta + S_2)]/((1-2\lambda)\mu) \ge (1-\epsilon_{dec})\rho^A_0 t^A/t^A_0$, $\beta/E[\max(S_1, \beta + S_2)] \ge 1 - \nu_{loss}$.

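Theorem 1. For $\alpha < \lambda$ and over the random noise values, if $\beta^A < (1-\lambda-\alpha)(1-\epsilon_{dec})(1-\epsilon_{loss})\mu\rho^A_0 t^A/t^A_0$, then $E[\rho^A]$ puts $A$ into probation, and otherwise

$$\frac{E[\omega^A]}{\sum_R \omega^R} \le \left[\frac{2}{\gamma'} \cdot \frac{(1+\epsilon_{inc})}{(1-\epsilon_{loss})(1-\lambda-\alpha)\mu}\right] \frac{\beta^A/t^A}{\sum_R (\beta^R/t^R)}.$$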

Proof. If $\beta^A < (1-\lambda-\alpha)(1-\epsilon_{dec})(1-\epsilon_{loss})\mu\rho^A_0 t^A/t^A_0$, then, by Lemma 1 and the definition of $\epsilon_{loss}$, $E[\rho^A] < (1-\epsilon_{dec})\rho^A_0 t^A/t^A_0$. $\omega^A$ is adjusted to keep $\rho^A$ no less than $(1-\epsilon_{dec})\eta^R$, and thus the expected value of $\rho^A$ would put $A$ into probation.

Otherwise, it must be that $\beta^A/E[\max(S_1, \beta^A + S_2)] \ge 1 - \epsilon_{loss}$ by the definition of $\epsilon_{loss}$. Therefore, by Lemma 1, $E[\rho^A] \le \beta^A/((1-\epsilon_{loss})(1-\lambda-\alpha)\mu)$. Then we can use Lemma 2 and the fact that $\omega^A$ is set to at most $(1+\epsilon_{inc})\rho^A/t^A$ to obtain the theorem.