an analysis of anonymity in the bitcoin system - arxiv · 2018-10-31 · an analysis of anonymity...

13
An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin, Ireland [email protected] Martin Harrigan Clique Research Cluster University College Dublin, Ireland [email protected] Abstract—Anonymity in Bitcoin, a peer-to-peer electronic currency system, is a complicated issue. Within the system, users are identified by public-keys only. An attacker wishing to de-anonymize its users will attempt to construct the one- to-many mapping between users and public-keys and associate information external to the system with the users. Bitcoin frustrates this attack by storing the mapping of a user to his or her public-keys on that user’s node only and by allowing each user to generate as many public-keys as required. In this paper we consider the topological structure of two networks derived from Bitcoin’s public transaction history. We show that the two networks have a non-trivial topological structure, provide complementary views of the Bitcoin system and have implications for anonymity. We combine these structures with external information and techniques such as context discovery and flow analysis to investigate an alleged theft of Bitcoins, which, at the time of the theft, had a market value of approximately half a million U.S. dollars. I. I NTRODUCTION Bitcoin is a peer-to-peer electronic currency system first described in a paper by Satoshi Nakamoto (probably a pseudonym) in 2008 [1]. It relies on digital signatures to prove ownership and a public history of transactions to prevent double-spending. The history of transactions is shared using a peer-to-peer network and is agreed upon using a proof-of-work system [2], [3]. The first Bitcoins were transacted in January 2009 and by June 2011 there were 6.5 million Bitcoins in circulation among an estimated 10,000 users [4]. In recent months, the currency has seen rapid growth in both media attention and market price relative to existing currencies. It has featured in both The Economist and Forbes magazine. At its peak, a single Bitcoin traded for more than US$30 on popular Bitcoin exchanges. At the same time, U.S. Senators and lobby groups in Germany, such as Der Bundesverband Digitale Wirtschaft (BVWD) or the Federal Association of Digital Economy, have raised concerns regarding the untraceability of Bitcoins and their potential to harm society through tax evasion, money laundering and illegal transactions. The implications of the de- centralized nature of Bitcoin for authorities’ ability to regulate and monitor the flow of currency is as yet unclear. Many users adopt Bitcoin for political and philosophical reasons, as much as pragmatic ones. While there is an under- standing amongst Bitcoin’s technical users that anonymity is not a prominent design goal of the system, we believe that this awareness is not shared throughout the community. For exam- ple, WikiLeaks, an international organization for anonymous whistleblowers, recently advised its Twitter followers that it now accepts anonymous donations via Bitcoin (see Fig. 1) and states that 1 : “Bitcoin is a secure and anonymous digital currency. Bitcoins cannot be easily tracked back to you, and are a [sic] safer and faster alternative to other dona- tion methods.” They proceed to describe a more secure method of donating Bitcoins that involves the generation of a one-time public-key but the implications for those who donate using the tweeted public-key are unclear. Is it possible to associate a donation with other Bitcoin transactions performed by the same user or perhaps identify them using external information? At present, there is little detailed work on Bitcoin anonymity in the public domain – the extent to which this anonymity holds in the face of determined analysis remains to be tested. Fig. 1. Screen capture of a tweet from WikiLeaks announcing their acceptance of ‘anonymous Bitcoin donations’. This paper is organized as follows. In Sect. II we con- sider some existing work relating to electronic currencies and anonymity. The economic aspects of the system, interesting in their own right, are beyond the scope of this work. In Sect. III we present an overview of the Bitcoin system; we focus on three features that are particularly relevant to our analysis. In Sect. IV we construct two network structures, the transaction network and the user network using the publicly available transaction history. We study the static and dynamic properties of these networks. In Sect. V we consider the implications of these network structures for anonymity. We also combine information external to the Bitcoin system with techniques such as flow and temporal analysis to illustrate how various types of information leakage can contribute to the de- anonymization of the system’s users. Finally, we conclude in Sect. VI. 1 http://wikileaks.org/support.html – Retrieved: 22-07-2011 arXiv:1107.4524v1 [physics.soc-ph] 22 Jul 2011

Upload: others

Post on 04-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

An Analysis of Anonymity in the Bitcoin SystemFergal Reid

Clique Research ClusterUniversity College Dublin, Ireland

[email protected]

Martin HarriganClique Research Cluster

University College Dublin, [email protected]

Abstract—Anonymity in Bitcoin, a peer-to-peer electroniccurrency system, is a complicated issue. Within the system,users are identified by public-keys only. An attacker wishingto de-anonymize its users will attempt to construct the one-to-many mapping between users and public-keys and associateinformation external to the system with the users. Bitcoinfrustrates this attack by storing the mapping of a user to hisor her public-keys on that user’s node only and by allowingeach user to generate as many public-keys as required. In thispaper we consider the topological structure of two networksderived from Bitcoin’s public transaction history. We showthat the two networks have a non-trivial topological structure,provide complementary views of the Bitcoin system and haveimplications for anonymity. We combine these structures withexternal information and techniques such as context discoveryand flow analysis to investigate an alleged theft of Bitcoins, which,at the time of the theft, had a market value of approximatelyhalf a million U.S. dollars.

I. INTRODUCTION

Bitcoin is a peer-to-peer electronic currency system firstdescribed in a paper by Satoshi Nakamoto (probably apseudonym) in 2008 [1]. It relies on digital signatures toprove ownership and a public history of transactions to preventdouble-spending. The history of transactions is shared using apeer-to-peer network and is agreed upon using a proof-of-worksystem [2], [3].

The first Bitcoins were transacted in January 2009 andby June 2011 there were 6.5 million Bitcoins in circulationamong an estimated 10,000 users [4]. In recent months, thecurrency has seen rapid growth in both media attention andmarket price relative to existing currencies. It has featuredin both The Economist and Forbes magazine. At its peak, asingle Bitcoin traded for more than US$30 on popular Bitcoinexchanges. At the same time, U.S. Senators and lobby groupsin Germany, such as Der Bundesverband Digitale Wirtschaft(BVWD) or the Federal Association of Digital Economy, haveraised concerns regarding the untraceability of Bitcoins andtheir potential to harm society through tax evasion, moneylaundering and illegal transactions. The implications of the de-centralized nature of Bitcoin for authorities’ ability to regulateand monitor the flow of currency is as yet unclear.

Many users adopt Bitcoin for political and philosophicalreasons, as much as pragmatic ones. While there is an under-standing amongst Bitcoin’s technical users that anonymity isnot a prominent design goal of the system, we believe that thisawareness is not shared throughout the community. For exam-ple, WikiLeaks, an international organization for anonymous

whistleblowers, recently advised its Twitter followers that itnow accepts anonymous donations via Bitcoin (see Fig. 1)and states that1:

“Bitcoin is a secure and anonymous digital currency.Bitcoins cannot be easily tracked back to you, andare a [sic] safer and faster alternative to other dona-tion methods.”

They proceed to describe a more secure method of donatingBitcoins that involves the generation of a one-time public-keybut the implications for those who donate using the tweetedpublic-key are unclear. Is it possible to associate a donationwith other Bitcoin transactions performed by the same user orperhaps identify them using external information? At present,there is little detailed work on Bitcoin anonymity in the publicdomain – the extent to which this anonymity holds in the faceof determined analysis remains to be tested.

Fig. 1. Screen capture of a tweet from WikiLeaks announcing theiracceptance of ‘anonymous Bitcoin donations’.

This paper is organized as follows. In Sect. II we con-sider some existing work relating to electronic currencies andanonymity. The economic aspects of the system, interestingin their own right, are beyond the scope of this work. InSect. III we present an overview of the Bitcoin system; wefocus on three features that are particularly relevant to ouranalysis. In Sect. IV we construct two network structures, thetransaction network and the user network using the publiclyavailable transaction history. We study the static and dynamicproperties of these networks. In Sect. V we consider theimplications of these network structures for anonymity. Wealso combine information external to the Bitcoin system withtechniques such as flow and temporal analysis to illustrate howvarious types of information leakage can contribute to the de-anonymization of the system’s users. Finally, we conclude inSect. VI.

1http://wikileaks.org/support.html – Retrieved: 22-07-2011

arX

iv:1

107.

4524

v1 [

phys

ics.

soc-

ph]

22

Jul 2

011

Page 2: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

A. A Note Regarding Motivation and Disclosure

Our motivation for this analysis is not to de-anonymize indi-vidual users of the Bitcoin system. Rather, it is to demonstrate,using a passive analysis of a publicly available dataset, theinherent limits of anonymity when using Bitcoin. This willensure that users do not have expectations that are not beingfulfilled by the system.

In security-related research, there is considerable tensionover how best to disclose vulnerabilities [5]. Many researchersfavor full disclosure where all information regarding a vulner-ability is promptly released. This enables informed users topromptly take defensive measures. Other researchers favor lim-ited disclosure; while this provides attackers with a window inwhich to exploit uninformed users, a mitigation strategy can beprepared and implemented before public announcement, thuslimiting damage, e.g. through a software update. Our analysisdoes not show a vulnerability of the Bitcoin system per se, butdoes illustrate some potential risks and pitfalls with regard toanonymity. However, there is no central authority which canfundamentally change the system’s behavior. Furthermore, itis not possible to mitigate analysis of the existing transactionhistory.

There are also two noteworthy features of the dataset whencompared with, say, contentious social network datasets, e.g.the Facebook profiles of Harvard University students [6].Firstly, the delineation between what is considered public andprivate is clear: the entire history of Bitcoin transactions ispublicly available. Secondly, the Bitcoin system does not havea usage policy. After joining Bitcoin’s peer-to-peer network, aclient can freely request the entire history of Bitcoin transac-tions; there is no crawling or scraping required.

Thus, we believe the best strategy to minimise the threatto user anonymity is to be descriptive about the risks of theBitcoin system. We do not identify individual users – apartfrom those in the case study – but we note that it is notdifficult for other groups to replicate our work. Indeed, giventhe passive nature of the analysis, other parties may alreadybe conducting similar analyses.

II. RELATED WORK

The related work for this paper can be categorized into twofields: electronic currencies and anonymity.

A. Electronic Currencies

Electronic currencies can be technically classified accordingto their mechanisms for establishing ownership, protectingagainst double-spending, ensuring anonymity and/or privacy,and generating and issuing new currency. Bitcoin is particu-larly noteworthy for the last of these mechanisms. The proof-of-work system [2], [3] that establishes consensus regardingthe history of transactions also doubles as a minting mech-anism. The scheme was first outlined in the B-Money Pro-posal [7]. We briefly consider some alternative mechanisms.Ripple [8] is an electronic currency where every user can issuecurrency. However, the currency is only accepted by peers whotrust the issuer. Transactions between arbitrary pairs of users

require chains of trusted intermediaries between the users.Saito [9] formalized and implemented a similar system, i-WAT,in which the the chain of intermediaries can be establishedwithout their immediate presence using digital signatures.KARMA [10] is an electronic currency where the centralauthority is distributed over a set of users that are involved inall transactions. PPay [11] is a micropayment scheme for peer-to-peer systems where the issuer of the currency is responsiblefor keeping track of it. However, both KARMA and PPay mayincur a large overhead when the rate of transactions is high.Mondex is a smart-card electronic currency [12]. It preserves acentral bank’s role in the generation and issuance of electroniccurrency. Mondex was an electronic replacement for cash inthe physical world whereas Bitcoin is an electronic analog ofcash in the online world.

The authors are not aware of any studies of the net-work structure of electronic currencies. However, there aresuch studies of physical currencies. The community currencyTomamae-cho was introduced into the Hokkaido Prefecture inJapan for a three-month period during 2004–05 in a bid torevitalize local economy. The Tomamae-cho system involvedgift-certificates that were re-usable and legally redeemable intoyen. There was an entry space on the reverse of each certificatefor recipients to record transaction dates, their names andaddresses, and the purposes of use, up to a maximum offive recipients. Kichiji and Nishibe [13] used the collectedcertificates to derive a network structure that represented theflow of currency during the period. They showed that thecumulative degree distribution of the network obeyed a power-law distribution, the network had small-world properties (theaverage clustering coefficient was high whereas the averagepath length was low), the directionality and the value oftransactions were significant features, and the double-trianglesystem [14] was effective. There also exist studies of thephysical movement of currency: ‘Where’s George?’ [15] isa crowd-sourced method for tracking U.S. dollar bills whereusers record the serial numbers of bills in their possession,along with their current location. If a bill is recorded suffi-ciently often, its geographical movement can be tracked overtime. Brockmann et al. [16] used this dataset as a proxyfor studying multi-scale human mobility and as a tool forcomputing geographic borders inherent to human mobility.

B. Anonymity

Previous work has shown the difficulty of maintaininganonymity in the context of networked data and online serviceswhich expose partial user information. Backstrom et al. [17]considered privacy attacks which identify users using the struc-ture of the network around them and discussed the difficultyof guaranteeing user anonymity in the presence of networkdata. Crandall et at. [18] infer social ties between userswhere none are explicitly stated by looking at patterns of ‘co-incidences’ or common off-network co-occurences. Narayananand Shmatikov [19] de-anonymized the Netflix Prize dataset

Page 3: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

using information from IMDB2 which had similar user con-tent, showing that statistical matching between different butrelated datasets can be used to attack anonymity. Puzis etal. [20] simulated the monitoring of a communications net-work using strategically-located monitoring nodes and showthat, using real-world network topologies, a relatively smallnumber of nodes can collaborate to pose a significant threatto anonymity. All of this work points to the difficulty inmaintaining anonymity where network data on user behaviouris available and illustrates how seemingly minor informationleakages can be aggregated to pose significant risks.

III. THE BITCOIN SYSTEM

The following is a simplified description of the Bitcoinsystem; see Nakamoto [1] for a more thorough treatment.Bitcoin is an electronic currency with no central authority orissuer. There is no central bank or fractional reserve systemcontrolling the supply of Bitcoins. Instead, they are generatedat a predictable rate such that the eventual total number willbe 21 million. There is no requirement for a trusted third-party when making transactions. Suppose Alice wishes to‘send’ a number of Bitcoins to Bob. Alice uses a Bitcoinclient to join the Bitcoin peer-to-peer network and makesa public transaction or declaration stating that one or moreidentities that she controls (which can be verified using public-key cryptography), and which previously had a number ofBitcoins assigned to them, wish to re-assign those Bitcoins toone or more other identities, at least one of which is controlledby Bob. The participants of the peer-to-peer network form acollective consensus regarding the validity of this transactionby appending it to the public history of previously agreed-upontransactions (the longest block-chain). This process, known asmining, involves the repeated computation of a cryptographichash function so that the digest of the transaction, alongwith other pending transactions, and an arbitrary nonce, has aspecific form. This process is designed to require considerablecomputational effort, from which the security of the Bitcoinmechanism is derived. To encourage users to pay this compu-tational cost, the process is incentivized using newly generatedBitcoins and/or transaction fees.

In this paper, there are three features of the Bitcoin systemthat are of particular interest. Firstly, the entire history ofBitcoin transactions is publicly available. This is necessary inorder to validate transactions and prevent double-spending inthe absence of a central authority. The only way to confirm theabsence of a previous transaction is to be aware of all previoustransactions. The second feature of interest is that a transactioncan have multiple inputs and multiple outputs. An input to atransaction is either the output of a previous transaction ora sum of newly generated Bitcoins and transaction fees. Atransaction frequently has either a single input from a previouslarger transaction or multiple inputs from previous smallertransactions. Also, a transaction frequently has two outputs:one sending payment and one returning change. Thirdly, the

2http://www.imdb.com

payer and payee(s) of a transaction are identified throughpublic-keys from public-private key-pairs. However, a usercan have multiple public-keys. In fact, it is considered goodpractice for a payee to generate a new public-private key-pair for every transaction. Furthermore, a user can take thefollowing steps to better protect their identity: they can avoidrevealing any identifying information in connection with theirpublic-keys; they can repeatedly send varying fractions oftheir Bitcoins to themselves using multiple (newly generated)public-keys; and/or they can use a trusted third-party mixer orlaundry. However, these practices are not universally applied.

The three features above, namely the public availabilityof Bitcoin transactions, the input-output relationship betweentransactions and the re-use and co-use of public-keys, providea basis for two distinct network structures: the transactionnetwork and the user network. The transaction network rep-resents the flow of Bitcoins between transactions over time.Each vertex represents a transaction and each directed edgebetween a source and a target represents an output of thetransaction corresponding to the source that is an input to thetransaction corresponding to the target. Each directed edgealso includes a value in Bitcoins and a timestamp. The usernetwork represents the flow of Bitcoins between users overtime. Each vertex represents a user and each directed edgebetween a source and a target represents an input-output pair ofa single transaction where the input’s public-key belongs to theuser corresponding to the source and the output’s public-keybelongs to the user corresponding to the target. Each directededge also includes a value in Bitcoins and a timestamp.

We gathered the entire history of Bitcoin transactions fromthe first transaction on the 3rd January 2009 up to andincluding the last transaction that occurred on the 12th July2011. We gathered the dataset using the Bitcoin client3 anda modified version of Gavin Andresen’s bitcointools.4 Thedataset comprises 1 019 486 transactions between 1 253 054unique public-keys. We describe the construction of the cor-responding transaction and user networks and their analysesin the following sections. We will show that the two networksare complex, have a non-trivial topological structure, providecomplementary views of the Bitcoin system and have impli-cations for the anonymity of users.

IV. THE TRANSACTION AND USER NETWORKS

A. The Transaction Network

The transaction network T represents the flow of Bitcoinsbetween transactions over time. Each vertex represents atransaction and each directed edge between a source and atarget represents an output of the transaction corresponding tothe source that is an input to the transaction corresponding tothe target. Each directed edge also includes a value in Bitcoinsand a timestamp. It is a straight-forward task to construct Tfrom our dataset.

3http://www.bitcoin.org4http://github.com/gavinandresen/bitcointools

Page 4: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

1.2 BTC

01/05/2011 14:13:26

... t4 has 12 other

inputs not shown here

1.32 BTC14:10:54 05/05/2011

0.12 BTC13:12:19 05/05/2011

t1

t2

t3 t4

Fig. 2. An example sub-network from the transaction network. Eachrectangular vertex represents a transaction and each directed edge representsa flow of Bitcoins from an output of one transaction to an input of another.

Figure 2 shows an example sub-network of T . t1 is atransaction with one input and two outputs.5 It was addedto the block-chain on the 1st May 2011. One of its outputsassigned 1.2 BTC (Bitcoins) to a user identified by the public-key pk1.6 The public-keys are not shown in Fig. 2. Similarly,t2 is a transaction with two inputs and two outputs.7 It wasaccepted on the 5th May 2011. One of its outputs sent 0.12BTC to a user identified by a different public-key, pk2.8 t3is a transaction with two inputs and and one output.9 It wasaccepted on the 5th May 2011. Both of its inputs are connectedto the two aforementioned outputs of t1 and t2. The onlyoutput of t3 was redeemed by t4.10

T has 974 520 vertices and 1 558 854 directed edges. Thenumber of vertices is less than the total number of transactionsin the dataset because we omit transactions that are notconnected to at least one other transaction. These correspondto newly generated Bitcoins and transactions fees that are notyet redeemed. The network has neither multi-edges (multipleedges between the same pair of vertices in the same direction)nor loops. It is a directed acyclic graph (DAG) since theoutput of a transaction can never be an input (either directlyor indirectly) to the same transaction.

Figure 3(a) shows a log-log plot of the cumulative degreedistributions: the solid red curve is the cumulative degreedistribution (in- and out-degree); the dashed green curve is thecumulative in-degree distribution; and the dotted blue curveis the cumulative out-degree distribution. We fitted power-law distributions, p(x) ∼ x−α for x > xmin, to the threedistributions by estimating the parameters α and xmin using

5The transactions and public-keys used in our examples existin our dataset. The unique identifier for the transaction t1 is09441d3c52fa0018365fcd2949925182f6307322138773d52c201f5cc2bb5976.You can query the details of a transaction or public-key by examiningBitcoin’s longest block-chain using, say, the Bitcoin Block Explorer(http://www.blockexplorer.com).

613eBhR3oHFD5wkE4oGtrLdbdi2PvK3ijMC70c4d41d0f5d2aff14d449daa550c7d9b0eaaf35d81ee5e6e77f8948b14d62378819smBSUoRGmbH13vif1Nu17S63Tnmg7h9n90c034fb964257ecbf4eb953e2362e165dea9c1d008032bc9ece5cebbc7cd469710f16ece066f6e4cf92d9a72eb1359d8401602a23990990cb84498cdbb93026402

a goodness-of-fit method [21]. Table I shows the estimatesalong with the corresponding Kolmogorov–Smirnov goodness-of-fit (GoF) statistics and p-values. We observe that none ofthe distributions for which the empirically-best scaling regionis non-trivial have a power-law as a plausible hypothesis(p > 0.1). This is likely due to the fact that there is nopreferential attachment [22], [23]: new vertices are joined toexisting vertices whose corresponding transactions are not yetfully redeemed.

There are 1 949 (maximal weakly) connected componentsin the network. Fig. 3(b) shows a log-log plot of the cumu-lative component size distribution. There are 948 287 vertices(97.31%) in the giant component. This component also con-tains a giant biconnected component with 716 354 vertices(75.54% of the vertices in the giant component).

Variable x x s α xmin GoF p-val.Degree 3 3.20 6.20 3.24 50 0.02 0.05In-Degree 1 1.60 5.31 2.50 4 0.01 0.00Out-Degree 1 1.60 3.17 3.50 51 0.05 0.00

TABLE ITHE DEGREE, IN-DEGREE AND OUT-DEGREE DISTRIBUTIONS OF T .

We also performed a rudimentary dynamic analysis of thenetwork. Figures 3(c), 3(d) and 3(e) show the edge number,density and average path length of the transaction networkon a monthly basis. These measurements are not cumulative.The network’s growth and sparsification are evident. We alsoobserve some anomalies in the average path length during Julyand November 2010.

B. The User Network

The user network U represents the flow of Bitcoins betweenusers over time. Each vertex represents a user and eachdirected edge between a source and a target represents aninput-output pair of a single transaction where the input’spublic-key belongs to the user corresponding to the source andthe output’s public-key belongs to the user corresponding tothe target. Each directed edge also includes a value in Bitcoinsand a timestamp.

We need to perform a preprocessing step before we canconstruct U from our dataset. Suppose U is, at first, imperfectin the sense that each vertex represents a single public-keyrather than a user and that each directed edge between asource and a target represents an input-output pair of a singletransaction, where the input’s public-key corresponds to thesource and the output’s public-key corresponds to the target.In order to perfect this network, we need to contract eachsubset of vertices whose corresponding public-keys belong toa single user. The difficulty is that public-keys are Bitcoin’smechanism for ensuring anonymity: ‘the public can see thatsomeone [identified by a public-key] is sending an amount tosomeone else [identified by another public-key], but withoutinformation linking the transaction to anyone.’ [1]. In fact, it isconsidered good practice for a payee to generate a new public-private key-pair for every transaction to keep transactions from

Page 5: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

(a) A log-log plot of the cumulative degree distributions. (b) A log-log plot of the cumulative component size distribution.

2009

−01

2009

−02

2009

−03

2009

−04

2009

−05

2009

−06

2009

−07

2009

−08

2009

−09

2009

−10

2009

−11

2009

−12

2010

−01

2010

−02

2010

−03

2010

−04

2010

−05

2010

−06

2010

−07

2010

−08

2010

−09

2010

−10

2010

−11

2010

−12

2011

−01

2011

−02

2011

−03

2011

−04

2011

−05

2011

−06

Edge Number of Transaction Network

0e+00

1e+05

2e+05

3e+05

4e+05

5e+05

(c) A temporal histogram showing the number ofedges per month.

2009

−01

2009

−02

2009

−03

2009

−04

2009

−05

2009

−06

2009

−07

2009

−08

2009

−09

2009

−10

2009

−11

2009

−12

2010

−01

2010

−02

2010

−03

2010

−04

2010

−05

2010

−06

2010

−07

2010

−08

2010

−09

2010

−10

2010

−11

2010

−12

2011

−01

2011

−02

2011

−03

2011

−04

2011

−05

2011

−06

Density of Transaction Network

0.000

0.001

0.002

0.003

0.004

0.005

(d) A temporal histogram showing the density permonth.

2009

−01

2009

−02

2009

−03

2009

−04

2009

−05

2009

−06

2009

−07

2009

−08

2009

−09

2009

−10

2009

−11

2009

−12

2010

−01

2010

−02

2010

−03

2010

−04

2010

−05

2010

−06

2010

−07

2010

−08

2010

−09

2010

−10

2010

−11

2010

−12

2011

−01

2011

−02

2011

−03

2011

−04

2011

−05

2011

−06

Average Path Length of Transaction Network

0

500

1000

1500

2000

2500

3000

(e) A temporal histogram showing the averagepath length per month.

Fig. 3. The degree distributions, component size distribution and monthly edge number, density and average path length of the transaction network.

being linked to a common owner. Therefore, it is impossibleto completely perfect the network using our dataset alone.However, as noted by Nakamoto [1],

“Some linking is still unavoidable with multi-inputtransactions, which necessarily reveal that their in-puts were owned by the same owner. The risk isthat if the owner of a key is revealed, linking couldreveal other transactions that belonged to the sameowner.”

We will use this property of transactions with multipleinputs to contract subsets of vertices in the imperfect net-work. We construct an ancillary network where each vertexrepresents a public-key and each undirected edge between twoend points represents a pair of inputs of a single transactionwhose public-keys correspond to the end points. Using ourdataset, this network has 1 253 054 vertices (unique public-keys) and 4 929 950 edges. More importantly, it has 86 641

non-trivial maximal connected components. We deduce thateach maximal connected component corresponds to a user andeach component’s constituent vertices correspond to that user’spublic-keys.

Figure 5 shows an example sub-network of the imperfectnetwork overlaid onto the example sub-network of T fromFig. 2. The outputs of t1 and t2 that were eventually redeemedby t3 were sent to a user whose public-key was pk1 and auser whose public-key was pk2 respectively. Figure 6 showsan example sub-network of the user network overlaid onto theexample sub-network of the imperfect network from Fig. 5.pk1 and pk2 are contracted into a single vertex u1 since theycorrespond to a pair inputs of a single transaction. In otherwords, they are in the same maximal connected componentof the ancillary network (see the vertices representing pk1and pk2 in the dashed grey box in Fig. 6). A single userowns both public-keys. We note that the maximal connected

Page 6: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

(a) A log-log plot of the cumulative degree distributions. (b) A log-log plot of the cumulative component size distribution.

2009

−01

2009

−02

2009

−03

2009

−04

2009

−05

2009

−06

2009

−07

2009

−08

2009

−09

2009

−10

2009

−11

2009

−12

2010

−01

2010

−02

2010

−03

2010

−04

2010

−05

2010

−06

2010

−07

2010

−08

2010

−09

2010

−10

2010

−11

2010

−12

2011

−01

2011

−02

2011

−03

2011

−04

2011

−05

2011

−06

Edge Number of User Network

0e+00

1e+05

2e+05

3e+05

4e+05

5e+05

6e+05

7e+05

(c) A temporal histogram showing the number ofedges per month.

2009

−01

2009

−02

2009

−03

2009

−04

2009

−05

2009

−06

2009

−07

2009

−08

2009

−09

2009

−10

2009

−11

2009

−12

2010

−01

2010

−02

2010

−03

2010

−04

2010

−05

2010

−06

2010

−07

2010

−08

2010

−09

2010

−10

2010

−11

2010

−12

2011

−01

2011

−02

2011

−03

2011

−04

2011

−05

2011

−06

Density of User Network

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(d) A temporal histogram showing the density permonth.

2009

−01

2009

−02

2009

−03

2009

−04

2009

−05

2009

−06

2009

−07

2009

−08

2009

−09

2009

−10

2009

−11

2009

−12

2010

−01

2010

−02

2010

−03

2010

−04

2010

−05

2010

−06

2010

−07

2010

−08

2010

−09

2010

−10

2010

−11

2010

−12

2011

−01

2011

−02

2011

−03

2011

−04

2011

−05

2011

−06

Average Path Length of User Network

0

5

10

15

(e) A temporal histogram showing the averagepath length per month.

Fig. 4. The degree distributions, component size distribution and monthly edge number, density and average path length of the user network.

t1

t2

t3 t4pk2

pk1

Fig. 5. An example sub-network from the imperfect network. Each diamondvertex represents a public-key and each directed edge between diamondvertices represents a flow of Bitcoins from one public-key to another.

component in this case is not simply a clique; it has a diameterof four indicating that there are at least two public-keysbelonging to that same user that are connected indirectly viathree transactions. The sixteen inputs to transaction t4 resultin the contraction of a further sixteen public-keys into a singlevertex u2. The value and timestamp of the flow of Bitcoinsfrom u1 to u2 is derived from the transaction network.

After the preprocessing step, U has 881 678 vertices (86 641non-trivial maximal connected components and 795 037 iso-lated vertices in the ancillary network) and 1 961 636 directededges. The network is still imperfect. We have not contractedall possible vertices but it will suffice for our present analysis.Unlike T , U has multi-edges, loops and directed cycles.

Figure 4(a) shows a log-log plot of the network’s cumulativedegree distributions. We fitted power-law distributions to thethree distributions and calculated their goodness-of-fit andstatistical significance as in the previous section. Table II

Page 7: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

pk2

pk1

u1

u2

1.32 BTC

14:10:54

05/05/20

11

pk2pk1

Fig. 6. An example sub-network from the user network. Each circular vertexrepresents a user and each directed edge between circular vertices represents aflow of Bitcoins from one user to another. The maximal connected componentfrom the ancillary network that corresponds to the vertex u1 is shown withinthe dashed grey box.

shows the results. We observe that none of the distributionshave a power-law as a plausible hypothesis.

There are 604 (maximal) weakly connected componentsand 579 355 (maximal) strongly connected components in thenetwork; Fig. 4(b) shows a log-log plot of the cumulativecomponent size distribution for both variations. There are879 859 vertices (99.79%) in the giant weakly connectedcomponent. This component also contains a giant weaklybiconnected component with 652 892 vertices (74.20% of thevertices in the giant component).

Variable x x s α xmin GoF p-val.Degree 3 4.45 218.10 2.38 66 0.02 0.00In-Degree 1 2.22 86.40 2.45 57 0.05 0.00Out-Degree 2 2.22 183.91 2.03 10 0.22 0.00

TABLE IITHE DEGREE, IN-DEGREE AND OUT-DEGREE DISTRIBUTIONS OF U .

Our dynamic analysis of the user network mirrors that of thetransaction network in the previous subsection. Figures 4(c),4(d) and 4(e) show the edge number, density and averagepath length of the user network on a monthly basis. Thesemeasurements are not cumulative. The network’s growth andsparsification are evident. We note that even though ourdynamic analysis of the user network is on a monthly basis, thepreprocessing step is performed using the ancillary network ofthe entire imperfect network. This enables us to resolve public-keys to a single user irrespective of the month is which thelinking tranactions occur.

V. ANONYMITY ANALYSIS

Prior to performing the analyses above, we expected the usernetwork to be largely composed of disjoint trees representingBitcoin flows between one-time public-keys that were notlinked with other public-keys. However, our analyses reveal

that the user network has considerable cyclic structure. Wenow consider the implications of this structure, coupled withother aspects of the Bitcoin system, for anonymity.

There are several ways in which the user network canbe used to deduce information about Bitcoin users. We canuse global network properties, such as degree distribution, toidentify outliers, e.g. users with unusually high activity in atime-window following an event of interest. We can use localnetwork properties to examine the context in which a useroperates by observing the egocentric network of a particularuser and the users with which he or she interacts with eitherdirectly or indirectly. The dynamic nature of the user networkalso enables us to perform flow and temporal analyses. We canexamine the significant Bitcoin flows between groups of usersover time. We will now discuss each of these threats in moredetail, and provide a case study to demonstrate their potential.

A. Integrating Off-Network Information

There is no user directory for the Bitcoin system. However,we can attempt to build a partial user directory associatingBitcoin users (and their known public-keys) with off-networkinformation. If we can make sufficient associations and com-bine them with the network structures above, a potentiallyserious threat to anonymity emerges.

Many organizations and services such as on-line stores thataccept Bitcoinis, exchanges, laundry services and mixers haveaccess to identifying information regarding their users, e.g.e-mail addresses, shipping addresses, credit card and bankaccount details, IP addresses, etc. If any of this informationwas publicly available, or accessible by, say, law enforcementagencies, then the identities of users involved in related trans-actions may also be at risk. To illustrate this point, we considera number of publicly available data sources and integrate theirinformation with the user network.

1) The Bitcoin Faucet: The Bitcoin Faucet11 is a websitewhere users can donate Bitcoins to be redistribtued in smallamounts to other users. In order to prevent abuse of thisservice, a history of recent give-aways are published alongwith the IP addresses of the recipients. When the BitcoinFaucet does not batch the re-distribution, it is possible toassociate the IP addresses with the recipient’s public-keys.This page can be scraped over time to produce a time-stampedmapping of IP addresses to users.

We found that the public-keys associated with many ofthe IP addresses that received Bitcoins were contracted withother public-keys in the ancillary network, thus revealing IPaddresses that are somehow related to previous transactions.Fig. 7(a) shows a map of geolocated IP addresses belongingto users who received Bitcoins over a period of one week.Fig. 7(b) overlays the user network onto a sample of thoseusers. An edge between two geolocated IP addresses indicatesthat the corresponding users are linked by an undirected pathof length at most three in the user network; the path must notcontain the vertex representing the Bitcoin Faucet itself.

11http://freebitcoins.appspot.com

Page 8: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

These figures serve as a proof-of-concept from a smallpublicly available data source. We note that large centralizedBitcoin service providers are capable of producing much moredetailed maps.

2) Voluntary Disclosures: Another source of identifyinginformation is the voluntary disclosure of public-keys by users,for example, when posting to the Bitcoin forums12. Bitcoinpublic-keys are typically represented as strings approximatelythirty-three characters in length and starting with the digit one.They are indexed very well by popular search engines. Weidentified many high-degree vertices with external informationusing a search engine alone. We proceeded to scrape theBitcoin Forums where users frequently attach a public-key totheir signatures. We also gathered public-keys from Twitterstreams and user-generated public directories. It is importantto note that in many cases we are able to resolve the ‘public’public-keys with other public-keys belonging to the same userusing the ancillary network. We also note that large centralizedBitcoin service providers can do the same with their userinformation.

B. Egocentric Analysis and Visualization of the User Network

There are severals pieces of information we can directlyderive from the user network regarding a particular user. Wecan compute the balance held by a single public-key. Wecan also aggregate the balances belonging to public-keys thatare controlled by a particular user. For example, Fig. 8(a)and Fig. 8(b) show the receipts and payments to and fromWikiLeaks’ public-key in terms of Bitcoin and transactionvolume respectively. The donations are generally small do-nations and are forwarded to other public-keys periodically.There was also a noticeable spike in donations when thefacility was first announced. Figure 8(c) shows the receiptsand payments to and from the creator of a popular Bitcointrading website aggregated over a number of public-keys thatare linked through the ancillary network.

An important advantage of deriving network structures fromthe Bitcoin transaction history is our ability to use networkvisualization and analysis tools to investigate the flow ofBitcoins. For example, Fig. 9 shows the network structuresurrounding the WikiLeaks’ public-key in the imperfect usernetwork. Our tools resolve several of the vertices with iden-tifying information gathered in Sect. V-A. These users canbe linked either directly or indirectly to their donations. Thepresence of a Bitcoin mining pool (a large red vertex) and anumber of public-keys between it and WikiLeaks’ public-keyis interesting.

C. Context Discovery

Given a number of public-keys or users of interest, we canuse network structure and context to better understand theflow of Bitcoins between them. For example, we can examineall shortest paths between a set of vertices or consider themaximum number of Bitcoins that can flow from a source

12http://forum.bitcoin.org

864768

9264

4096

6658

472235

80470

68102

18

411137

780297

140810

384012

587522

4623

23

751773

611795

192529

3726

334520

828650

146229

503227

18237

372

2434

928

187

880147

24634

233606

24215

301221

5821

5329

339

5992

751500

11767

508

359310

628760

684569

39855

466972

509470

693791

356385

297506

154569

267814

189486

70191

832671

459403

86175

290020

376702

283026

447905

737204

344528

921429

57908

436

145418

514111

58434

8259

12356

710582

704070

656967

146510

704591

863825

464995

52821

369238

527959

16987

343644

529502

615521

12387

13412

33893

644200

54890

710253

399470

395889

18036

688245

222134

164472

520832

132738

37825

17032

18576

826297

13458

244372

438264

529050

7708

578218

117419

438956

64686

175

18610

81093

1239

398555

216369

655670

50491

45400

32117

209391

491077

10675

406546

35359

40117

19638

18377

745503

140481

873847

494200

234195

761560

765657

8410

157403

623946

60643

889553

860392

500971

15598

643312

503026

1779

136948

564989

284930

1795

645446

184582

182487

852749

386325

594199

272431

167130

402500

253733

526638

720687

317235

518708

34621

341813

353253

675003

102022

45905

359250

305493

226134

84823638885

364379

5982

914405

171360

15718

681322

77287

622956

447856

235752

463223

448376

28025

152443

8687

214400

18821

797574

803208

59275

503694

782736

631191

43076

237981

864158

124322

534086

502183

431016

74665

14764

522159

829366

750009

660923

304061

88480

402887

215502

35281

385440

50644

364811

400863

8673

36325

56401

373736

713708

82770

890863

101679

141395

106374

614392

55802

822782

Fig. 9. An egocentric visualization of the vertex representing WikiLeaks’public-key in the imperfect user network. The size of a vertex corresponds toits degree in the entire imperfect user network. The color denotes the volumeof Bitcoins – warmer colors have larger volumes flowing through them. Thelarge red vertices represent a Bitcoin mining pool, a centralized Bitcoin walletservice and an unknown entity.

to a destination given the transactions and their ‘capacities’in an interesting time-window. For example, Fig. 10 showsall shortest paths between the vertices representing the userswe identified using off-network information in Sect. V-A andthe vertex that represents the MyBitcoin service13 in the usernetwork. We can identify more than 60% of the users in thisvisualization and deduce many direct and indirect relationshipsbetween them.

Case Study – Part I: We analyse an alleged theft of 25 000BTC reported in the Bitcoin Forums14 by a user known asallinvain. The victim reported that a large portion ofhis Bitcoins were sent to pkred

15 on 13/06/2011 at 16:52:23UTC. The theft occurred shortly after somebody broke into thevictim’s Slush pool account16 and changed the payout addressto pkblue.17. The Bitcoins rightfully belonged to pkgreen.18. Atthe time of the theft, the stolen Bitcoins had a market valueof approximately half a million U.S. dollars. We chose thiscase study to illustrate the potential risks to the anonymity ofa user (the thief) who has good reason to remain anonymous.

We consider the imperfect user network before any con-tractions. We restrict ourselves to the egocentric networksurrounding the thief: we include every vertex that is reachableby a path of length at most two ignoring directionality and all

13http://www.mybitcoin.com14http://forum.bitcoin.org/index.php?topic=16457.0151KPTdMb6p7H3YCwsyFqrEmKGmsHqe1Q3jg16http://http://mining.bitcoin.cz1715iUDqk6nLmav3B1xUHPQivDpfMruVsu9f181J18yk7D353z3gRVcdbS7PV5Q8h5w6oWWG

Page 9: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

(a) A map of geolocated IP addresses associated with users receiving Bitcoinsfrom the Bitcoin Faucet during a one week period.

(b) A map of a sample of the geolocated IP addresses in Fig. 7(a) connectedby edges where the corresponding users are connected by a path of length atmost three in the user network that does not include the vertex representingthe Bitcoin Faucet.

Fig. 7. We can use the Bitcoin Faucet to map users to geolocated IP addresses.

0 5 10 15 20 25 30Day

0

50

100

150

200

250

300

350

400

450

Bitc

oins

Cumulative IncomingCumulative Outgoing

(a) The receipts and payments to and from WikiLeaks’public-key over time.

0 5 10 15 20 25 30Day

0

10

20

30

40

50

60

70

Tran

sact

ions

Outgoing TransactionsIncoming Transactions

(b) The number of transactions involving WikiLeaks’public-key over time.

0 50 100 150 200 250 300 350Day

0

2000

4000

6000

8000

10000

12000

14000

16000

Bitc

oins

Cumulative IncomingCumulative Outgoing

(c) The receipts and payments to and from the creatorof a popular Bitcoin trading website aggregated over anumber of public-keys.

Fig. 8. Plots of the cumulative receipts and payments to and from Bitcoin public-keys and users.

edges induced by these vertices. We also remove all loops,multiple edges and edges that are not contained in somebiconnected component to avoid clutter. In Fig. 11, the redvertex represents the thief who owns the public-key pkred andthe green vertex represents the victim who owns the public-key pkgreen. The theft is the green edge joining the victim tothe thief. There are in fact two green edges located nearby inFig. 11 but only one directly connects the victim to the thief.

Interestingly, the victim and the thief are joined by paths(ignoring directionality) other than the green edge representingthe theft. For example, consider the sub-network shown inFig. 12 induced by the red, green, purple, yellow and orangevertices. This sub-network is a cycle. We contract all verticeswhose corresponding public-keys belong to the same user. Thisallows us to attach values in Bitcoins and timestamps to thedirected edges. We can make a number of observations. Firstly,we note that the theft of 25 000 BTC was preceded by a smallertheft of 1 BTC. This was later reported by the victim in theBitcoin forums. Secondly, using off-network data, we haveidentified some of the other colored vertices: the purple vertexrepresents the main Slush pool account and the orange vertex

represents the computer hacker group known as LulzSec.19

We note that there has been at least one attempt to associatethe thief with LulzSec20. This was a fake; it was created afterthe theft. However, the identification of the orange vertex withLulzSec is genuine and was established before the theft. Weobserve that the thief sent 0.31337 BTC to LulzSec shortlyafter the theft but we cannot otherwise associate him with thegroup. The main Slush pool account sent a total of 441.83BTC to the victim over a 70-day period. It also sent a total of0.2 BTC to the yellow vertex over a two day period. One daybefore the theft, the yellow vertex also sent 0.120607 BTC toLulzSec.

The yellow vertex represents a user who is the owner of atleast five public-keys.21 Like the victim, he is a member ofthe Slush pool, and like the thief, he is a one-time donator

19http://twitter.com/LulzSec/status/7638857683265126520http://pastebin.com/88nGp508211MUpbAY7rjWxvLtUwLkARViqSdzypMgVW4

13tst9ukW294Q7f6zRJr3VmLq6zp1C68EK1DcQvXMD87MaYcFZqHzDZyH3sAv8R5hMZe1AEW9ToWWwKoLFYSsLkPqDyHeS2feDVsVZ1EWASKF9DLUCgEFqfgrNaHzp3q4oEgjTsF

Page 10: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

402695

245397

23

555009

33189

203087

2

4119

52239

1045

2075

3997

1322

436

6460

8129

3794

4182

5082

5600

28006

11375

113

18674

118

2168

766

371

180

57

97

348

2909

988

38

5752

16

18

25332

191

10127

26257

23828

293

3013128

187

701

20561

59

3571

3726

928

9264

1228

77

2015

100

16614

11624

5493

2685

743443

27652

177159

35337

1538

23954

73231

9745

853524

571417

6174

33

1058

4131

548

2599

7209

594986

3117

15918

309814

13833

4673

8771

18503

74

39503

5200

51289

361

55397

41575

60009

75371

1554

43630

68209

740978

3700

1143

143481

905493

72320

786568

702604

41101

19607

1690

17054

36511

2208

24737

12453

5286

100008

3761

50866

348702

12986

15548

415933

1728

1995

6774

10273

19357

16247

11470

6257

1572

5329

1256

12009

24811

880364

547054

268413

641264

587506

12533

35578

471291

9471

7552

67844

75526

59144

13579

452366

51989

282

802

22820

817448

77098

69420

417069

863022

7716

5425

5432

2363

82748

394

9028

12101

2551

4938

23371

86348

333

14477

339

1081

233

50527

12640

1889

2944

10595

24423

70507

59244

14189

32622

17264

788201

2943

8064

4075

56871

7562

398

31121

27538

34883

8981

18733

17135

776357

165789

537840

13230

57775

10160

2483

16310

6135

2488

4537

681914

832955

31213

77642

75205

1991

3528

7627

29953

240079

5072

1492

55775

137296

3043

61017

6631

574441

352235

10870

5165

13809

4084

4597

9207

38905

428028

142916

9349

10058

726209

3740

3696

400076

34350

62669

780943

125

659519

526481

5519

205087

12727

47204

21551

108547

675844

776197

764486

4107

12549

3975

85262

18582

16030

603815

5034

4523

46878

6199

804930

2889

25810

2136

28762

3679

344045

14960

13558

1656

8

1665

10370

3588

19600

9377

24746

50987

143916

2733

27698

2107

4170

228687

5969

1235

11478

196708

31194

6497

28007

207914

19346

6650

42668

157225

5420

48715

10213

165900

157711

53821

922346

46254

12617

321

6327

33750

13035

2102

446786

24371

2226

1743

9900

1548

32339

18302

5581

41235

113055

924600

28427

11862

16987

1502

1759

43490

70552319956

477919

2084

32264

22302

815

1460

12218

1088

836

77384

3435

20718

3832

548985

3419

3444

21407

6545

31510

30887

4958

56423

7761

1510

1953

30132

28658

56

29093

33064

80780

17882

74622

829777

19867

8258

2449

45764

4409

11207

20856

76869

610616

4921

40073

774465

26114

17087

243840

3140

3708

86286

53519

32332

422555

77217669479

681136

630

146630

6689

2634

151506

21722

154342

375783

18797

137484

20818

21344

1584

631012

14693

8431

864081

817525

7246

92118

17851

17062

37501

46902

768578

574254

17891

13408

56512

38107

7426

578

22903

76147

188966

3338

33091

470041

52781

78527

49744

54742

7943

5047

232714

370000

8480

629510

859825

11959

175

7005

2093

49264

16983

446018

39775

21980

69945

3766

592542

228328

384648

562840

698225

149333

28067

11017

48235

817852

590054

586474

449841

12923

492807

5303

2660

666098

27157

845950

11173

70414

23728

457294

9943

18193

192813

171083

2687

20424

5727

786604

57918

46845

22355

651901

584122

38383

4028

14454

19345

868

155315

323306

27771

9532

155766

45080

19506

30322

674430

7253

71711

139329

4595

18547

739471

170

50887

70146

768003

10756

598

3598

430

541

67124

7738

5182

63

1096

762953

19026

7771

52831

58987

35439

128121

2173

27783

7020

1678

15504

19089

40595

346775

507548

3751

43180

23735

256630

171732

48846

34522

6877

642786

9955

7911

37611

236

69842

39616

304372

3326

30975

9232

233357

1295

10001

1523

18610

12074

69931

9006

71984

750387

67390

46911

20293

2887

31567

1360

41810

143191

69976

387932

18782

196240

14178

827

22673

15721

24466

33134

28015

109927

36344

55693

561039

11162

24988

105892

425

527277

6062

7240

23994

722335

446

452

202181

67440

10184

523721

7628

2125

4856

8591

384677

7138

71658

48807

254

853496

130134

27610

248674

686109

612382

36622229797

358509

6179

82659

8259

12328

841772

290141

868587

442417

587831

676082

613083

8793

40645

63689

37072

157632

875

710406

17869

820308

42590

167996

395993

723221

8254

55363

368709

829510

59464

120116

630964

24588

131147

556204

569420

452685

418811

741458

791786

205502

69718

2590

16475

121530

561246

12147

23541

475394

645820

254148

43653

925220

318823

209796

109004

468747

536

1160

781698

414

72935

417908

79989

65658

41660

22659

689248

761160

767468

481420

68291

299374

484070

102555

22685

80030

63173

256161

708771

135234

676007

128145

899584

340831

395706

515860

74273

451521

53006

2534

16657

592782

394363

60584

41796

82632

23890

37976

261107

4995

678712

75867

238924

559938

27559

409063

664683

878192

658120

108723

261771

383489

70345

76350

33822

333532

603326149616

204044

94403

167488

252104

755915

10445

83777

538833

20691

137429

65752

8411

5447

322962

30942

534854

512231

4558

480534

39144

238288

18675

163380

879725

161605

84891

751888

494527

123585

233746

86326

178273

307480

218923

760090

276465

407842

780579

321831

451565

751914

84285

192830

8511

22326

16705

236022

471370

39204

103835

70371

231763

31060

141030148943

14393

234138

268644

506433

498794

261350

172499

7913

811394

4572

20867

444808

804926

315792

477588

383383

457114

506869

712312

188833

66689

901540

526416

62982

498092

2462

147238

915894

355805

45504

18885

7075

551329

305610

605709

8655

258790

164308

738141

291286

7418

430560

483809

760741

492

62341

94703

561648

262641

846328

80380

118187

687770

6670

777175

893455

841816

304899

2580

2137

556969

397849

291354

91

510504

31280

55078

539191

55868

109119

852549

521950

880622

856663

64092

922367

800173

340248

78433

140070

344677

43648

78470

823945

21135

84624

914787

699025

23186

619159

317268

45168

25266

277156

64168

578218

374899

398200

64195

187077

838433

572556

255028

759289

626404

249006

82860

708728

37591

199462

416489

509275

418997

60281

353011

101109

541433

436986

525056

18026

389248

264079

2360

275210

11781

2827

396050

559161

281372

396062

117924

148255

6544

105249

659897

567089

803625

879402

62750

402224

274568

525573

656766

35634

54208

220638

537398

696224

80700

468141

826628

582469

170833

262995

341239

804482 764433

569283

411402

803672

109402

622735

29544

566122

859873

46675

31603418679

259428

168826

793468

390571

421555

80769

519042

345119

14010

37947

879501

537487

65771

299933

3425

71493

50081

42653

562083

48109

678831

211991

148404

801717

744806

347075

461765

144948

373718

711640

214751

259161

831565

548165

353103

615388

78820

11243

368699

73519

889340

5972

73280

208725

763538

31749

71509

3083

779284

818988

58394

105502

27683

842115849041

450738

144433

578685

304183

665008

62525

614845

70725

41372

60486

881736

812604

709271

760913

7709

29783

521432

302607

296006

105566

134178

569891

910436

268927

41023

384126

504961

545926

715916

158637

21647

103573

109726

628036

703656

195864

5289

722097

175966

156852

601276

455860

62668

14308

732371

77025

449763

847486

271911

75202

5478

438824

656629

605437

695560

187646

11520

480516

106294

56583

462089

851210

503562

180099

189998

222489

306232

757630

275741

498349

376099

689446

66860

7731

34103

462070

2955

18998

736587

591182

15702

377424

783865

560477

494737

105851

886141

339390

825237

3463

81770

583048

907385

790272

253335

43079

447903

529761

540021

5550

56751

42912

40387

185808

707855

136043

705443

708056

164606

18190

786395

250942

437756

1537

51798

1542

691734

2203912253

339487

43611

280329

785957

474664

530354

3632

22067

82433

54538

587348

153174

69436

77400

136796

608529

632424

460961

151153

27241

706169

722197

4073

601719

439993

91776

126596

140949

568982

165294

82543

737904

842556

159394

28326

573383

120492

32442

751903

522013

192191

52935

920699

145701

326563

79592

624049

759281

243498

871

6443

12036

771030

134817

787415

463491

618492

42783

55981

14937

42787

83752

81713

12109

675668

321365

242907

696152

818683

399204

42854

52369

22430

63122

889541

44910

114632

1913

122760

821117

649790

340634

491397

535873

744283

8855

487316

542069

36761

526235

383979

233429

776108

407469

795227

220492

782283

262093

761821

358383

712012

8251

752129

669581

265961

444415

Fig. 10. A visualisation of all users identified in Sect. V-A and all shortestpaths between the vertices representing those users and the vertex representingthe MyBitcoin service in the user network.

Fig. 11. An egocentric visualization of the thief in the imperfect user network.For this visualization, vertices are identified through their colors in the text,edges are colored according to the color of their sources and the size of eachvertex is proportional to its edge-betweeness within the egocentric network.

to LulzSec. This donation, the day before the theft, is his lastknown activity using these public-keys.

D. Flow and Temporal Analyses

In addition to visualizing egocentric networks with a fixedradius, we can follow significant flows of value through thenetwork over time. If a vertex representing a user receives alarge volume of Bitcoins relative to their estimated balance,and, shortly after, transfers a significant proportion of thoseBitcoins to another user, we deem this interesting. We built a

1 BTC17:34:04 13/06/2011

25000 BTC17:52:23 13/06/2011

0.31337 BTC17:45:31 13/06/2011

0.120607 BTC16:55:19 12/06/2011

0.11 BTC04:04:14 22/05/2011

0.09 BTC09:07:59 23/05/2011

60 transactions involving 441.83 BTC over a 70-day period

Thief

Victim

Time

Bitc

oins

Fig. 12. An interesting sub-network induced by the thief, the victim andthree other vertices. The notation is the same as in Fig. 11.

special purpose tool that, starting with a chosen vertex or setof vertices, traces significant flows of Bitcoins over time. Inpractice we have found this tool to be quite revealing whenanalyzing the user network.

Case Study – Part II: To demonstrate this tool we re-consider the Bitcoin theft described earlier. We note that thevictim has developed their own tool to generate an exhaustivelist of public-keys that have received some portion of the stolenBitcoins since the theft22. However, this list grows very quicklyand, at the time of writing, contained more than 34 100 public-keys. Figure 13 shows an annotated visualization producedusing our tool. We observe several interesting flows in theaftermath of the theft. The initial theft of a small volume of1 BTC is immediately followed by the theft of 25 000 BTC.This is represented as a dotted black line between the relevantvertices, magnified in the left inset.

In the left inset, we can see that the Bitcoins are shuffledbetween a small number of accounts and then transferredback to the initial account. After this shuffling step, we haveidentified four significant outflows of Bitcoins that began at19:49, 20:01, 20:13 and 20:55. Of particular interest are theoutflows that began at 20:55 (labeled as ‘1’ in both insets)and 20:13 (labeled as ‘2’ in both insets). These outflows passthrough several subsequent accounts over a period of severalhours. Flow 1 splits at the vertex labeled A in the right inset

22http://folk.uio.no/vegardno/allinvain-addresses.txt

Page 11: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

1555008

23

38

35306

33302

51840

15785

46868

766

5174

22345

2168

83677

26206

14343

652298

112654

570264

185593

173148

843876

579825

694287

578938

810331

559124

83545

705069

88134

97665

289393

547864

728964

786485

40371

453627

785946

908544

109875

75322

211570

119835

432124

778645

658824

686108

718222

731679

415163

418447

194930

855203

417317

628267

792796

53054

594130

812614

835745

329167

301616

10289

572864

586163

721972

384811

285412

144498

314155

640488

882742

310331

416828

341813

422259

651852

275022

675125

211899

400979

541199

912716

903287

312557

2003

818275

339830

141809

19441

675457

146889

840296

761325

685007

730355

142970

30331

733417

598910

220492

290553

736402

245910

227589

175023

586588

393368

628889

647334

71852

764650

104741

358574

260814

750257

316873

907453

164555

576725

304592

22746

657716

892158

556457

312442

443268

875756

368532

560596

278942

132412

187689

799054119

891285

199029

612697

263646

715668

861610

3956

507270

603123

32240

112654

570264

185593

843876

586163

18:52

19:21

19:24

19:24

19:38

20:01 19:21

19:59

19:49

19:49

19:59

20:55

20:13

2

A

B

C3

4

694287

578938

810331

83545

705069

88134

785946

908544

109875

75322

211570

686108

341813

422259

312557

358574

560596

891285

A

B

C

31

4

4

06/13 20:55

06/13 22:15

06/14 04:05

06/14 04:05

06/14 04:05

06/14 04:08

06/16 03:42

06/16 13:37

06/16 13:44

06/16 13:44

06/16 13:44

06/16 13:44

06/16 13:37 06/16

03:42

2

12

Y

MyBitcoin

TheftReporter

AllegedThief

AllegedThief

TheftReporter

Fig. 13. Visualisation of Bitcoin flow from the alleged theft. The left inset shows the initial shuffling of Bitcoins among accounts close to that of the allegedthief, during which all transfers happen within a few hours of the incident. The right inset shows detail on the events of several subsequent days, whereBitcoin flows split, and then later merge back into each other, validating that the flows found by the tool are probably still controlled by a single party.

at 04:05 the day after the theft. Some of its Bitcoins rejoinFlow 2 at the vertex labeled B. This new combined flow islabeled as ‘3’ in the right inset. The remaining Bitcoins fromFlow 1 pass through several additional vertices in the next twodays. This flow is labeled as ‘4’ in the right inset.

A surprising event occurs on 16/06/2011 at approximately13:37. A small number of Bitcoins are transferred from Flow 3to a heretofore unseen public-key pk1.23 Approximately sevenminutes later, a small number of Bitcoins are transferred fromFlow 3 to another heretofore unseen public-key pk2.24 Finally,there are two simultaneous transfers from Flow 4 to twomore heretofore unseen public-keys: pk325 and pk4.26 We havedetermined that these four public-keys, pk1, pk2, pk3 and pk4– which receive Bitcoins from two separate flows that splitfrom each other two days previously – are all contracted tothe same user in our ancillary network. This user is representedas C in Fig. 13.

There are several other examples of interesting flow. Theflow labeled as Y involves the movement of Bitcoins throughthirty unique public-keys in a very short period of time. Ateach step, a small number of Bitcoins (typically 30 BTCwhich had a market value of approximately US$500 at thetime of the transactions) are siphoned off. The public-keys thatreceive the small number of Bitcoins are typically representedby small blue vertices due to their low volume and degree.On 20/06/2011 at 12:35, each of these public-keys makesa transfer to a public-key operated by the the MyBitcoin

231FKFiCYJSFqxT3zkZntHjfU47SvAzauZXN241FhYawPhWDvkZCJVBrDfQoo2qC3EuKtb94251MJZZmmSrQZ9NzeQt3hYP76oFC5dWAf2nD2612dJo17jcR78Uk1Ak5wfgyXtciU62MzcEc

service.27 Curiously, this public-key was previously involvedin another separate Bitcoin theft.28.

555008

23

38

35306

33302

51840

15785

46868

766

5174

22345

2168

83677

26206

14343

652298

112654

570264

185593

173148

843876

579825

694287

578938

810331

559124

83545

705069

88134

97665

728964

40371

908544

109875

75322

211570

119835

432124

778645

658824

686108

718222

731679

415163

418447

194930

855203

417317

53054

594130

329167

301616

10289

572864

586163

721972

384811

285412

314155

882742

310331

416828

341813

422259

651852

400979

903287

2003

19441

675457

146889

730355

142970

30331

733417

598910

220492

290553

736402

227589

393368

628889

71852

260814

750257

907453

164555

576725

304592

22746

657716

556457

312442

875756

368532

560596

278942

132412

187689

799054119

891285

715668

861610

3956

507270

603123

32240

Fig. 14. The Bitcoins are transferred between public-keys along thehighlighted paths very quickly.

We also observe that the Bitcoins in many of the aboveflows are transferred between public-keys very quickly. Fig. 14shows two flows in particular where the intermediate parties

271MAazCWMydsQB5ynYXqSGQDjNQMN3HFmEu28http://forum.bitcoin.org/index.php?topic=20427.0

Page 12: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

waited for very few confirmations before re-sending the Bit-coins to other public-keys.

Naturally, much of this analysis is circumstantial. We cannotsay for certain whether or not these flows imply a sharedagency in both incidents. There is always the possibility ofdrawing false inferences. However, it does illustrate the powerof our tool when tracing the flow of Bitcoins and generatinghypotheses. It also suggests that a centralized service mayhave further details on the user(s) in control of the implicatedpublic-keys.

The same tool that we use here to examine outflows ofBitcoins from users of interest can also be used to examineinflows of Bitcoins to users and organizations accepting Bit-coins as payment or as donations.

E. Other Forms of Analysis

There are many other forms analysis that can be applied inorder to de-anonymize the workings of the Bitcoin system:

• Order books for Bitcoin exchanges are typically madeavailable to support trading tools. As orders are oftenplaced in Bitcoin values converted from other currencies,they often have a precise decimal value with eight sig-nificant digits. It may be possible to find transactionswith corresponding amounts and thus map public-keysand transactions to the exchanges.

• It is conceivable that over an extended time period,several public-keys, if used at particular times, maybelong to the same user. We could construct and cluster aco-occurrence network to help deduce further mappingsbetween public-keys and users.

• Finally, there are far more sophisticated forms of attackwhere the attacker actively participates in the network,for example, using marked Bitcoins or by operating alaundry service.

VI. CONCLUSIONS

For the past half-century futurists have heralded the adventof a cash-less society [24]. Many of their predictions have beenrealized, e.g. Anderson et al.’s [24]’s ‘on-line real-time’ pay-ment system and bank-maintained ‘central information files’.However, cash is still a competitive and relatively anonymousmeans of payment. Bitcoin is an electronic analog of cashin the online world. It is decentralized: there is no centralauthority responsible for the issuance of Bitcoins and there isno need to involve a trusted third-party when making onlinetransfers. However, this flexibility comes at a price: the entirehistory of Bitcoin transactions is publicly available. In thispaper we investigated the structure of two networks derivedfrom this dataset and their implications for user anonymity.

Using an appropriate network representation, it is possibleto map many users to public-keys. This is performed usinga passive analysis only. Active analyses, where an interestedparty can potentially deploy marked Bitcoins and collaboratingusers can discover even more information. We also believe thatlarge centralized services such as the exchanges and wallet

services are capable of identifying considerable portions ofuser activity.

Technical members of the Bitcoin community have cau-tioned that strong anonymity is not a prominent design goalof the Bitcoin system. However, casual users need to beaware of this, especially when sending Bitcoins to users andorganizations they would prefer not to be publicly associatedwith.

VII. ACKNOWLEDGEMENTS

This research was supported by Science Foundation Ireland(SFI) Grant No. 08/SRC/I1407: Clique: Graph and NetworkAnalysis Cluster. Both authors contributed equally to thiswork. It was performed independently of any industrial part-nership or collaboration of the Clique Cluster.

REFERENCES

[1] S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System,” 2008.[Online]. Available: http://bitcoin.org/bitcoin.pdf

[2] C. Dwork and M. Naor, “Pricing via Processing or Combatting JunkMail,” in Proceedings of the 12th Annual International CryptologyConference on Advances in Cryptology (CRYPTO’92). Springer, 1992,pp. 139–147.

[3] A. Back, “Hashcash – A Denial of Service Counter-Measure,” 2002.[Online]. Available: http://www.hashcash.org/papers/hashcash.pdf

[4] The Economist, “Digital Curriencies – Bits and Bob,” June 2011.[Online]. Available: http://www.economist.com/node/18836780

[5] H. Cavusoglu, H. Cavusoglu, and S. Raghunathan, “Emerging Issues inResponsible Vulnerability Disclosure,” in Proceedings of the 4th AnnualWorkshop on Economics of Information Security (WEIS’05), 2005.

[6] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, and N. Christakis,“Tastes, Ties, and Time: A New Social Network Dataset using Face-book.com,” Social Networks, vol. 30, pp. 330–342, 2008.

[7] W. Dai, “B-Money Proposal,” 1998. [Online]. Available: http://www.weidai.com/bmoney.txt

[8] R. Fugger, “Money as IOUs in Social Trust NetworksA Proposal for a Decentralized Currency Network Protocol,” 2004. [On-line]. Available: http://www.ripple-project.org/decentralizedcurrency.pdf

[9] K. Saito, “i-WAT: The Internet WAT System – An Architecture forMaintaining Trust and Facilitating Peer-to-Peer Barter Relationships,”Ph.D. dissertation, Keio University, 2006.

[10] V. Vishnumurthy, S. Chandrakumar, and E. Sirer, “KARMA: ASecure Economic Framework for Peer-to-Peer Resource Sharing,”in Proceedings of the 1st Workshop on Economics of Peer-to-PeerSystems. [Online]. Available: http://www2.sims.berkeley.edu/research/conferences/p2pecon/index.html

[11] B. Yang and H. Garcia-Molin, “PPay: Micropayments for Peer-to-PeerSystems,” in Proceedings of the 10th ACM Conference on Computer andCommunication Security (CCS’03), V. Atluri and P. Liu, Eds. ACMPress, 2003, pp. 300–310.

[12] F. Stalder, “Failures and Successes: Notes on the Development ofElectronic Cash,” The Information Society (TIS), vol. 18, no. 3, pp.209–219, 2002.

[13] N. Kichiji and M. Nishibe, “Network Analyses of the Circulation Flowof Community Currency,” Evolutionary and Institutional EconomicsReview, vol. 4, no. 2, pp. 267–300, 2008.

[14] M. Nishibe, “Chiiki Tuka No Susume (in Japanese),” Hokkaido Shok-oukai Rengou, 2004.

[15] “Where’s George?” http://www.wheresgeorge.com.[16] D. Brockmann, L. Hufnagel, and T. Geisel, “The Scaling Laws of

Human Travel,” Nature, vol. 439, no. 26, pp. 462–465, 2006.[17] L. Backstrom, C. Dwork, and J. Kleinberg, “Wherefore Art Thou

r3579x?: Anonymized Social Networks, Hidden Patterns, and StructuralSteganography,” in Proceedings of the 16th International Conference onWorld Wide Web. ACM, 2007, pp. 181–190.

[18] D. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, andJ. Kleinberg, “Inferring Social Ties from Geographic Coincidences,”Proceedings of the National Academy of Sciences, vol. 107, no. 52, p.22436, 2010.

Page 13: An Analysis of Anonymity in the Bitcoin System - arXiv · 2018-10-31 · An Analysis of Anonymity in the Bitcoin System Fergal Reid Clique Research Cluster University College Dublin,

[19] A. Narayanan and V. Shmatikov, “Robust De-anonymization of LargeSparse Datasets,” in Proceedings of the 29th Symposium on Security andPrivacy. IEEE, 2008, pp. 111–125.

[20] R. Puzis, D. Yagil, Y. Elovici, and D. Braha, “Collaborative Attack onInternet Users’ Anonymity,” Internet Research, vol. 19, no. 1, pp. 60–77,2009.

[21] A. Clauset, C. Shalizi, and M. Newman, “Power-Law Distributions inEmpirical Data,” SIAM Review, vol. 51, no. 4, pp. 661–703, 2009.

[22] H. Simon, “On a Class of Skew Distribution Functions,” Biometrika,vol. 42, pp. 425–440, 1955.

[23] A. Barabasi and R. Albert, “Emergence of Scaling in Random Net-works,” Science, vol. 286, no. 5439, pp. 509–512, 1999.

[24] A. Anderson, D. Cannell, T. Gibbons, G. Grote, J. Henn, J. Kennedy,M. Muir, N. Potter, and R. Whitby, An Electronic Cash and CreditSystem. American Management Association, 1966.