an analysis of anonymity in the bitcoin system - arxiv · 2018-10-31 · an analysis of anonymity...
TRANSCRIPT
An Analysis of Anonymity in the Bitcoin SystemFergal Reid
Clique Research ClusterUniversity College Dublin, Ireland
Martin HarriganClique Research Cluster
University College Dublin, [email protected]
Abstract—Anonymity in Bitcoin, a peer-to-peer electroniccurrency system, is a complicated issue. Within the system,users are identified by public-keys only. An attacker wishingto de-anonymize its users will attempt to construct the one-to-many mapping between users and public-keys and associateinformation external to the system with the users. Bitcoinfrustrates this attack by storing the mapping of a user to hisor her public-keys on that user’s node only and by allowingeach user to generate as many public-keys as required. In thispaper we consider the topological structure of two networksderived from Bitcoin’s public transaction history. We showthat the two networks have a non-trivial topological structure,provide complementary views of the Bitcoin system and haveimplications for anonymity. We combine these structures withexternal information and techniques such as context discoveryand flow analysis to investigate an alleged theft of Bitcoins, which,at the time of the theft, had a market value of approximatelyhalf a million U.S. dollars.
I. INTRODUCTION
Bitcoin is a peer-to-peer electronic currency system firstdescribed in a paper by Satoshi Nakamoto (probably apseudonym) in 2008 [1]. It relies on digital signatures toprove ownership and a public history of transactions to preventdouble-spending. The history of transactions is shared using apeer-to-peer network and is agreed upon using a proof-of-worksystem [2], [3].
The first Bitcoins were transacted in January 2009 andby June 2011 there were 6.5 million Bitcoins in circulationamong an estimated 10,000 users [4]. In recent months, thecurrency has seen rapid growth in both media attention andmarket price relative to existing currencies. It has featuredin both The Economist and Forbes magazine. At its peak, asingle Bitcoin traded for more than US$30 on popular Bitcoinexchanges. At the same time, U.S. Senators and lobby groupsin Germany, such as Der Bundesverband Digitale Wirtschaft(BVWD) or the Federal Association of Digital Economy, haveraised concerns regarding the untraceability of Bitcoins andtheir potential to harm society through tax evasion, moneylaundering and illegal transactions. The implications of the de-centralized nature of Bitcoin for authorities’ ability to regulateand monitor the flow of currency is as yet unclear.
Many users adopt Bitcoin for political and philosophicalreasons, as much as pragmatic ones. While there is an under-standing amongst Bitcoin’s technical users that anonymity isnot a prominent design goal of the system, we believe that thisawareness is not shared throughout the community. For exam-ple, WikiLeaks, an international organization for anonymous
whistleblowers, recently advised its Twitter followers that itnow accepts anonymous donations via Bitcoin (see Fig. 1)and states that1:
“Bitcoin is a secure and anonymous digital currency.Bitcoins cannot be easily tracked back to you, andare a [sic] safer and faster alternative to other dona-tion methods.”
They proceed to describe a more secure method of donatingBitcoins that involves the generation of a one-time public-keybut the implications for those who donate using the tweetedpublic-key are unclear. Is it possible to associate a donationwith other Bitcoin transactions performed by the same user orperhaps identify them using external information? At present,there is little detailed work on Bitcoin anonymity in the publicdomain – the extent to which this anonymity holds in the faceof determined analysis remains to be tested.
Fig. 1. Screen capture of a tweet from WikiLeaks announcing theiracceptance of ‘anonymous Bitcoin donations’.
This paper is organized as follows. In Sect. II we con-sider some existing work relating to electronic currencies andanonymity. The economic aspects of the system, interestingin their own right, are beyond the scope of this work. InSect. III we present an overview of the Bitcoin system; wefocus on three features that are particularly relevant to ouranalysis. In Sect. IV we construct two network structures, thetransaction network and the user network using the publiclyavailable transaction history. We study the static and dynamicproperties of these networks. In Sect. V we consider theimplications of these network structures for anonymity. Wealso combine information external to the Bitcoin system withtechniques such as flow and temporal analysis to illustrate howvarious types of information leakage can contribute to the de-anonymization of the system’s users. Finally, we conclude inSect. VI.
1http://wikileaks.org/support.html – Retrieved: 22-07-2011
arX
iv:1
107.
4524
v1 [
phys
ics.
soc-
ph]
22
Jul 2
011
A. A Note Regarding Motivation and Disclosure
Our motivation for this analysis is not to de-anonymize indi-vidual users of the Bitcoin system. Rather, it is to demonstrate,using a passive analysis of a publicly available dataset, theinherent limits of anonymity when using Bitcoin. This willensure that users do not have expectations that are not beingfulfilled by the system.
In security-related research, there is considerable tensionover how best to disclose vulnerabilities [5]. Many researchersfavor full disclosure where all information regarding a vulner-ability is promptly released. This enables informed users topromptly take defensive measures. Other researchers favor lim-ited disclosure; while this provides attackers with a window inwhich to exploit uninformed users, a mitigation strategy can beprepared and implemented before public announcement, thuslimiting damage, e.g. through a software update. Our analysisdoes not show a vulnerability of the Bitcoin system per se, butdoes illustrate some potential risks and pitfalls with regard toanonymity. However, there is no central authority which canfundamentally change the system’s behavior. Furthermore, itis not possible to mitigate analysis of the existing transactionhistory.
There are also two noteworthy features of the dataset whencompared with, say, contentious social network datasets, e.g.the Facebook profiles of Harvard University students [6].Firstly, the delineation between what is considered public andprivate is clear: the entire history of Bitcoin transactions ispublicly available. Secondly, the Bitcoin system does not havea usage policy. After joining Bitcoin’s peer-to-peer network, aclient can freely request the entire history of Bitcoin transac-tions; there is no crawling or scraping required.
Thus, we believe the best strategy to minimise the threatto user anonymity is to be descriptive about the risks of theBitcoin system. We do not identify individual users – apartfrom those in the case study – but we note that it is notdifficult for other groups to replicate our work. Indeed, giventhe passive nature of the analysis, other parties may alreadybe conducting similar analyses.
II. RELATED WORK
The related work for this paper can be categorized into twofields: electronic currencies and anonymity.
A. Electronic Currencies
Electronic currencies can be technically classified accordingto their mechanisms for establishing ownership, protectingagainst double-spending, ensuring anonymity and/or privacy,and generating and issuing new currency. Bitcoin is particu-larly noteworthy for the last of these mechanisms. The proof-of-work system [2], [3] that establishes consensus regardingthe history of transactions also doubles as a minting mech-anism. The scheme was first outlined in the B-Money Pro-posal [7]. We briefly consider some alternative mechanisms.Ripple [8] is an electronic currency where every user can issuecurrency. However, the currency is only accepted by peers whotrust the issuer. Transactions between arbitrary pairs of users
require chains of trusted intermediaries between the users.Saito [9] formalized and implemented a similar system, i-WAT,in which the the chain of intermediaries can be establishedwithout their immediate presence using digital signatures.KARMA [10] is an electronic currency where the centralauthority is distributed over a set of users that are involved inall transactions. PPay [11] is a micropayment scheme for peer-to-peer systems where the issuer of the currency is responsiblefor keeping track of it. However, both KARMA and PPay mayincur a large overhead when the rate of transactions is high.Mondex is a smart-card electronic currency [12]. It preserves acentral bank’s role in the generation and issuance of electroniccurrency. Mondex was an electronic replacement for cash inthe physical world whereas Bitcoin is an electronic analog ofcash in the online world.
The authors are not aware of any studies of the net-work structure of electronic currencies. However, there aresuch studies of physical currencies. The community currencyTomamae-cho was introduced into the Hokkaido Prefecture inJapan for a three-month period during 2004–05 in a bid torevitalize local economy. The Tomamae-cho system involvedgift-certificates that were re-usable and legally redeemable intoyen. There was an entry space on the reverse of each certificatefor recipients to record transaction dates, their names andaddresses, and the purposes of use, up to a maximum offive recipients. Kichiji and Nishibe [13] used the collectedcertificates to derive a network structure that represented theflow of currency during the period. They showed that thecumulative degree distribution of the network obeyed a power-law distribution, the network had small-world properties (theaverage clustering coefficient was high whereas the averagepath length was low), the directionality and the value oftransactions were significant features, and the double-trianglesystem [14] was effective. There also exist studies of thephysical movement of currency: ‘Where’s George?’ [15] isa crowd-sourced method for tracking U.S. dollar bills whereusers record the serial numbers of bills in their possession,along with their current location. If a bill is recorded suffi-ciently often, its geographical movement can be tracked overtime. Brockmann et al. [16] used this dataset as a proxyfor studying multi-scale human mobility and as a tool forcomputing geographic borders inherent to human mobility.
B. Anonymity
Previous work has shown the difficulty of maintaininganonymity in the context of networked data and online serviceswhich expose partial user information. Backstrom et al. [17]considered privacy attacks which identify users using the struc-ture of the network around them and discussed the difficultyof guaranteeing user anonymity in the presence of networkdata. Crandall et at. [18] infer social ties between userswhere none are explicitly stated by looking at patterns of ‘co-incidences’ or common off-network co-occurences. Narayananand Shmatikov [19] de-anonymized the Netflix Prize dataset
using information from IMDB2 which had similar user con-tent, showing that statistical matching between different butrelated datasets can be used to attack anonymity. Puzis etal. [20] simulated the monitoring of a communications net-work using strategically-located monitoring nodes and showthat, using real-world network topologies, a relatively smallnumber of nodes can collaborate to pose a significant threatto anonymity. All of this work points to the difficulty inmaintaining anonymity where network data on user behaviouris available and illustrates how seemingly minor informationleakages can be aggregated to pose significant risks.
III. THE BITCOIN SYSTEM
The following is a simplified description of the Bitcoinsystem; see Nakamoto [1] for a more thorough treatment.Bitcoin is an electronic currency with no central authority orissuer. There is no central bank or fractional reserve systemcontrolling the supply of Bitcoins. Instead, they are generatedat a predictable rate such that the eventual total number willbe 21 million. There is no requirement for a trusted third-party when making transactions. Suppose Alice wishes to‘send’ a number of Bitcoins to Bob. Alice uses a Bitcoinclient to join the Bitcoin peer-to-peer network and makesa public transaction or declaration stating that one or moreidentities that she controls (which can be verified using public-key cryptography), and which previously had a number ofBitcoins assigned to them, wish to re-assign those Bitcoins toone or more other identities, at least one of which is controlledby Bob. The participants of the peer-to-peer network form acollective consensus regarding the validity of this transactionby appending it to the public history of previously agreed-upontransactions (the longest block-chain). This process, known asmining, involves the repeated computation of a cryptographichash function so that the digest of the transaction, alongwith other pending transactions, and an arbitrary nonce, has aspecific form. This process is designed to require considerablecomputational effort, from which the security of the Bitcoinmechanism is derived. To encourage users to pay this compu-tational cost, the process is incentivized using newly generatedBitcoins and/or transaction fees.
In this paper, there are three features of the Bitcoin systemthat are of particular interest. Firstly, the entire history ofBitcoin transactions is publicly available. This is necessary inorder to validate transactions and prevent double-spending inthe absence of a central authority. The only way to confirm theabsence of a previous transaction is to be aware of all previoustransactions. The second feature of interest is that a transactioncan have multiple inputs and multiple outputs. An input to atransaction is either the output of a previous transaction ora sum of newly generated Bitcoins and transaction fees. Atransaction frequently has either a single input from a previouslarger transaction or multiple inputs from previous smallertransactions. Also, a transaction frequently has two outputs:one sending payment and one returning change. Thirdly, the
2http://www.imdb.com
payer and payee(s) of a transaction are identified throughpublic-keys from public-private key-pairs. However, a usercan have multiple public-keys. In fact, it is considered goodpractice for a payee to generate a new public-private key-pair for every transaction. Furthermore, a user can take thefollowing steps to better protect their identity: they can avoidrevealing any identifying information in connection with theirpublic-keys; they can repeatedly send varying fractions oftheir Bitcoins to themselves using multiple (newly generated)public-keys; and/or they can use a trusted third-party mixer orlaundry. However, these practices are not universally applied.
The three features above, namely the public availabilityof Bitcoin transactions, the input-output relationship betweentransactions and the re-use and co-use of public-keys, providea basis for two distinct network structures: the transactionnetwork and the user network. The transaction network rep-resents the flow of Bitcoins between transactions over time.Each vertex represents a transaction and each directed edgebetween a source and a target represents an output of thetransaction corresponding to the source that is an input to thetransaction corresponding to the target. Each directed edgealso includes a value in Bitcoins and a timestamp. The usernetwork represents the flow of Bitcoins between users overtime. Each vertex represents a user and each directed edgebetween a source and a target represents an input-output pair ofa single transaction where the input’s public-key belongs to theuser corresponding to the source and the output’s public-keybelongs to the user corresponding to the target. Each directededge also includes a value in Bitcoins and a timestamp.
We gathered the entire history of Bitcoin transactions fromthe first transaction on the 3rd January 2009 up to andincluding the last transaction that occurred on the 12th July2011. We gathered the dataset using the Bitcoin client3 anda modified version of Gavin Andresen’s bitcointools.4 Thedataset comprises 1 019 486 transactions between 1 253 054unique public-keys. We describe the construction of the cor-responding transaction and user networks and their analysesin the following sections. We will show that the two networksare complex, have a non-trivial topological structure, providecomplementary views of the Bitcoin system and have impli-cations for the anonymity of users.
IV. THE TRANSACTION AND USER NETWORKS
A. The Transaction Network
The transaction network T represents the flow of Bitcoinsbetween transactions over time. Each vertex represents atransaction and each directed edge between a source and atarget represents an output of the transaction corresponding tothe source that is an input to the transaction corresponding tothe target. Each directed edge also includes a value in Bitcoinsand a timestamp. It is a straight-forward task to construct Tfrom our dataset.
3http://www.bitcoin.org4http://github.com/gavinandresen/bitcointools
1.2 BTC
01/05/2011 14:13:26
... t4 has 12 other
inputs not shown here
1.32 BTC14:10:54 05/05/2011
0.12 BTC13:12:19 05/05/2011
t1
t2
t3 t4
Fig. 2. An example sub-network from the transaction network. Eachrectangular vertex represents a transaction and each directed edge representsa flow of Bitcoins from an output of one transaction to an input of another.
Figure 2 shows an example sub-network of T . t1 is atransaction with one input and two outputs.5 It was addedto the block-chain on the 1st May 2011. One of its outputsassigned 1.2 BTC (Bitcoins) to a user identified by the public-key pk1.6 The public-keys are not shown in Fig. 2. Similarly,t2 is a transaction with two inputs and two outputs.7 It wasaccepted on the 5th May 2011. One of its outputs sent 0.12BTC to a user identified by a different public-key, pk2.8 t3is a transaction with two inputs and and one output.9 It wasaccepted on the 5th May 2011. Both of its inputs are connectedto the two aforementioned outputs of t1 and t2. The onlyoutput of t3 was redeemed by t4.10
T has 974 520 vertices and 1 558 854 directed edges. Thenumber of vertices is less than the total number of transactionsin the dataset because we omit transactions that are notconnected to at least one other transaction. These correspondto newly generated Bitcoins and transactions fees that are notyet redeemed. The network has neither multi-edges (multipleedges between the same pair of vertices in the same direction)nor loops. It is a directed acyclic graph (DAG) since theoutput of a transaction can never be an input (either directlyor indirectly) to the same transaction.
Figure 3(a) shows a log-log plot of the cumulative degreedistributions: the solid red curve is the cumulative degreedistribution (in- and out-degree); the dashed green curve is thecumulative in-degree distribution; and the dotted blue curveis the cumulative out-degree distribution. We fitted power-law distributions, p(x) ∼ x−α for x > xmin, to the threedistributions by estimating the parameters α and xmin using
5The transactions and public-keys used in our examples existin our dataset. The unique identifier for the transaction t1 is09441d3c52fa0018365fcd2949925182f6307322138773d52c201f5cc2bb5976.You can query the details of a transaction or public-key by examiningBitcoin’s longest block-chain using, say, the Bitcoin Block Explorer(http://www.blockexplorer.com).
613eBhR3oHFD5wkE4oGtrLdbdi2PvK3ijMC70c4d41d0f5d2aff14d449daa550c7d9b0eaaf35d81ee5e6e77f8948b14d62378819smBSUoRGmbH13vif1Nu17S63Tnmg7h9n90c034fb964257ecbf4eb953e2362e165dea9c1d008032bc9ece5cebbc7cd469710f16ece066f6e4cf92d9a72eb1359d8401602a23990990cb84498cdbb93026402
a goodness-of-fit method [21]. Table I shows the estimatesalong with the corresponding Kolmogorov–Smirnov goodness-of-fit (GoF) statistics and p-values. We observe that none ofthe distributions for which the empirically-best scaling regionis non-trivial have a power-law as a plausible hypothesis(p > 0.1). This is likely due to the fact that there is nopreferential attachment [22], [23]: new vertices are joined toexisting vertices whose corresponding transactions are not yetfully redeemed.
There are 1 949 (maximal weakly) connected componentsin the network. Fig. 3(b) shows a log-log plot of the cumu-lative component size distribution. There are 948 287 vertices(97.31%) in the giant component. This component also con-tains a giant biconnected component with 716 354 vertices(75.54% of the vertices in the giant component).
Variable x x s α xmin GoF p-val.Degree 3 3.20 6.20 3.24 50 0.02 0.05In-Degree 1 1.60 5.31 2.50 4 0.01 0.00Out-Degree 1 1.60 3.17 3.50 51 0.05 0.00
TABLE ITHE DEGREE, IN-DEGREE AND OUT-DEGREE DISTRIBUTIONS OF T .
We also performed a rudimentary dynamic analysis of thenetwork. Figures 3(c), 3(d) and 3(e) show the edge number,density and average path length of the transaction networkon a monthly basis. These measurements are not cumulative.The network’s growth and sparsification are evident. We alsoobserve some anomalies in the average path length during Julyand November 2010.
B. The User Network
The user network U represents the flow of Bitcoins betweenusers over time. Each vertex represents a user and eachdirected edge between a source and a target represents aninput-output pair of a single transaction where the input’spublic-key belongs to the user corresponding to the source andthe output’s public-key belongs to the user corresponding tothe target. Each directed edge also includes a value in Bitcoinsand a timestamp.
We need to perform a preprocessing step before we canconstruct U from our dataset. Suppose U is, at first, imperfectin the sense that each vertex represents a single public-keyrather than a user and that each directed edge between asource and a target represents an input-output pair of a singletransaction, where the input’s public-key corresponds to thesource and the output’s public-key corresponds to the target.In order to perfect this network, we need to contract eachsubset of vertices whose corresponding public-keys belong toa single user. The difficulty is that public-keys are Bitcoin’smechanism for ensuring anonymity: ‘the public can see thatsomeone [identified by a public-key] is sending an amount tosomeone else [identified by another public-key], but withoutinformation linking the transaction to anyone.’ [1]. In fact, it isconsidered good practice for a payee to generate a new public-private key-pair for every transaction to keep transactions from
(a) A log-log plot of the cumulative degree distributions. (b) A log-log plot of the cumulative component size distribution.
2009
−01
2009
−02
2009
−03
2009
−04
2009
−05
2009
−06
2009
−07
2009
−08
2009
−09
2009
−10
2009
−11
2009
−12
2010
−01
2010
−02
2010
−03
2010
−04
2010
−05
2010
−06
2010
−07
2010
−08
2010
−09
2010
−10
2010
−11
2010
−12
2011
−01
2011
−02
2011
−03
2011
−04
2011
−05
2011
−06
Edge Number of Transaction Network
0e+00
1e+05
2e+05
3e+05
4e+05
5e+05
(c) A temporal histogram showing the number ofedges per month.
2009
−01
2009
−02
2009
−03
2009
−04
2009
−05
2009
−06
2009
−07
2009
−08
2009
−09
2009
−10
2009
−11
2009
−12
2010
−01
2010
−02
2010
−03
2010
−04
2010
−05
2010
−06
2010
−07
2010
−08
2010
−09
2010
−10
2010
−11
2010
−12
2011
−01
2011
−02
2011
−03
2011
−04
2011
−05
2011
−06
Density of Transaction Network
0.000
0.001
0.002
0.003
0.004
0.005
(d) A temporal histogram showing the density permonth.
2009
−01
2009
−02
2009
−03
2009
−04
2009
−05
2009
−06
2009
−07
2009
−08
2009
−09
2009
−10
2009
−11
2009
−12
2010
−01
2010
−02
2010
−03
2010
−04
2010
−05
2010
−06
2010
−07
2010
−08
2010
−09
2010
−10
2010
−11
2010
−12
2011
−01
2011
−02
2011
−03
2011
−04
2011
−05
2011
−06
Average Path Length of Transaction Network
0
500
1000
1500
2000
2500
3000
(e) A temporal histogram showing the averagepath length per month.
Fig. 3. The degree distributions, component size distribution and monthly edge number, density and average path length of the transaction network.
being linked to a common owner. Therefore, it is impossibleto completely perfect the network using our dataset alone.However, as noted by Nakamoto [1],
“Some linking is still unavoidable with multi-inputtransactions, which necessarily reveal that their in-puts were owned by the same owner. The risk isthat if the owner of a key is revealed, linking couldreveal other transactions that belonged to the sameowner.”
We will use this property of transactions with multipleinputs to contract subsets of vertices in the imperfect net-work. We construct an ancillary network where each vertexrepresents a public-key and each undirected edge between twoend points represents a pair of inputs of a single transactionwhose public-keys correspond to the end points. Using ourdataset, this network has 1 253 054 vertices (unique public-keys) and 4 929 950 edges. More importantly, it has 86 641
non-trivial maximal connected components. We deduce thateach maximal connected component corresponds to a user andeach component’s constituent vertices correspond to that user’spublic-keys.
Figure 5 shows an example sub-network of the imperfectnetwork overlaid onto the example sub-network of T fromFig. 2. The outputs of t1 and t2 that were eventually redeemedby t3 were sent to a user whose public-key was pk1 and auser whose public-key was pk2 respectively. Figure 6 showsan example sub-network of the user network overlaid onto theexample sub-network of the imperfect network from Fig. 5.pk1 and pk2 are contracted into a single vertex u1 since theycorrespond to a pair inputs of a single transaction. In otherwords, they are in the same maximal connected componentof the ancillary network (see the vertices representing pk1and pk2 in the dashed grey box in Fig. 6). A single userowns both public-keys. We note that the maximal connected
(a) A log-log plot of the cumulative degree distributions. (b) A log-log plot of the cumulative component size distribution.
2009
−01
2009
−02
2009
−03
2009
−04
2009
−05
2009
−06
2009
−07
2009
−08
2009
−09
2009
−10
2009
−11
2009
−12
2010
−01
2010
−02
2010
−03
2010
−04
2010
−05
2010
−06
2010
−07
2010
−08
2010
−09
2010
−10
2010
−11
2010
−12
2011
−01
2011
−02
2011
−03
2011
−04
2011
−05
2011
−06
Edge Number of User Network
0e+00
1e+05
2e+05
3e+05
4e+05
5e+05
6e+05
7e+05
(c) A temporal histogram showing the number ofedges per month.
2009
−01
2009
−02
2009
−03
2009
−04
2009
−05
2009
−06
2009
−07
2009
−08
2009
−09
2009
−10
2009
−11
2009
−12
2010
−01
2010
−02
2010
−03
2010
−04
2010
−05
2010
−06
2010
−07
2010
−08
2010
−09
2010
−10
2010
−11
2010
−12
2011
−01
2011
−02
2011
−03
2011
−04
2011
−05
2011
−06
Density of User Network
0.0
0.1
0.2
0.3
0.4
0.5
0.6
(d) A temporal histogram showing the density permonth.
2009
−01
2009
−02
2009
−03
2009
−04
2009
−05
2009
−06
2009
−07
2009
−08
2009
−09
2009
−10
2009
−11
2009
−12
2010
−01
2010
−02
2010
−03
2010
−04
2010
−05
2010
−06
2010
−07
2010
−08
2010
−09
2010
−10
2010
−11
2010
−12
2011
−01
2011
−02
2011
−03
2011
−04
2011
−05
2011
−06
Average Path Length of User Network
0
5
10
15
(e) A temporal histogram showing the averagepath length per month.
Fig. 4. The degree distributions, component size distribution and monthly edge number, density and average path length of the user network.
t1
t2
t3 t4pk2
pk1
Fig. 5. An example sub-network from the imperfect network. Each diamondvertex represents a public-key and each directed edge between diamondvertices represents a flow of Bitcoins from one public-key to another.
component in this case is not simply a clique; it has a diameterof four indicating that there are at least two public-keysbelonging to that same user that are connected indirectly viathree transactions. The sixteen inputs to transaction t4 resultin the contraction of a further sixteen public-keys into a singlevertex u2. The value and timestamp of the flow of Bitcoinsfrom u1 to u2 is derived from the transaction network.
After the preprocessing step, U has 881 678 vertices (86 641non-trivial maximal connected components and 795 037 iso-lated vertices in the ancillary network) and 1 961 636 directededges. The network is still imperfect. We have not contractedall possible vertices but it will suffice for our present analysis.Unlike T , U has multi-edges, loops and directed cycles.
Figure 4(a) shows a log-log plot of the network’s cumulativedegree distributions. We fitted power-law distributions to thethree distributions and calculated their goodness-of-fit andstatistical significance as in the previous section. Table II
pk2
pk1
u1
u2
1.32 BTC
14:10:54
05/05/20
11
pk2pk1
Fig. 6. An example sub-network from the user network. Each circular vertexrepresents a user and each directed edge between circular vertices represents aflow of Bitcoins from one user to another. The maximal connected componentfrom the ancillary network that corresponds to the vertex u1 is shown withinthe dashed grey box.
shows the results. We observe that none of the distributionshave a power-law as a plausible hypothesis.
There are 604 (maximal) weakly connected componentsand 579 355 (maximal) strongly connected components in thenetwork; Fig. 4(b) shows a log-log plot of the cumulativecomponent size distribution for both variations. There are879 859 vertices (99.79%) in the giant weakly connectedcomponent. This component also contains a giant weaklybiconnected component with 652 892 vertices (74.20% of thevertices in the giant component).
Variable x x s α xmin GoF p-val.Degree 3 4.45 218.10 2.38 66 0.02 0.00In-Degree 1 2.22 86.40 2.45 57 0.05 0.00Out-Degree 2 2.22 183.91 2.03 10 0.22 0.00
TABLE IITHE DEGREE, IN-DEGREE AND OUT-DEGREE DISTRIBUTIONS OF U .
Our dynamic analysis of the user network mirrors that of thetransaction network in the previous subsection. Figures 4(c),4(d) and 4(e) show the edge number, density and averagepath length of the user network on a monthly basis. Thesemeasurements are not cumulative. The network’s growth andsparsification are evident. We note that even though ourdynamic analysis of the user network is on a monthly basis, thepreprocessing step is performed using the ancillary network ofthe entire imperfect network. This enables us to resolve public-keys to a single user irrespective of the month is which thelinking tranactions occur.
V. ANONYMITY ANALYSIS
Prior to performing the analyses above, we expected the usernetwork to be largely composed of disjoint trees representingBitcoin flows between one-time public-keys that were notlinked with other public-keys. However, our analyses reveal
that the user network has considerable cyclic structure. Wenow consider the implications of this structure, coupled withother aspects of the Bitcoin system, for anonymity.
There are several ways in which the user network canbe used to deduce information about Bitcoin users. We canuse global network properties, such as degree distribution, toidentify outliers, e.g. users with unusually high activity in atime-window following an event of interest. We can use localnetwork properties to examine the context in which a useroperates by observing the egocentric network of a particularuser and the users with which he or she interacts with eitherdirectly or indirectly. The dynamic nature of the user networkalso enables us to perform flow and temporal analyses. We canexamine the significant Bitcoin flows between groups of usersover time. We will now discuss each of these threats in moredetail, and provide a case study to demonstrate their potential.
A. Integrating Off-Network Information
There is no user directory for the Bitcoin system. However,we can attempt to build a partial user directory associatingBitcoin users (and their known public-keys) with off-networkinformation. If we can make sufficient associations and com-bine them with the network structures above, a potentiallyserious threat to anonymity emerges.
Many organizations and services such as on-line stores thataccept Bitcoinis, exchanges, laundry services and mixers haveaccess to identifying information regarding their users, e.g.e-mail addresses, shipping addresses, credit card and bankaccount details, IP addresses, etc. If any of this informationwas publicly available, or accessible by, say, law enforcementagencies, then the identities of users involved in related trans-actions may also be at risk. To illustrate this point, we considera number of publicly available data sources and integrate theirinformation with the user network.
1) The Bitcoin Faucet: The Bitcoin Faucet11 is a websitewhere users can donate Bitcoins to be redistribtued in smallamounts to other users. In order to prevent abuse of thisservice, a history of recent give-aways are published alongwith the IP addresses of the recipients. When the BitcoinFaucet does not batch the re-distribution, it is possible toassociate the IP addresses with the recipient’s public-keys.This page can be scraped over time to produce a time-stampedmapping of IP addresses to users.
We found that the public-keys associated with many ofthe IP addresses that received Bitcoins were contracted withother public-keys in the ancillary network, thus revealing IPaddresses that are somehow related to previous transactions.Fig. 7(a) shows a map of geolocated IP addresses belongingto users who received Bitcoins over a period of one week.Fig. 7(b) overlays the user network onto a sample of thoseusers. An edge between two geolocated IP addresses indicatesthat the corresponding users are linked by an undirected pathof length at most three in the user network; the path must notcontain the vertex representing the Bitcoin Faucet itself.
11http://freebitcoins.appspot.com
These figures serve as a proof-of-concept from a smallpublicly available data source. We note that large centralizedBitcoin service providers are capable of producing much moredetailed maps.
2) Voluntary Disclosures: Another source of identifyinginformation is the voluntary disclosure of public-keys by users,for example, when posting to the Bitcoin forums12. Bitcoinpublic-keys are typically represented as strings approximatelythirty-three characters in length and starting with the digit one.They are indexed very well by popular search engines. Weidentified many high-degree vertices with external informationusing a search engine alone. We proceeded to scrape theBitcoin Forums where users frequently attach a public-key totheir signatures. We also gathered public-keys from Twitterstreams and user-generated public directories. It is importantto note that in many cases we are able to resolve the ‘public’public-keys with other public-keys belonging to the same userusing the ancillary network. We also note that large centralizedBitcoin service providers can do the same with their userinformation.
B. Egocentric Analysis and Visualization of the User Network
There are severals pieces of information we can directlyderive from the user network regarding a particular user. Wecan compute the balance held by a single public-key. Wecan also aggregate the balances belonging to public-keys thatare controlled by a particular user. For example, Fig. 8(a)and Fig. 8(b) show the receipts and payments to and fromWikiLeaks’ public-key in terms of Bitcoin and transactionvolume respectively. The donations are generally small do-nations and are forwarded to other public-keys periodically.There was also a noticeable spike in donations when thefacility was first announced. Figure 8(c) shows the receiptsand payments to and from the creator of a popular Bitcointrading website aggregated over a number of public-keys thatare linked through the ancillary network.
An important advantage of deriving network structures fromthe Bitcoin transaction history is our ability to use networkvisualization and analysis tools to investigate the flow ofBitcoins. For example, Fig. 9 shows the network structuresurrounding the WikiLeaks’ public-key in the imperfect usernetwork. Our tools resolve several of the vertices with iden-tifying information gathered in Sect. V-A. These users canbe linked either directly or indirectly to their donations. Thepresence of a Bitcoin mining pool (a large red vertex) and anumber of public-keys between it and WikiLeaks’ public-keyis interesting.
C. Context Discovery
Given a number of public-keys or users of interest, we canuse network structure and context to better understand theflow of Bitcoins between them. For example, we can examineall shortest paths between a set of vertices or consider themaximum number of Bitcoins that can flow from a source
12http://forum.bitcoin.org
864768
9264
4096
6658
472235
80470
68102
18
411137
780297
140810
384012
587522
4623
23
751773
611795
192529
3726
334520
828650
146229
503227
18237
372
2434
928
187
880147
24634
233606
24215
301221
5821
5329
339
5992
751500
11767
508
359310
628760
684569
39855
466972
509470
693791
356385
297506
154569
267814
189486
70191
832671
459403
86175
290020
376702
283026
447905
737204
344528
921429
57908
436
145418
514111
58434
8259
12356
710582
704070
656967
146510
704591
863825
464995
52821
369238
527959
16987
343644
529502
615521
12387
13412
33893
644200
54890
710253
399470
395889
18036
688245
222134
164472
520832
132738
37825
17032
18576
826297
13458
244372
438264
529050
7708
578218
117419
438956
64686
175
18610
81093
1239
398555
216369
655670
50491
45400
32117
209391
491077
10675
406546
35359
40117
19638
18377
745503
140481
873847
494200
234195
761560
765657
8410
157403
623946
60643
889553
860392
500971
15598
643312
503026
1779
136948
564989
284930
1795
645446
184582
182487
852749
386325
594199
272431
167130
402500
253733
526638
720687
317235
518708
34621
341813
353253
675003
102022
45905
359250
305493
226134
84823638885
364379
5982
914405
171360
15718
681322
77287
622956
447856
235752
463223
448376
28025
152443
8687
214400
18821
797574
803208
59275
503694
782736
631191
43076
237981
864158
124322
534086
502183
431016
74665
14764
522159
829366
750009
660923
304061
88480
402887
215502
35281
385440
50644
364811
400863
8673
36325
56401
373736
713708
82770
890863
101679
141395
106374
614392
55802
822782
Fig. 9. An egocentric visualization of the vertex representing WikiLeaks’public-key in the imperfect user network. The size of a vertex corresponds toits degree in the entire imperfect user network. The color denotes the volumeof Bitcoins – warmer colors have larger volumes flowing through them. Thelarge red vertices represent a Bitcoin mining pool, a centralized Bitcoin walletservice and an unknown entity.
to a destination given the transactions and their ‘capacities’in an interesting time-window. For example, Fig. 10 showsall shortest paths between the vertices representing the userswe identified using off-network information in Sect. V-A andthe vertex that represents the MyBitcoin service13 in the usernetwork. We can identify more than 60% of the users in thisvisualization and deduce many direct and indirect relationshipsbetween them.
Case Study – Part I: We analyse an alleged theft of 25 000BTC reported in the Bitcoin Forums14 by a user known asallinvain. The victim reported that a large portion ofhis Bitcoins were sent to pkred
15 on 13/06/2011 at 16:52:23UTC. The theft occurred shortly after somebody broke into thevictim’s Slush pool account16 and changed the payout addressto pkblue.17. The Bitcoins rightfully belonged to pkgreen.18. Atthe time of the theft, the stolen Bitcoins had a market valueof approximately half a million U.S. dollars. We chose thiscase study to illustrate the potential risks to the anonymity ofa user (the thief) who has good reason to remain anonymous.
We consider the imperfect user network before any con-tractions. We restrict ourselves to the egocentric networksurrounding the thief: we include every vertex that is reachableby a path of length at most two ignoring directionality and all
13http://www.mybitcoin.com14http://forum.bitcoin.org/index.php?topic=16457.0151KPTdMb6p7H3YCwsyFqrEmKGmsHqe1Q3jg16http://http://mining.bitcoin.cz1715iUDqk6nLmav3B1xUHPQivDpfMruVsu9f181J18yk7D353z3gRVcdbS7PV5Q8h5w6oWWG
(a) A map of geolocated IP addresses associated with users receiving Bitcoinsfrom the Bitcoin Faucet during a one week period.
(b) A map of a sample of the geolocated IP addresses in Fig. 7(a) connectedby edges where the corresponding users are connected by a path of length atmost three in the user network that does not include the vertex representingthe Bitcoin Faucet.
Fig. 7. We can use the Bitcoin Faucet to map users to geolocated IP addresses.
0 5 10 15 20 25 30Day
0
50
100
150
200
250
300
350
400
450
Bitc
oins
Cumulative IncomingCumulative Outgoing
(a) The receipts and payments to and from WikiLeaks’public-key over time.
0 5 10 15 20 25 30Day
0
10
20
30
40
50
60
70
Tran
sact
ions
Outgoing TransactionsIncoming Transactions
(b) The number of transactions involving WikiLeaks’public-key over time.
0 50 100 150 200 250 300 350Day
0
2000
4000
6000
8000
10000
12000
14000
16000
Bitc
oins
Cumulative IncomingCumulative Outgoing
(c) The receipts and payments to and from the creatorof a popular Bitcoin trading website aggregated over anumber of public-keys.
Fig. 8. Plots of the cumulative receipts and payments to and from Bitcoin public-keys and users.
edges induced by these vertices. We also remove all loops,multiple edges and edges that are not contained in somebiconnected component to avoid clutter. In Fig. 11, the redvertex represents the thief who owns the public-key pkred andthe green vertex represents the victim who owns the public-key pkgreen. The theft is the green edge joining the victim tothe thief. There are in fact two green edges located nearby inFig. 11 but only one directly connects the victim to the thief.
Interestingly, the victim and the thief are joined by paths(ignoring directionality) other than the green edge representingthe theft. For example, consider the sub-network shown inFig. 12 induced by the red, green, purple, yellow and orangevertices. This sub-network is a cycle. We contract all verticeswhose corresponding public-keys belong to the same user. Thisallows us to attach values in Bitcoins and timestamps to thedirected edges. We can make a number of observations. Firstly,we note that the theft of 25 000 BTC was preceded by a smallertheft of 1 BTC. This was later reported by the victim in theBitcoin forums. Secondly, using off-network data, we haveidentified some of the other colored vertices: the purple vertexrepresents the main Slush pool account and the orange vertex
represents the computer hacker group known as LulzSec.19
We note that there has been at least one attempt to associatethe thief with LulzSec20. This was a fake; it was created afterthe theft. However, the identification of the orange vertex withLulzSec is genuine and was established before the theft. Weobserve that the thief sent 0.31337 BTC to LulzSec shortlyafter the theft but we cannot otherwise associate him with thegroup. The main Slush pool account sent a total of 441.83BTC to the victim over a 70-day period. It also sent a total of0.2 BTC to the yellow vertex over a two day period. One daybefore the theft, the yellow vertex also sent 0.120607 BTC toLulzSec.
The yellow vertex represents a user who is the owner of atleast five public-keys.21 Like the victim, he is a member ofthe Slush pool, and like the thief, he is a one-time donator
19http://twitter.com/LulzSec/status/7638857683265126520http://pastebin.com/88nGp508211MUpbAY7rjWxvLtUwLkARViqSdzypMgVW4
13tst9ukW294Q7f6zRJr3VmLq6zp1C68EK1DcQvXMD87MaYcFZqHzDZyH3sAv8R5hMZe1AEW9ToWWwKoLFYSsLkPqDyHeS2feDVsVZ1EWASKF9DLUCgEFqfgrNaHzp3q4oEgjTsF
402695
245397
23
555009
33189
203087
2
4119
52239
1045
2075
3997
1322
436
6460
8129
3794
4182
5082
5600
28006
11375
113
18674
118
2168
766
371
180
57
97
348
2909
988
38
5752
16
18
25332
191
10127
26257
23828
293
3013128
187
701
20561
59
3571
3726
928
9264
1228
77
2015
100
16614
11624
5493
2685
743443
27652
177159
35337
1538
23954
73231
9745
853524
571417
6174
33
1058
4131
548
2599
7209
594986
3117
15918
309814
13833
4673
8771
18503
74
39503
5200
51289
361
55397
41575
60009
75371
1554
43630
68209
740978
3700
1143
143481
905493
72320
786568
702604
41101
19607
1690
17054
36511
2208
24737
12453
5286
100008
3761
50866
348702
12986
15548
415933
1728
1995
6774
10273
19357
16247
11470
6257
1572
5329
1256
12009
24811
880364
547054
268413
641264
587506
12533
35578
471291
9471
7552
67844
75526
59144
13579
452366
51989
282
802
22820
817448
77098
69420
417069
863022
7716
5425
5432
2363
82748
394
9028
12101
2551
4938
23371
86348
333
14477
339
1081
233
50527
12640
1889
2944
10595
24423
70507
59244
14189
32622
17264
788201
2943
8064
4075
56871
7562
398
31121
27538
34883
8981
18733
17135
776357
165789
537840
13230
57775
10160
2483
16310
6135
2488
4537
681914
832955
31213
77642
75205
1991
3528
7627
29953
240079
5072
1492
55775
137296
3043
61017
6631
574441
352235
10870
5165
13809
4084
4597
9207
38905
428028
142916
9349
10058
726209
3740
3696
400076
34350
62669
780943
125
659519
526481
5519
205087
12727
47204
21551
108547
675844
776197
764486
4107
12549
3975
85262
18582
16030
603815
5034
4523
46878
6199
804930
2889
25810
2136
28762
3679
344045
14960
13558
1656
8
1665
10370
3588
19600
9377
24746
50987
143916
2733
27698
2107
4170
228687
5969
1235
11478
196708
31194
6497
28007
207914
19346
6650
42668
157225
5420
48715
10213
165900
157711
53821
922346
46254
12617
321
6327
33750
13035
2102
446786
24371
2226
1743
9900
1548
32339
18302
5581
41235
113055
924600
28427
11862
16987
1502
1759
43490
70552319956
477919
2084
32264
22302
815
1460
12218
1088
836
77384
3435
20718
3832
548985
3419
3444
21407
6545
31510
30887
4958
56423
7761
1510
1953
30132
28658
56
29093
33064
80780
17882
74622
829777
19867
8258
2449
45764
4409
11207
20856
76869
610616
4921
40073
774465
26114
17087
243840
3140
3708
86286
53519
32332
422555
77217669479
681136
630
146630
6689
2634
151506
21722
154342
375783
18797
137484
20818
21344
1584
631012
14693
8431
864081
817525
7246
92118
17851
17062
37501
46902
768578
574254
17891
13408
56512
38107
7426
578
22903
76147
188966
3338
33091
470041
52781
78527
49744
54742
7943
5047
232714
370000
8480
629510
859825
11959
175
7005
2093
49264
16983
446018
39775
21980
69945
3766
592542
228328
384648
562840
698225
149333
28067
11017
48235
817852
590054
586474
449841
12923
492807
5303
2660
666098
27157
845950
11173
70414
23728
457294
9943
18193
192813
171083
2687
20424
5727
786604
57918
46845
22355
651901
584122
38383
4028
14454
19345
868
155315
323306
27771
9532
155766
45080
19506
30322
674430
7253
71711
139329
4595
18547
739471
170
50887
70146
768003
10756
598
3598
430
541
67124
7738
5182
63
1096
762953
19026
7771
52831
58987
35439
128121
2173
27783
7020
1678
15504
19089
40595
346775
507548
3751
43180
23735
256630
171732
48846
34522
6877
642786
9955
7911
37611
236
69842
39616
304372
3326
30975
9232
233357
1295
10001
1523
18610
12074
69931
9006
71984
750387
67390
46911
20293
2887
31567
1360
41810
143191
69976
387932
18782
196240
14178
827
22673
15721
24466
33134
28015
109927
36344
55693
561039
11162
24988
105892
425
527277
6062
7240
23994
722335
446
452
202181
67440
10184
523721
7628
2125
4856
8591
384677
7138
71658
48807
254
853496
130134
27610
248674
686109
612382
36622229797
358509
6179
82659
8259
12328
841772
290141
868587
442417
587831
676082
613083
8793
40645
63689
37072
157632
875
710406
17869
820308
42590
167996
395993
723221
8254
55363
368709
829510
59464
120116
630964
24588
131147
556204
569420
452685
418811
741458
791786
205502
69718
2590
16475
121530
561246
12147
23541
475394
645820
254148
43653
925220
318823
209796
109004
468747
536
1160
781698
414
72935
417908
79989
65658
41660
22659
689248
761160
767468
481420
68291
299374
484070
102555
22685
80030
63173
256161
708771
135234
676007
128145
899584
340831
395706
515860
74273
451521
53006
2534
16657
592782
394363
60584
41796
82632
23890
37976
261107
4995
678712
75867
238924
559938
27559
409063
664683
878192
658120
108723
261771
383489
70345
76350
33822
333532
603326149616
204044
94403
167488
252104
755915
10445
83777
538833
20691
137429
65752
8411
5447
322962
30942
534854
512231
4558
480534
39144
238288
18675
163380
879725
161605
84891
751888
494527
123585
233746
86326
178273
307480
218923
760090
276465
407842
780579
321831
451565
751914
84285
192830
8511
22326
16705
236022
471370
39204
103835
70371
231763
31060
141030148943
14393
234138
268644
506433
498794
261350
172499
7913
811394
4572
20867
444808
804926
315792
477588
383383
457114
506869
712312
188833
66689
901540
526416
62982
498092
2462
147238
915894
355805
45504
18885
7075
551329
305610
605709
8655
258790
164308
738141
291286
7418
430560
483809
760741
492
62341
94703
561648
262641
846328
80380
118187
687770
6670
777175
893455
841816
304899
2580
2137
556969
397849
291354
91
510504
31280
55078
539191
55868
109119
852549
521950
880622
856663
64092
922367
800173
340248
78433
140070
344677
43648
78470
823945
21135
84624
914787
699025
23186
619159
317268
45168
25266
277156
64168
578218
374899
398200
64195
187077
838433
572556
255028
759289
626404
249006
82860
708728
37591
199462
416489
509275
418997
60281
353011
101109
541433
436986
525056
18026
389248
264079
2360
275210
11781
2827
396050
559161
281372
396062
117924
148255
6544
105249
659897
567089
803625
879402
62750
402224
274568
525573
656766
35634
54208
220638
537398
696224
80700
468141
826628
582469
170833
262995
341239
804482 764433
569283
411402
803672
109402
622735
29544
566122
859873
46675
31603418679
259428
168826
793468
390571
421555
80769
519042
345119
14010
37947
879501
537487
65771
299933
3425
71493
50081
42653
562083
48109
678831
211991
148404
801717
744806
347075
461765
144948
373718
711640
214751
259161
831565
548165
353103
615388
78820
11243
368699
73519
889340
5972
73280
208725
763538
31749
71509
3083
779284
818988
58394
105502
27683
842115849041
450738
144433
578685
304183
665008
62525
614845
70725
41372
60486
881736
812604
709271
760913
7709
29783
521432
302607
296006
105566
134178
569891
910436
268927
41023
384126
504961
545926
715916
158637
21647
103573
109726
628036
703656
195864
5289
722097
175966
156852
601276
455860
62668
14308
732371
77025
449763
847486
271911
75202
5478
438824
656629
605437
695560
187646
11520
480516
106294
56583
462089
851210
503562
180099
189998
222489
306232
757630
275741
498349
376099
689446
66860
7731
34103
462070
2955
18998
736587
591182
15702
377424
783865
560477
494737
105851
886141
339390
825237
3463
81770
583048
907385
790272
253335
43079
447903
529761
540021
5550
56751
42912
40387
185808
707855
136043
705443
708056
164606
18190
786395
250942
437756
1537
51798
1542
691734
2203912253
339487
43611
280329
785957
474664
530354
3632
22067
82433
54538
587348
153174
69436
77400
136796
608529
632424
460961
151153
27241
706169
722197
4073
601719
439993
91776
126596
140949
568982
165294
82543
737904
842556
159394
28326
573383
120492
32442
751903
522013
192191
52935
920699
145701
326563
79592
624049
759281
243498
871
6443
12036
771030
134817
787415
463491
618492
42783
55981
14937
42787
83752
81713
12109
675668
321365
242907
696152
818683
399204
42854
52369
22430
63122
889541
44910
114632
1913
122760
821117
649790
340634
491397
535873
744283
8855
487316
542069
36761
526235
383979
233429
776108
407469
795227
220492
782283
262093
761821
358383
712012
8251
752129
669581
265961
444415
Fig. 10. A visualisation of all users identified in Sect. V-A and all shortestpaths between the vertices representing those users and the vertex representingthe MyBitcoin service in the user network.
Fig. 11. An egocentric visualization of the thief in the imperfect user network.For this visualization, vertices are identified through their colors in the text,edges are colored according to the color of their sources and the size of eachvertex is proportional to its edge-betweeness within the egocentric network.
to LulzSec. This donation, the day before the theft, is his lastknown activity using these public-keys.
D. Flow and Temporal Analyses
In addition to visualizing egocentric networks with a fixedradius, we can follow significant flows of value through thenetwork over time. If a vertex representing a user receives alarge volume of Bitcoins relative to their estimated balance,and, shortly after, transfers a significant proportion of thoseBitcoins to another user, we deem this interesting. We built a
1 BTC17:34:04 13/06/2011
25000 BTC17:52:23 13/06/2011
0.31337 BTC17:45:31 13/06/2011
0.120607 BTC16:55:19 12/06/2011
0.11 BTC04:04:14 22/05/2011
0.09 BTC09:07:59 23/05/2011
60 transactions involving 441.83 BTC over a 70-day period
Thief
Victim
Time
Bitc
oins
Fig. 12. An interesting sub-network induced by the thief, the victim andthree other vertices. The notation is the same as in Fig. 11.
special purpose tool that, starting with a chosen vertex or setof vertices, traces significant flows of Bitcoins over time. Inpractice we have found this tool to be quite revealing whenanalyzing the user network.
Case Study – Part II: To demonstrate this tool we re-consider the Bitcoin theft described earlier. We note that thevictim has developed their own tool to generate an exhaustivelist of public-keys that have received some portion of the stolenBitcoins since the theft22. However, this list grows very quicklyand, at the time of writing, contained more than 34 100 public-keys. Figure 13 shows an annotated visualization producedusing our tool. We observe several interesting flows in theaftermath of the theft. The initial theft of a small volume of1 BTC is immediately followed by the theft of 25 000 BTC.This is represented as a dotted black line between the relevantvertices, magnified in the left inset.
In the left inset, we can see that the Bitcoins are shuffledbetween a small number of accounts and then transferredback to the initial account. After this shuffling step, we haveidentified four significant outflows of Bitcoins that began at19:49, 20:01, 20:13 and 20:55. Of particular interest are theoutflows that began at 20:55 (labeled as ‘1’ in both insets)and 20:13 (labeled as ‘2’ in both insets). These outflows passthrough several subsequent accounts over a period of severalhours. Flow 1 splits at the vertex labeled A in the right inset
22http://folk.uio.no/vegardno/allinvain-addresses.txt
1555008
23
38
35306
33302
51840
15785
46868
766
5174
22345
2168
83677
26206
14343
652298
112654
570264
185593
173148
843876
579825
694287
578938
810331
559124
83545
705069
88134
97665
289393
547864
728964
786485
40371
453627
785946
908544
109875
75322
211570
119835
432124
778645
658824
686108
718222
731679
415163
418447
194930
855203
417317
628267
792796
53054
594130
812614
835745
329167
301616
10289
572864
586163
721972
384811
285412
144498
314155
640488
882742
310331
416828
341813
422259
651852
275022
675125
211899
400979
541199
912716
903287
312557
2003
818275
339830
141809
19441
675457
146889
840296
761325
685007
730355
142970
30331
733417
598910
220492
290553
736402
245910
227589
175023
586588
393368
628889
647334
71852
764650
104741
358574
260814
750257
316873
907453
164555
576725
304592
22746
657716
892158
556457
312442
443268
875756
368532
560596
278942
132412
187689
799054119
891285
199029
612697
263646
715668
861610
3956
507270
603123
32240
112654
570264
185593
843876
586163
18:52
19:21
19:24
19:24
19:38
20:01 19:21
19:59
19:49
19:49
19:59
20:55
20:13
2
A
B
C3
4
694287
578938
810331
83545
705069
88134
785946
908544
109875
75322
211570
686108
341813
422259
312557
358574
560596
891285
A
B
C
31
4
4
06/13 20:55
06/13 22:15
06/14 04:05
06/14 04:05
06/14 04:05
06/14 04:08
06/16 03:42
06/16 13:37
06/16 13:44
06/16 13:44
06/16 13:44
06/16 13:44
06/16 13:37 06/16
03:42
2
12
Y
MyBitcoin
TheftReporter
AllegedThief
AllegedThief
TheftReporter
Fig. 13. Visualisation of Bitcoin flow from the alleged theft. The left inset shows the initial shuffling of Bitcoins among accounts close to that of the allegedthief, during which all transfers happen within a few hours of the incident. The right inset shows detail on the events of several subsequent days, whereBitcoin flows split, and then later merge back into each other, validating that the flows found by the tool are probably still controlled by a single party.
at 04:05 the day after the theft. Some of its Bitcoins rejoinFlow 2 at the vertex labeled B. This new combined flow islabeled as ‘3’ in the right inset. The remaining Bitcoins fromFlow 1 pass through several additional vertices in the next twodays. This flow is labeled as ‘4’ in the right inset.
A surprising event occurs on 16/06/2011 at approximately13:37. A small number of Bitcoins are transferred from Flow 3to a heretofore unseen public-key pk1.23 Approximately sevenminutes later, a small number of Bitcoins are transferred fromFlow 3 to another heretofore unseen public-key pk2.24 Finally,there are two simultaneous transfers from Flow 4 to twomore heretofore unseen public-keys: pk325 and pk4.26 We havedetermined that these four public-keys, pk1, pk2, pk3 and pk4– which receive Bitcoins from two separate flows that splitfrom each other two days previously – are all contracted tothe same user in our ancillary network. This user is representedas C in Fig. 13.
There are several other examples of interesting flow. Theflow labeled as Y involves the movement of Bitcoins throughthirty unique public-keys in a very short period of time. Ateach step, a small number of Bitcoins (typically 30 BTCwhich had a market value of approximately US$500 at thetime of the transactions) are siphoned off. The public-keys thatreceive the small number of Bitcoins are typically representedby small blue vertices due to their low volume and degree.On 20/06/2011 at 12:35, each of these public-keys makesa transfer to a public-key operated by the the MyBitcoin
231FKFiCYJSFqxT3zkZntHjfU47SvAzauZXN241FhYawPhWDvkZCJVBrDfQoo2qC3EuKtb94251MJZZmmSrQZ9NzeQt3hYP76oFC5dWAf2nD2612dJo17jcR78Uk1Ak5wfgyXtciU62MzcEc
service.27 Curiously, this public-key was previously involvedin another separate Bitcoin theft.28.
555008
23
38
35306
33302
51840
15785
46868
766
5174
22345
2168
83677
26206
14343
652298
112654
570264
185593
173148
843876
579825
694287
578938
810331
559124
83545
705069
88134
97665
728964
40371
908544
109875
75322
211570
119835
432124
778645
658824
686108
718222
731679
415163
418447
194930
855203
417317
53054
594130
329167
301616
10289
572864
586163
721972
384811
285412
314155
882742
310331
416828
341813
422259
651852
400979
903287
2003
19441
675457
146889
730355
142970
30331
733417
598910
220492
290553
736402
227589
393368
628889
71852
260814
750257
907453
164555
576725
304592
22746
657716
556457
312442
875756
368532
560596
278942
132412
187689
799054119
891285
715668
861610
3956
507270
603123
32240
Fig. 14. The Bitcoins are transferred between public-keys along thehighlighted paths very quickly.
We also observe that the Bitcoins in many of the aboveflows are transferred between public-keys very quickly. Fig. 14shows two flows in particular where the intermediate parties
271MAazCWMydsQB5ynYXqSGQDjNQMN3HFmEu28http://forum.bitcoin.org/index.php?topic=20427.0
waited for very few confirmations before re-sending the Bit-coins to other public-keys.
Naturally, much of this analysis is circumstantial. We cannotsay for certain whether or not these flows imply a sharedagency in both incidents. There is always the possibility ofdrawing false inferences. However, it does illustrate the powerof our tool when tracing the flow of Bitcoins and generatinghypotheses. It also suggests that a centralized service mayhave further details on the user(s) in control of the implicatedpublic-keys.
The same tool that we use here to examine outflows ofBitcoins from users of interest can also be used to examineinflows of Bitcoins to users and organizations accepting Bit-coins as payment or as donations.
E. Other Forms of Analysis
There are many other forms analysis that can be applied inorder to de-anonymize the workings of the Bitcoin system:
• Order books for Bitcoin exchanges are typically madeavailable to support trading tools. As orders are oftenplaced in Bitcoin values converted from other currencies,they often have a precise decimal value with eight sig-nificant digits. It may be possible to find transactionswith corresponding amounts and thus map public-keysand transactions to the exchanges.
• It is conceivable that over an extended time period,several public-keys, if used at particular times, maybelong to the same user. We could construct and cluster aco-occurrence network to help deduce further mappingsbetween public-keys and users.
• Finally, there are far more sophisticated forms of attackwhere the attacker actively participates in the network,for example, using marked Bitcoins or by operating alaundry service.
VI. CONCLUSIONS
For the past half-century futurists have heralded the adventof a cash-less society [24]. Many of their predictions have beenrealized, e.g. Anderson et al.’s [24]’s ‘on-line real-time’ pay-ment system and bank-maintained ‘central information files’.However, cash is still a competitive and relatively anonymousmeans of payment. Bitcoin is an electronic analog of cashin the online world. It is decentralized: there is no centralauthority responsible for the issuance of Bitcoins and there isno need to involve a trusted third-party when making onlinetransfers. However, this flexibility comes at a price: the entirehistory of Bitcoin transactions is publicly available. In thispaper we investigated the structure of two networks derivedfrom this dataset and their implications for user anonymity.
Using an appropriate network representation, it is possibleto map many users to public-keys. This is performed usinga passive analysis only. Active analyses, where an interestedparty can potentially deploy marked Bitcoins and collaboratingusers can discover even more information. We also believe thatlarge centralized services such as the exchanges and wallet
services are capable of identifying considerable portions ofuser activity.
Technical members of the Bitcoin community have cau-tioned that strong anonymity is not a prominent design goalof the Bitcoin system. However, casual users need to beaware of this, especially when sending Bitcoins to users andorganizations they would prefer not to be publicly associatedwith.
VII. ACKNOWLEDGEMENTS
This research was supported by Science Foundation Ireland(SFI) Grant No. 08/SRC/I1407: Clique: Graph and NetworkAnalysis Cluster. Both authors contributed equally to thiswork. It was performed independently of any industrial part-nership or collaboration of the Clique Cluster.
REFERENCES
[1] S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System,” 2008.[Online]. Available: http://bitcoin.org/bitcoin.pdf
[2] C. Dwork and M. Naor, “Pricing via Processing or Combatting JunkMail,” in Proceedings of the 12th Annual International CryptologyConference on Advances in Cryptology (CRYPTO’92). Springer, 1992,pp. 139–147.
[3] A. Back, “Hashcash – A Denial of Service Counter-Measure,” 2002.[Online]. Available: http://www.hashcash.org/papers/hashcash.pdf
[4] The Economist, “Digital Curriencies – Bits and Bob,” June 2011.[Online]. Available: http://www.economist.com/node/18836780
[5] H. Cavusoglu, H. Cavusoglu, and S. Raghunathan, “Emerging Issues inResponsible Vulnerability Disclosure,” in Proceedings of the 4th AnnualWorkshop on Economics of Information Security (WEIS’05), 2005.
[6] K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, and N. Christakis,“Tastes, Ties, and Time: A New Social Network Dataset using Face-book.com,” Social Networks, vol. 30, pp. 330–342, 2008.
[7] W. Dai, “B-Money Proposal,” 1998. [Online]. Available: http://www.weidai.com/bmoney.txt
[8] R. Fugger, “Money as IOUs in Social Trust NetworksA Proposal for a Decentralized Currency Network Protocol,” 2004. [On-line]. Available: http://www.ripple-project.org/decentralizedcurrency.pdf
[9] K. Saito, “i-WAT: The Internet WAT System – An Architecture forMaintaining Trust and Facilitating Peer-to-Peer Barter Relationships,”Ph.D. dissertation, Keio University, 2006.
[10] V. Vishnumurthy, S. Chandrakumar, and E. Sirer, “KARMA: ASecure Economic Framework for Peer-to-Peer Resource Sharing,”in Proceedings of the 1st Workshop on Economics of Peer-to-PeerSystems. [Online]. Available: http://www2.sims.berkeley.edu/research/conferences/p2pecon/index.html
[11] B. Yang and H. Garcia-Molin, “PPay: Micropayments for Peer-to-PeerSystems,” in Proceedings of the 10th ACM Conference on Computer andCommunication Security (CCS’03), V. Atluri and P. Liu, Eds. ACMPress, 2003, pp. 300–310.
[12] F. Stalder, “Failures and Successes: Notes on the Development ofElectronic Cash,” The Information Society (TIS), vol. 18, no. 3, pp.209–219, 2002.
[13] N. Kichiji and M. Nishibe, “Network Analyses of the Circulation Flowof Community Currency,” Evolutionary and Institutional EconomicsReview, vol. 4, no. 2, pp. 267–300, 2008.
[14] M. Nishibe, “Chiiki Tuka No Susume (in Japanese),” Hokkaido Shok-oukai Rengou, 2004.
[15] “Where’s George?” http://www.wheresgeorge.com.[16] D. Brockmann, L. Hufnagel, and T. Geisel, “The Scaling Laws of
Human Travel,” Nature, vol. 439, no. 26, pp. 462–465, 2006.[17] L. Backstrom, C. Dwork, and J. Kleinberg, “Wherefore Art Thou
r3579x?: Anonymized Social Networks, Hidden Patterns, and StructuralSteganography,” in Proceedings of the 16th International Conference onWorld Wide Web. ACM, 2007, pp. 181–190.
[18] D. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, andJ. Kleinberg, “Inferring Social Ties from Geographic Coincidences,”Proceedings of the National Academy of Sciences, vol. 107, no. 52, p.22436, 2010.
[19] A. Narayanan and V. Shmatikov, “Robust De-anonymization of LargeSparse Datasets,” in Proceedings of the 29th Symposium on Security andPrivacy. IEEE, 2008, pp. 111–125.
[20] R. Puzis, D. Yagil, Y. Elovici, and D. Braha, “Collaborative Attack onInternet Users’ Anonymity,” Internet Research, vol. 19, no. 1, pp. 60–77,2009.
[21] A. Clauset, C. Shalizi, and M. Newman, “Power-Law Distributions inEmpirical Data,” SIAM Review, vol. 51, no. 4, pp. 661–703, 2009.
[22] H. Simon, “On a Class of Skew Distribution Functions,” Biometrika,vol. 42, pp. 425–440, 1955.
[23] A. Barabasi and R. Albert, “Emergence of Scaling in Random Net-works,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
[24] A. Anderson, D. Cannell, T. Gibbons, G. Grote, J. Henn, J. Kennedy,M. Muir, N. Potter, and R. Whitby, An Electronic Cash and CreditSystem. American Management Association, 1966.