From Hashing to Bitcoin

Encoding, Encryption and Hashing

Encoding, encryption and hashing are easily confused. All three transform data into another format. Encoding and encryption are reversible (encoding vs decoding, encryption vs decryption), while hashing is irreversible. The purpose of encoding is to transform data into a proper format for a system to consume (or to access), such as ASCII and base64; the scheme is publicly available and no key is needed to decode. The purpose of encryption is to keep data secret from others, so that it can only be consumed by the targeted recipients, who can reverse the transformation with a password (key). Hashing is a mapping (known as a hashing function f) that maps any input (usually a string or a serialized data structure) into a fixed size string (or a fixed size integer, or a fixed length byte stream), while fulfilling the following properties : (1) the same inputs give the same outputs, and different inputs give different outputs for an ideal hashing function (in practice collisions exist, since the output space is finite, but they should be rare), (2) reversing the transformation is infeasible (i.e. we cannot find the input given the output, or $f^{-1}$ is unknown) and (3) a minor change in input results in a drastic change in output (known as the avalanche effect). There is no obvious pattern in the mapping: a hashing function behaves like a pseudo random redistribution of the inputs, but of course it is deterministic, not stochastic. Hashing an object means digesting the object with a hashing function to get a hash value, i.e. the serialized object is the input of the hashing function, while the hash value is its output.

                        reversible   key involved   purposes
encoding                yes          no             format conversion
encryption              yes          yes            keeping secret
hashing                 no           yes / no       (1) hash table, allow O(1) searching
                                                    (2) verifying file integrity
                                                    (3) protecting password
cryptographic hashing   no           yes            digital signature

Finally, we have cryptographic hashing, which is a combination of encryption and hashing and is usually used for digital signatures. Please do not confuse encryption with digital signature: a digital signature on a document cannot make the document confidential, it can only be used as an endorsement of the document, i.e. a declaration that the document is approved by the signer. Given the credibility of the signer, we can trust the document.

Encryption

Cryptography involves encryption (from plaintext to ciphertext) and decryption (from ciphertext to plaintext). A key is needed; here the key means a password, which is a different concept from the key in hashing. Cryptography can be roughly divided into symmetric and asymmetric, the latter being known as public key cryptography. For symmetric cryptography, there is a single secret key, available only to the parties who share the secret, and the same key is used for both encryption and decryption. For asymmetric cryptography, there are two keys : a public key and a private key; you can encrypt with either key and decrypt with the other key (i.e. you cannot encrypt and decrypt with the same key). Consider a manager who holds the private key and publishes the public key to all his colleagues. All documents sent by colleagues should be encrypted with the public key, thus only the manager is able to decrypt them, using his private key. The other way round, when documents sent by the manager are encrypted with his private key, everyone is able to decrypt them using the public key. Doing so seems meaningless, but later we will see that it can be used together with hashing to generate a digital signature. This application is not regarded as cryptography, it is known as cryptographic hashing, please read the later sections.

                        public key                   private key
cryptography            for encryption               for decryption
cryptographic hashing   for signature verification   for signing document

Hashing

A hashing function maps keys into integers (or buckets). The hash function should distribute the keys as uniformly as possible over the buckets (i.e. the output space), so that the output space is evenly used (in this context, the key is the input to the hash function, which is a different concept from the key in encryption). Suppose hash function h(k) has bucket size M :

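$P(h(k)=0) = P(h(k)=1) = P(h(k)=2) = \dots = P(h(k)=M-1)$

i.e. $P(h(k)=m) = 1/M$ for every bucket $m \in [0, M-1]$

Some examples of hash functions :

(1) $h(k) = k \bmod M$

(2) $h(k) = \lfloor M\,(kc \bmod 1) \rfloor$, known as Knuth multiplicative hash, c is irrational

$\phantom{(2)\ h(k)} = \lfloor M\,(kc - \lfloor kc \rfloor) \rfloor$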

The first example is applicable only when M is not a power of 2, otherwise if we allow $M = 2^n$, then the hash function is a filter that simply selects the n lower bits of the key as the hash output. A prime number that is not too close to a power of 2 is a good choice of M. The second example multiplies M by a fraction, which lies within [0,1), to create a floating point number that lies within [0,M), which is then rounded down to a bucket index. For efficient implementation, we can pick M to be a power of 2, and Knuth suggests that the optimal value of c is $(\sqrt{5}-1)/2 = 0.6180339887...$ (is there any proof ?). Let's take a look at the four different applications of hashing.
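The two example hash functions can be sketched as follows. This is a minimal illustration, not a production hash: the function names hash_mod and hash_knuth are chosen here for illustration, and the multiplicative hash uses double precision arithmetic for readability (an assumption, practical implementations normally use integer arithmetic).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Example (1): h(k) = k mod M.
// When M is a power of 2 this only keeps the low bits of k.
std::uint64_t hash_mod(std::uint64_t k, std::uint64_t M)
{
    assert(M > 0);
    return k % M;
}

// Example (2): Knuth multiplicative hash, h(k) = floor(M * frac(k * c)),
// with c = (sqrt(5) - 1) / 2 = 0.6180339887...
std::uint64_t hash_knuth(std::uint64_t k, std::uint64_t M)
{
    assert(M > 0);
    const double c = (std::sqrt(5.0) - 1.0) / 2.0;
    const double product = static_cast<double>(k) * c;
    const double fraction = product - std::floor(product); // lies in [0, 1)
    const std::uint64_t bucket = static_cast<std::uint64_t>(M * fraction);
    return bucket < M ? bucket : M - 1; // guard against floating point rounding
}
```

For instance, hash_mod(0xB5, 16) simply returns the lowest hexadecimal digit 0x5, which shows why a power of 2 is a poor choice of M for the first example, while hash_knuth(1, 10), hash_knuth(2, 10) and hash_knuth(3, 10) land in buckets 6, 2 and 8.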

Application 1 – Hash table

A linear search in an unordered linear data structure, such as std::vector and std::list, has an efficiency of O(N), while a binary search in an ordered data structure, such as std::set and std::map, has a better efficiency of O(logN), which can be further improved to O(1) on average in another data structure, the hash table. A hash table is an array of M buckets, with index $m \in [0, M-1]$. When an object is inserted into a hash table, it is hashed (with the hash function) to get a bucket index (hash value), and the object is then put into that bucket. Ideally, when a good hash function is used, different objects have different hash values, however there is no guarantee. When multiple objects share the same hash value, a collision occurs. To resolve hash collisions, we can store a list of objects instead of one single object in each bucket; this method is called separate chaining. However, if collisions happen too frequently, searching efficiency is reduced. Examples of hash tables include std::unordered_set and std::unordered_map.
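Separate chaining can be sketched with a vector of lists. This is a minimal sketch for illustration only: the class name ChainedHashSet is hypothetical, std::hash is used as the hash function, and the number of buckets is fixed at construction (no rehashing, unlike std::unordered_set).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// A minimal hash set of strings that resolves collisions by separate chaining.
class ChainedHashSet
{
public:
    explicit ChainedHashSet(std::size_t bucket_count) : buckets_(bucket_count), size_(0)
    {
        assert(bucket_count > 0);
    }

    // Returns false if the key is already stored.
    bool insert(const std::string& key)
    {
        std::list<std::string>& chain = buckets_[index_of(key)];
        for (const std::string& stored : chain)
            if (stored == key) return false;
        chain.push_back(key); // colliding keys share the same chain
        ++size_;
        return true;
    }

    // Hash to a bucket in O(1), then search linearly within the chain.
    bool contains(const std::string& key) const
    {
        const std::list<std::string>& chain = buckets_[index_of(key)];
        for (const std::string& stored : chain)
            if (stored == key) return true;
        return false;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t index_of(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<std::string>> buckets_;
    std::size_t size_;
};
```

With only 2 buckets and 3 keys, at least two keys must share a bucket, yet all three remain retrievable; the cost is that searching inside a long chain degrades towards O(N).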

Application 2 – File integrity

Since different objects have different hash values (for an ideal hashing function), and a minor change in the object results in a drastic change in hash value (avalanche effect), hashing becomes a useful tool for verifying file integrity. A file has a practically unique hash value, and when the file is corrupted, its hash value changes. If you download a file from a site, you can check its integrity if the site publishes hash values together with the download link. The program md5sum is provided in linux to perform the well known MD5 hashing. For example, in linux :

$ cat file1
This is a very small file with a few characters.
$ cat file2
this is a very small file with a few characters.
$ md5sum file1 file2
75cdbfeb70a06d42210938da88c42991 file1
6fbe37f1eea0f802bd792ea885cd03e2 file2

Application 3 – Protecting password

Some people like to use the same password for multiple websites, so it is a bad idea for websites to store their users’ passwords on web servers in raw format; instead they should hash the passwords before storing them. Passwords transferred over the network are in raw format, while passwords stored on servers are hashed, thus a password read from the network should first be hashed before being compared with the hashed password on the server. Assuming that different servers use different hashing functions (in practice, the same function with a different random salt), hackers who steal hashed passwords from one web server cannot use them to break into the users’ other accounts (but what happens if hackers steal passwords directly from the network, rather than from the servers?).

Cryptographic hashing

A digital signature is a combination of public key cryptography and hashing (please note that you cannot accomplish it with symmetric cryptography). The document to be signed is first hashed, so that it is transformed into an output with fixed size, called the message digest (note : the document can be very large, while the message digest has a fixed size). The message digest is then encrypted using the private key to generate a signature (i.e. with the message digest as plaintext, and the signature as ciphertext), and the signature is then appended to the raw document to form a signed document. Please note : (1) the holder of the private key is the only one who can make the signature and (2) the document is not encrypted, everyone can read the document, they just do not know whether the document is reliable unless it is endorsed by some authority. Here is the signing algorithm :

raw document --(hashing)--> message digest --(encrypt with private key)--> signature --(append to the raw document)--> signed document

How can we verify whether a document is signed from the signed document and public key only? Firstly, the signed document is partitioned into a signature and a raw document, the signature is then decrypted with public key, which is then compared with the message digest generated by hashing the raw document. If they are equivalent, then we can claim that the document is signed. Here is the signature verification algorithm :

signed document --(partition)--> raw document + signature

raw document --(hashing)--> message digest

signature --(decrypt with public key)--> decrypted message digest

If they are equivalent, then the document is signed.

Note : the size of the hash value is fixed (i.e. the number of buckets is constant), which is $2^{32 \times 4} = 2^{128}$ in the md5sum example (an MD5 hash value has 32 hexadecimal digits of 4 bits each).
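The signing and verification steps can be sketched with a toy RSA key pair, since RSA literally "encrypts" the digest with the private key. Everything here is an assumption made for illustration: the key (n = 61 × 53 = 3233, public exponent 17, private exponent 2753) is far too small to be secure, and the message digest is a 64 bit FNV-1a hash reduced modulo n, which is a stand-in for a real cryptographic hash such as SHA256.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy RSA key pair, for illustration only.
const std::uint64_t kModulus = 3233;         // n = 61 * 53
const std::uint64_t kPublicExponent = 17;    // public key (e, n)
const std::uint64_t kPrivateExponent = 2753; // private key (d, n), e*d = 1 mod 3120

// Stand-in message digest: 64 bit FNV-1a reduced modulo n (NOT cryptographic).
std::uint64_t message_digest(const std::string& document)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char ch : document) {
        h ^= ch;
        h *= 1099511628211ULL;
    }
    return h % kModulus;
}

// Modular exponentiation by repeated squaring.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exponent, std::uint64_t modulus)
{
    std::uint64_t result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1) result = result * base % modulus;
        base = base * base % modulus;
        exponent >>= 1;
    }
    return result;
}

// Signing: hash the document, then encrypt the digest with the private key.
std::uint64_t sign(const std::string& document)
{
    return pow_mod(message_digest(document), kPrivateExponent, kModulus);
}

// Verification: decrypt the signature with the public key and compare it
// with the digest obtained by hashing the raw document.
bool verify(const std::string& document, std::uint64_t signature)
{
    return pow_mod(signature, kPublicExponent, kModulus) == message_digest(document);
}
```

A caller would keep the pair (document, sign(document)) as the signed document; verify returns true for that pair and false once the signature is altered.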

Practical public key cryptography and hashing include :

algorithm   type                      application
MD5         hashing                   linux command md5sum
SHA256      hashing                   bitcoin blockchain’s construction (also called bitcoin mining)
ECDSA       public key cryptography   bitcoin transaction’s digital signature
RSA         public key cryptography   HKEX’s Orion open gateway (use openssl library)

Elliptic curve digital signature algorithm (ECDSA)

Now let's take a look at how the elliptic curve digital signature algorithm generates a pair of public key and private key. It involves two mathematical concepts : (1) elliptic curve and (2) finite field arithmetic, the latter requires number theory. Please note that this section is just a simple introduction to ECDSA which skips the complicated number theory, thus the mathematical treatment in this section is not rigorous. First of all, an elliptic curve is defined as :

$y^2 = x^3 + ax + b$

For bitcoin, we have a=0 and b=7, which looks like this :

[Figure : the elliptic curve $y^2 = x^3 + 7$]

It has several useful properties : (1) it is symmetric about the x axis (the proof is easy), (2) any non vertical straight line y=mx+c intersecting the elliptic curve at two non tangent points will always intersect the curve at a third point and (3) any non vertical straight line y=mx+c tangent to the elliptic curve at one point will intersect the curve at precisely one other point (how can we prove property 2 and 3?). Let's consider the following system of equations :

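$y^2 = x^3 + ax + b$, which is the elliptic curve

$y = mx + c$, which is the non vertical line

$(mx + c)^2 = x^3 + ax + b$

$x^3 - m^2 x^2 + (a - 2mc)\,x + (b - c^2) = 0$ (equation 1)

Therefore property 2 and 3 can be combined as this statement : cubic equation 1, which has real coefficients, either has

1 real root and 2 complex conjugate roots, or
3 real roots, among which two of them may be the same.

A line through two points of the curve already gives two real roots (a tangent gives a repeated real root), so only the second case is possible and the third root must be real too.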

With property 2 and 3, we can define point addition (LHS figure) and point doubling (RHS figure). Point addition P+Q = R of two points P and Q lying on the elliptic curve is defined as the reflection through x-axis of the third intersecting point R’ between the curve and the straight line joining P and Q, while point doubling P+P = R of a point P lying on the elliptic curve is defined as the reflection through x-axis of the intersecting point R’ between the curve and the tangent at P.

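With point addition and point doubling, we can define point multiplication as :

$1P = P$

$2P = P + P$ (point doubling)

$3P = 2P + P$ (point addition)

$nP = (n-1)P + P$

or $nP = \sum_i b_i\,(2^i P)$ where $n = \sum_i b_i 2^i$ with $b_i \in \{0,1\}$, and each term is obtained by doubling :

$2^i P = 2\,(2^{i-1} P)$

$\phantom{2^i P} = 2\,(2\,(2^{i-2} P))$, this procedure can be repeated (equation 2)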

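Let's find the intersection $R' = (r_x, -r_y)$ and its reflection $R = (r_x, r_y)$ given points $P = (p_x, p_y)$ and $Q = (q_x, q_y)$ for point addition.

$-r_y = p_y + m\,(r_x - p_x)$, as R’ must lie on the line PQ

where $m = (q_y - p_y)/(q_x - p_x)$

and suppose the line joining PQ is y=mx+c, its intersections with the elliptic curve can be obtained by solving equation 1, whose three roots are $p_x$, $q_x$ and $r_x$.

$p_x + q_x + r_x = m^2$, by comparing the quadratic term

Thus we have the reflection of the intersection :

$r_x = m^2 - p_x - q_x$

$r_y = m\,(p_x - r_x) - p_y$ where $m = (q_y - p_y)/(q_x - p_x)$ (equation 3a)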

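Let's find the intersection $R' = (r_x, -r_y)$ and its reflection $R = (r_x, r_y)$ given point $P = (p_x, p_y)$ for point doubling. All the above is still valid, except for the value of m, which we find by taking the derivative of the elliptic curve.

$2y\,\frac{dy}{dx} = 3x^2 + a$

$\frac{dy}{dx} = \frac{3x^2 + a}{2y}$

$m = \frac{3p_x^2 + a}{2p_y}$

Thus we have the reflection of the intersection :

$r_x = m^2 - 2p_x$

$r_y = m\,(p_x - r_x) - p_y$ where $m = (3p_x^2 + a)/(2p_y)$ (equation 3b)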

Now, let's introduce the finite field. In the context of ECDSA, a finite field can be regarded as a predefined set of non negative integers within which every calculation must fall (here calculation includes addition, subtraction, multiplication and division). However, the elliptic curve is a continuous curve in $\mathbb{R}^2$, so how can we transform a floating point coordinate pair into an integer pair that lies within a range ($0 \le x < M$ and $0 \le y < M$)? It involves rational numbers and the mod operation (any number lying outside the range can be wrapped around by the mod operation).

More about field – a set of numbers forms a field if addition, subtraction, multiplication and division (by a non zero number) among numbers in the set return a number in the same set. For example, rational numbers form a field, because all operations on rational numbers result in rational numbers (you can prove it easily by writing a rational number as r = p/q, where p and q are integers), similarly real numbers form a field and complex numbers form a field, yet integers do not form a field, as an integer divided by an integer may not be an integer. Please also note the coordinates of P, Q and R on the elliptic curve : if P and Q have rational coordinates, the coordinates of R are rational too (I am not sure whether this is related to the finite field in ECDSA). Now, ignoring the details of finite fields, this is the elliptic curve in the finite field with modulo 67 :

[Figure : LHS shows the points of the elliptic curve modulo 67, RHS shows point addition with the line PQ wrapping around]

Please note the following. (1) As the elliptic curve is symmetric in the continuous field, it must be symmetric in the finite field, but the axis of symmetry shifts to y = 67/2, since the reflection of y is -y mod 67 = (67-y) mod 67, for example, the reflection of 34 is -34 mod 67, which is 33. (2) When we plot an infinitely long straight line in the finite field, it wraps around when it reaches either x=67 or y=67, please see how the line PQ wraps around in the RHS figure. (3) The points on the LHS figure form a closed finite set (strictly speaking a finite group rather than a field, once a point at infinity is included), as operations on the points (i.e. point addition, point doubling and point multiplication) return a point that belongs to the same set. (4) A point lying on the elliptic curve is determined by its x coordinate up to reflection, as its y coordinate (and its reflection’s y coordinate) can be found by : $y = \pm\sqrt{x^3+ax+b} \bmod M$. Hence the intersection R’ can be easily found : extend the straight line PQ (wrap around if necessary) until it reaches $x = m^2-(p_x+q_x) \bmod M$ according to equation 3a, which is x=47 in the above example.

The ECDSA protocol is uniquely defined by the following set of parameters :

elliptic curve parameters a and b
prime modulo M
base point P
order N

Public key cryptography then involves point multiplication nP, where P lies on the elliptic curve with parameters a and b while $n \in [1, N-1]$. For bitcoin, all the parameters are enormous numbers, which makes brute force reverse engineering infeasible. Bitcoin uses the elliptic curve $y^2 = x^3 + 7$, while

prime modulo = $2^{256} - 2^{32} - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1$
             = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE FFFFFC2F

base point   = 04 79BE667E F9DCBBAC 55A06295 CE870B07 029BFCDB 2DCE28D9 59F2815B 16F81798
                  483ADA77 26A3C465 5DA4FBFC 0E1108A8 FD17B448 A6855419 9C47D08F FB10D4B8

order        = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE BAAEDCE6 AF48A03B BFD25E8C D0364141

Now let's see how we can generate a private – public key pair. The private key is just a random number chosen between 1 and N-1, then the public key is derived by point multiplication : public_key = private_key × base_point, which can be implemented by equation 2 for better efficiency. Thus given the private key, we can generate the public key, but not the other way round, this is a one way trip. Since a point on the elliptic curve in the finite field is determined by its x coordinate up to reflection, the public key can be compressed by storing the x coordinate only (of course you also need to record on which side it lies : original side vs reflection). Now, I am going to skip the signing procedure and the signature verification procedure.
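Key generation can be sketched on the toy curve $y^2 = x^3 + 7$ modulo 67 used in this section. This is only an illustration under stated assumptions: the base point (2,22) is a point that happens to lie on the toy curve and is chosen here for the example, the function names are hypothetical, and 64 bit integers replace the 256 bit arithmetic that bitcoin needs. Point addition follows equation 3a, point doubling follows equation 3b, and point multiplication follows the double-and-add idea of equation 2.

```cpp
#include <cassert>
#include <cstdint>

const std::int64_t kPrime = 67; // toy prime modulo M
const std::int64_t kA = 0;      // curve y^2 = x^3 + a x + b
const std::int64_t kB = 7;

struct Point
{
    std::int64_t x;
    std::int64_t y;
    bool infinity; // identity element, the result of P + (-P)
};

bool operator==(const Point& lhs, const Point& rhs)
{
    if (lhs.infinity || rhs.infinity) return lhs.infinity == rhs.infinity;
    return lhs.x == rhs.x && lhs.y == rhs.y;
}

// Wrap any integer into the range [0, M-1].
std::int64_t mod(std::int64_t value)
{
    return ((value % kPrime) + kPrime) % kPrime;
}

// Division in the finite field: inverse by Fermat's little theorem, v^(M-2) mod M.
std::int64_t inverse(std::int64_t value)
{
    std::int64_t base = mod(value);
    assert(base != 0);
    std::int64_t result = 1;
    for (std::int64_t e = kPrime - 2; e > 0; e >>= 1) {
        if (e & 1) result = mod(result * base);
        base = mod(base * base);
    }
    return result;
}

bool on_curve(const Point& p)
{
    if (p.infinity) return true;
    return mod(p.y * p.y) == mod(p.x * p.x * p.x + kA * p.x + kB);
}

// Point addition (equation 3a) and point doubling (equation 3b).
// Coordinates are assumed to be already reduced into [0, M-1].
Point add(const Point& p, const Point& q)
{
    if (p.infinity) return q;
    if (q.infinity) return p;
    if (p.x == q.x && mod(p.y + q.y) == 0) return Point{0, 0, true}; // vertical line
    std::int64_t m = 0;
    if (p == q) m = mod((3 * p.x * p.x + kA) * inverse(2 * p.y)); // tangent slope
    else        m = mod((q.y - p.y) * inverse(q.x - p.x));        // slope of PQ
    Point r{0, 0, false};
    r.x = mod(m * m - p.x - q.x);
    r.y = mod(m * (p.x - r.x) - p.y);
    return r;
}

// Point multiplication nP by double-and-add (equation 2).
Point multiply(std::uint64_t n, Point p)
{
    Point result{0, 0, true};
    while (n > 0) {
        if (n & 1) result = add(result, p);
        p = add(p, p);
        n >>= 1;
    }
    return result;
}
```

With base point P = (2,22), a private key such as 13 gives the public key multiply(13, P). Adding P = (2,22) and Q = (6,25) gives m = 51 and $r_x = 47$, in agreement with the x = 47 quoted for the modulo 67 example.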

For more details about finite field arithmetic, please refer to the book Cryptography and Security in Computing by InTech, particularly chapter 6. Among the public key cryptography algorithms, RSA is a good one to start with (it involves the following concepts only : prime number, greatest common divisor, congruence and Euler’s phi function), for more details about RSA, please refer to the website Number Theory and the RSA Public Key Cryptosystem. For more details about ECDSA, please refer to the website Maths behind bitcoin by Eric Rykwalder.

http://www.intechopen.com/books/cryptography-and-security-in-computing

http://doc.sagemath.org/html/en/thematic_tutorials/numtheory_rsa.html

http://www.coindesk.com/math-behind-bitcoin

Note : The base point should be a coordinate pair; the x and y coordinates are concatenated (after a leading byte 04, which marks the uncompressed format) and then converted to a byte stream.

Payment

Payment is a separate process from trading, because payment can be very slow, as it involves a lot of procedures to ensure a safe transfer of money (risk management), while trading can be very fast, like in high frequency trading (hence there exists a settlement step which handles payment separately). VISA handles 2000 transactions per second (tps) on average, with peak capacity at 56000 tps, while Paypal handles 115 tps on average. Nowadays bitcoin handles 7 tps on average. Therefore scalability is an issue for bitcoin.

Money serves three purposes : (1) payment, (2) storage of value and (3) accounting (like calculating GDP). Payment means money transfer, while money supply M1 includes currency, deposit and credit. Traditionally, payment is done in a centralized way, which means there exists a financial institution as an intermediary. Suppose A gives B a cheque with a unique serial number, B requires a centralized financial institution’s help to ensure two things before he can accept the payment : (1) A does have the ownership of the cheque (i.e. A has the right to spend it) and (2) A has not spent the money before (spending it again is known as double spending). Both problems can be solved easily either by (1) going through an intermediary or (2) using physical currency (the central bank centralizes legal tender printing). How about a decentralized world?

Introduction to bitcoin network

Owning a bitcoin does not mean owning an encrypted bitcoin file, and a transaction does not mean passing that file around. Instead, owning a bitcoin means you have the right to spend it (or transfer it to someone) by broadcasting a transaction message in the bitcoin network, which creates a transaction record in a distributed ledger, known as the blockchain, after the bitcoin network’s verification (or more precisely, after the nodes in the bitcoin network reach consensus). The blockchain is a publicly available ledger (like an accounting book), it is a record of all transactions in the entire bitcoin history. The blockchain records bitcoin transactions only, it does not record the bitcoin balance of each account (it is the user’s responsibility to work out his own balance), and of course, we can work out the balance of all accounts given the entire blockchain. In bitcoin protocol, the blockchain is a chain (or a list) of blocks, or to be more precise, a “very linear” tree of blocks, while a block is a group of transaction records. Thus blockchain, block and transaction record are the three most fundamental concepts in bitcoin protocol.

Each participant is a node in the bitcoin network. There are three types of nodes : (1) miners, who help to manage the ledger while earning new bitcoins and transaction fees in return, (2) monitors, who provide monitoring services over the bitcoin network, such as blockchain.info, which publishes a lot of realtime statistics such as turnover, total market capitalization and block information, and finally (3) users, who use bitcoin for payment. Users need to run a piece of software, or a mobile app, known as the wallet, for generating transaction messages, making signatures and checking whether a transaction is confirmed by the bitcoin network. The bitcoin network is governed solely by bitcoin protocol, which is simply a set of rules and message definitions. There is no centralized bitcoin server, then how does bitcoin protocol verify bitcoin ownership and address the double spending problem? The short answer is, bitcoin protocol verifies bitcoin ownership using ECDSA and addresses the double spending problem by the blockchain, which is constructed through voting by miners with their computation power, involving numerous SHA256 hashing operations. Here are some major miners.

[Figure : major miners, by blockchain.info, 1st Aug 2015]

Transaction

Transaction is the core of bitcoin protocol. Suppose that you want to transfer an amount of bitcoins to someone : first you generate a transaction message (including your signature) and broadcast it to the bitcoin network, then each node in the network verifies this transaction (i.e. whether you have the right to spend the bitcoins). A transaction message records (1) the single source (or multiple sources) from which you got the bitcoins, known as the input, (2) the amount of bitcoins and the destination, known as the output, and (3) your own ECDSA signature together with the corresponding public key (ECDSA signature means : appending input with output, which is then signed with the ECDSA private key). [The transaction message format described here is for illustration only, it is different from the exact protocol, for details, please refer to the bitcoin specifications]. The input source is specified by a transaction id, the output destination is specified by the address of the receiver, but wait … how can we generate transaction ids and addresses? A transaction id is generated by hashing the transaction (by default, we assume SHA256 is used). Thus we expect that miners build a std::map<transaction_id, transaction> and update it whenever they receive a newly broadcast transaction message from the network. This is what miners do :

void new_Tx_received(std::map<Tx_id, Tx>& pending_Tx, const Tx& tx)
{
    Tx_id tx_id = SHA256.hash(tx); // (1) Tx stands for transaction.
    pending_Tx[tx_id] = tx;        // (2) Pending Tx are unconfirmed transactions.
}

Transaction message does not contain its own transaction id. With the above routine, all old transactions, no matter whether they are confirmed or not, can be retrieved from the map using their id. Unlike transaction id, bitcoin address is generated from the ECDSA public key by :

ECDSA.public_key = ECDSA.private_key * ECDSA.base_point; // Recall ECDSA
address = base58.encode(RIPEMD160.hash(SHA256.hash(ECDSA.public_key)));

We can generate the address from the public key, but not the other way round. Suppose miners receive a transaction message from the network, there is a problem with the above routine : it simply accepts all transactions without checking whether the sender has the right to spend the bitcoins he claims to own. A transaction grants the private key holder of the address specified in the destination (i.e. the output field in the transaction message) the right to spend a certain amount of bitcoins, therefore miners can verify the ownership of bitcoins in two steps : (1) verify the source of the bitcoins (i.e. the input field in the transaction message) and (2) verify that the sender has the required signature.

bool verify_ownership(const std::map<Tx_id, Tx>& pending_Tx, const Tx& tx)
{
    // step 1 : verify whether the source is valid
    auto iter = pending_Tx.find(tx.input.source_Tx_id);
    if (iter == pending_Tx.end()) return false; // unknown source transaction
    const Tx& source_tx = iter->second;
    auto address = base58.encode(RIPEMD160.hash(SHA256.hash(tx.public_key)));
    if (source_tx.output.destination_address != address) return false;

    // step 2 : verify whether the signature is valid
    return ECDSA::verify_signature(tx.signature, tx.public_key);
}

Furthermore, if users want to cache the bitcoins they receive from network, it can be done by comparing destination address in transaction message with their own. Therefore users can derive their balance from the entire bitcoin history.

void wallet::new_Tx_received(const Tx& tx)
{
    wallet.address = base58.encode(RIPEMD160.hash(SHA256.hash(wallet.public_key)));
    if (tx.output.destination_address == wallet.address)
    {
        Tx_id tx_id = SHA256.hash(tx);
        wallet.incoming_Tx[tx_id] = tx;
        wallet.balance += tx.output.amount;
    }
}

void wallet::new_Tx_transferred(Tx& tx)
{
    tx.input.source_Tx_id = ... ;          // fill this please
    tx.output.destination_address = ... ;  // fill this please
    tx.output.amount = ... ;               // fill this please
    wallet.balance -= tx.output.amount;
}

In general, one transaction supports multiple inputs and multiple outputs, which means we can group all the bitcoins we received from different sources and spend the sum by distributing it to different destinations, so that the amount of bitcoins in the inputs and outputs is conserved; in other words, it allows merging and splitting of value. One of the outputs can be your own address, for collecting change. Please note that you need to specify the output channel in the source. For example, suppose my own address is F452EA90 :

void wallet::new_Tx_transferred(Tx& tx)
{
    tx.input[0].source_Tx_id = (D56A83B1,6);      // transaction D56A83B1, output 6
    tx.input[1].source_Tx_id = (D56A83B2,2);      // transaction D56A83B2, output 2
    tx.input[2].source_Tx_id = (D56A83B3,3);      // transaction D56A83B3, output 3
    tx.output[0].destination_address = F452EA90;  // suppose wallet.address = F452EA90
    tx.output[1].destination_address = F452EA91;
    tx.output[2].destination_address = F452EA92;
    tx.output[0].amount = 5;                      // change = (10+20+30)-(40+15)-transact_fee
    tx.output[1].amount = 40;
    tx.output[2].amount = 15;
    wallet.balance -= 60;                         // suppose hash of this transaction is D56A83BB
}

All transactions form a directed acyclic graph G={V,E}, where vertex $v_n \in V$ denotes an account (with unique address and private-public key pair) and directed edge $e_{n,m} \in E$ denotes a transaction from vertex $v_n$ to vertex $v_m$. Here is the directed acyclic graph for the above multi-inputs multi-outputs example, please note that time propagates along directed edges.

Tx-D56A83B1 (from account-A)
    output[6].address = F452EA90
    output[6].amount = 10
Tx-D56A83B2 (from account-B)
    output[2].address = F452EA90
    output[2].amount = 20
Tx-D56A83B3 (from account-C)
    output[3].address = F452EA90
    output[3].amount = 30

    --> my account, that can make signature for addr F452EA90 -->

Tx-D56A83BB
    input[0].source_Tx_id = (D56A83B1,6)
    input[1].source_Tx_id = (D56A83B2,2)
    input[2].source_Tx_id = (D56A83B3,3)
    output[0].address = F452EA90, output[0].amount = 5    --> my account (change)
    output[1].address = F452EA91, output[1].amount = 40   --> account that can make signature for addr F452EA91
    output[2].address = F452EA92, output[2].amount = 15   --> account that can make signature for addr F452EA92

Please note the following. (1) Transactions D56A83B1, …B2 and …B3 all have multiple outputs, though they are not plotted in the above graph, so we can imagine that the full graph is very complicated. (2) The transaction id is not included in the transaction message, it is generated through hashing by miners and users. (3) The address of the sender is not included in the transaction message, it is redundant, as it can be traced like step 1 in routine verify_ownership, let's recall :

address get_sender_address(const std::map<Tx_id, Tx>& pending_Tx, const Tx& tx)
{
    const Tx& source_tx = pending_Tx.at(tx.input.source_Tx_id.first);
    return source_tx.output[tx.input.source_Tx_id.second].destination_address;
}

When a new bitcoin is generated as a reward for a miner, it is also represented as a transaction, which has no input source. The new bitcoin is called a coinbase.

void new_bitcoin_for_rewarding_miner(Tx& tx)
{
    tx.input[0].source_Tx_id = COINBASE;
    tx.output[0].destination_address = miner_address;
    tx.output[0].amount = rewarding_amount;
}

Let's summarise what we have got at this moment. At the core of bitcoin is a distributed ledger of all transactions, from which the current balance of each account can be derived. A transaction is simply a message that instructs the ledger to debit the sender address and credit the receiver address, and the transaction must be signed with the sender’s private key. With routine verify_ownership, no one can spend bitcoins that are not owned by themselves, only the private key holder of the “destination address specified in the source’s transaction” has the right to spend. However, it is still possible for a user to broadcast false transactions by double spending, i.e. a user really owns some bitcoins, but he spends them twice, in other words, he creates money. Thus bitcoin should have some mechanism to prevent double spending, otherwise it will result in hyperinflation, and eventually destroy the currency.
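The two ideas in this summary, deriving a balance from the ledger and spotting a double spend, can be sketched as follows. This is a simplified model rather than the bitcoin data format: the type names (TxOutput, OutPoint, Transaction, Ledger) and the short transaction ids are hypothetical, signatures are omitted, and a balance is taken to be the sum of outputs sent to an address that no later input refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct TxOutput
{
    std::string destination_address;
    std::int64_t amount;
};

// An input refers to (source transaction id, output index).
using OutPoint = std::pair<std::string, std::size_t>;

struct Transaction
{
    std::vector<OutPoint> inputs;  // empty for a coinbase transaction
    std::vector<TxOutput> outputs;
};

// The ledger is the ordered list of (transaction id, transaction).
using Ledger = std::vector<std::pair<std::string, Transaction>>;

// Double spending: the same output is used as an input more than once.
bool has_double_spend(const Ledger& ledger)
{
    std::set<OutPoint> spent;
    for (const auto& entry : ledger)
        for (const OutPoint& input : entry.second.inputs)
            if (!spent.insert(input).second) return true; // already spent once
    return false;
}

// Balance of an address: sum of the outputs sent to it that are still unspent.
std::int64_t balance_of(const Ledger& ledger, const std::string& address)
{
    std::set<OutPoint> spent;
    for (const auto& entry : ledger)
        for (const OutPoint& input : entry.second.inputs)
            spent.insert(input);

    std::int64_t balance = 0;
    for (const auto& entry : ledger) {
        const std::vector<TxOutput>& outputs = entry.second.outputs;
        for (std::size_t i = 0; i < outputs.size(); ++i) {
            const bool unspent = spent.count(OutPoint(entry.first, i)) == 0;
            if (unspent && outputs[i].destination_address == address)
                balance += outputs[i].amount;
        }
    }
    return balance;
}
```

For example, if a coinbase transaction T1 pays 50 to address A, and T2 spends (T1,0) by paying 30 to B and 20 back to A as change, then the balances of A and B are 20 and 30. A further transaction that spends (T1,0) again makes has_double_spend return true.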

Blockchain

Let's first introduce block and blockchain, then we will see how double spending can be executed and how it can be prevented by the blockchain. A block is a collection of transactions, there is no restriction on the number of transactions per block (please check bitcoin’s specification). While a miner keeps receiving broadcast transaction messages, he can start building blocks in parallel. You can imagine a miner running a process with multiple threads : one thread keeps receiving transactions and appends them to a map of pending transactions, while another thread builds a block from the map, the map is thus the common resource shared between these two threads (single producer single consumer model). Later we will see that this is in fact a process with at least three threads. Before introducing the block content, let's see what a Merkle tree is, which is also known as a hash tree.

Merkle root                   L = hash(L0+L1)                     where L = label, hash() = hash function

          L0 = hash(L00+L01)                  L1 = hash(L10+L11)

L00 = hash(data0)   L01 = hash(data1)   L10 = hash(data2)   L11 = hash(data3)

data0               data1               data2               data3

A Merkle tree is a tree in which every non-leaf node is labelled with the hash value of the concatenated labels of all its children nodes, while every leaf node is labelled with the hash value of a piece of data. SHA256 is used as the hash function in bitcoin. A block consists of a block header and a block body :

block header = Merkle root + hash value of previous block (parent block) + nonce
block body = Merkle tree
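The Merkle root of a list of data items can be computed level by level, as in the following sketch. Two assumptions are made for illustration: std::hash stands in for SHA256 so that no external library is needed (the function toy_hash is hypothetical and not cryptographically secure), and when a level has an odd number of labels the last label is paired with itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for SHA256 (assumption): std::hash rendered as a decimal string.
std::string toy_hash(const std::string& input)
{
    return std::to_string(std::hash<std::string>{}(input));
}

// Leaves are labelled hash(data); every parent is labelled
// hash(left label + right label); the last remaining label is the root.
std::string merkle_root(const std::vector<std::string>& data)
{
    assert(!data.empty());
    std::vector<std::string> level;
    for (const std::string& item : data) level.push_back(toy_hash(item));

    while (level.size() > 1) {
        // Assumption: an unpaired last label is paired with itself.
        if (level.size() % 2 == 1) level.push_back(level.back());
        std::vector<std::string> parents;
        for (std::size_t i = 0; i < level.size(); i += 2)
            parents.push_back(toy_hash(level[i] + level[i + 1]));
        level.swap(parents);
    }
    return level.front();
}
```

For the four items data0 to data3 this reproduces L = hash(L0+L1) from the figure. Since every label depends on everything below it, changing any single data item changes the root, which is why the block header only needs to store the root.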

All blocks are linked to form a linked list (or a tree to be precise, but a rather linear one), known as the blockchain. Each block points to its previous block (or parent block) with the hash value of the previous block. Thus if a miner wants to search for an old block efficiently, he should build a std::map<hash_block, block>. The nonce is just an arbitrary number that the miner is free to vary. A block is considered to be valid if the hash value of the block header is below a certain threshold, i.e. hash(block.header) < threshold, or roughly equivalently, the hash value in binary or hexadecimal format starts with a certain number of zeros, such as :

000000000000002e9067f1cf7252333f7aeb619c89d220985a70ac0e015248e0

To construct a valid block given a map of pending transactions, a miner should build the Merkle tree and search for a nonce value that makes a valid hash value. This search is done by brute force, it takes time, and thus it is known as mining (or proof of work). The difficulty of mining depends on the threshold, which is adjusted by bitcoin protocol from time to time so that the blockchain grows at a nearly constant rate of roughly 1 new block per 10 minutes. When a miner completes a block, he should then (1) broadcast the block to the network and (2) remove all transactions that constitute the block from the map of pending transactions that he maintains (of course, he cannot modify other miners’ maps of pending transactions). All miners compete to find the next valid block, the winner is rewarded with (1) new bitcoins called coinbases and (2) the transaction fees for all transactions in the completed valid block.
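The brute force search for a nonce can be sketched as below. The sketch makes several assumptions for illustration: a 64 bit non-cryptographic hash (FNV-1a followed by a mixing step) stands in for SHA256, the header fields are plain strings joined with a separator, and the names BlockHeader, hash_header, is_valid and mine are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for SHA256 (assumption): 64 bit FNV-1a plus a final mixing step.
// It is NOT cryptographically secure.
std::uint64_t toy_hash(const std::string& input)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char ch : input) {
        h ^= ch;
        h *= 1099511628211ULL;
    }
    h ^= h >> 30; h *= 0xbf58476d1ce4e5b9ULL;
    h ^= h >> 27; h *= 0x94d049bb133111ebULL;
    h ^= h >> 31;
    return h;
}

struct BlockHeader
{
    std::string merkle_root;
    std::string previous_block_hash;
    std::uint64_t nonce;
};

std::uint64_t hash_header(const BlockHeader& header)
{
    return toy_hash(header.merkle_root + "|" + header.previous_block_hash + "|" +
                    std::to_string(header.nonce));
}

// A block is valid if the hash value of its header is below the threshold.
bool is_valid(const BlockHeader& header, std::uint64_t threshold)
{
    return hash_header(header) < threshold;
}

// Proof of work: try nonce = 0, 1, 2, ... until the header becomes valid.
// Returns the number of hash evaluations that were needed.
std::uint64_t mine(BlockHeader& header, std::uint64_t threshold)
{
    assert(threshold > 0);
    std::uint64_t attempts = 1;
    for (header.nonce = 0; !is_valid(header, threshold); ++header.nonce)
        ++attempts;
    return attempts;
}
```

A threshold of 2^56 requires the top 8 bits of the 64 bit hash to be zero, so about 256 attempts are expected; halving the threshold doubles the expected work, which is how difficulty can be tuned. Checking a found nonce takes a single hash evaluation.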

When a miner receives a broadcast message of the next block while he is still working on that block (i.e. someone was faster than him in finding the nonce value and earns the coinbases), he should first verify whether the received block is valid by checking all hash values in the block header and block body (this is fast, as the most time consuming calculation is the brute force search for the nonce value, which has already been found). If it is valid, he can insert the received block into his blockchain, with the insertion location specified by the field “hash value of previous block”. Therefore, insertion does not necessarily happen at the end of the blockchain, instead it may happen in the middle, which results in branches. Thus the term blockchain is a little bit confusing, because it is in fact a tree. After that, he can either : (1) keep on working on his block until it is finished, and broadcast it, in which case he introduces a branch in the blockchain (as there are multiple broadcast blocks sharing the same parent block) or (2) abandon the working block and start working after the received block (i.e. work on a new block using the received block as the parent block), but before that, he should update his map of pending transactions by removing all transactions included in the received block. A miner can choose between these two options based on his own logic (or even in a random fashion), the implementation is really up to the miner, as long as he can maximise his profit.
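Inserting received blocks by their “hash value of previous block” field naturally produces a tree, as the following sketch shows. It is a simplified model under stated assumptions: the names Block and BlockTree are hypothetical, block hashes are short strings, validity checking is omitted, and a block whose parent is unknown is simply rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct Block
{
    std::string hash;          // hash value of this block's header
    std::string previous_hash; // hash value of the parent block
};

class BlockTree
{
public:
    explicit BlockTree(const Block& genesis)
    {
        assert(genesis.hash != genesis.previous_hash);
        blocks_[genesis.hash] = genesis;
    }

    // Inserts a received block under its parent. Returns false if the block
    // is a duplicate or its parent is unknown (e.g. the parent was lost).
    bool insert(const Block& block)
    {
        if (blocks_.count(block.hash) != 0) return false;
        if (blocks_.count(block.previous_hash) == 0) return false;
        blocks_[block.hash] = block;
        ++child_count_[block.previous_hash];
        return true;
    }

    // A block with more than one child is where the chain branches.
    std::size_t branch_points() const
    {
        std::size_t count = 0;
        for (const auto& entry : child_count_)
            if (entry.second > 1) ++count;
        return count;
    }

    // Number of parent links between the given block and the first block.
    std::size_t height(const std::string& hash) const
    {
        std::map<std::string, Block>::const_iterator it = blocks_.find(hash);
        assert(it != blocks_.end());
        std::size_t steps = 0;
        while (true) {
            std::map<std::string, Block>::const_iterator parent =
                blocks_.find(it->second.previous_hash);
            if (parent == blocks_.end()) return steps;
            it = parent;
            ++steps;
        }
    }

private:
    std::map<std::string, Block> blocks_;            // std::map<hash_block, block>
    std::map<std::string, std::size_t> child_count_; // children per parent hash
};
```

If two miners each broadcast a block with the same parent, both insertions succeed and branch_points() becomes 1, which is option (1) of the paragraph above seen from the receiving side.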

There are still a lot of unanswered questions. (1) Do miners maintain the same map of pending transactions? (2) Do miners maintain the same blockchain? Is there any official version of the blockchain (or ground truth)? (3) As the blockchain is a tree, there are multiple leaf-nodes or leaf-blocks, so when a miner builds a new block, to which previous block should it point? (4) Is there any limit on the number of pending transactions? Can a miner build a block with no transactions? (5) Are miners looking for the same nonce? (6) As there are multiple branches, how do we know the real transaction history? Let us address them one by one.

First of all, the bitcoin network is lossy. Some broadcast transaction messages and some broadcast completed blocks may be dropped, so some miners may miss certain transaction messages or certain completed blocks. The bitcoin protocol must tolerate this loss and recover the whole transaction history as the blockchain grows. Thus each miner may own a different map of pending transactions and also a different version of the blockchain. As there is no centralized server, no one knows the so-called “ground truth” of the blockchain. As shown in the following example, LHS and RHS are slightly different versions of the blockchain maintained by two miners; each square denotes a valid completed block received from the network. Although there is no officially recorded transaction history, miners do come to a consensus about the real historical path (known as the trunk, as indicated by the black solid line). The consensus is never 100% certain, but its likelihood increases as both blockchains grow. Besides, we are more confident about the front end of the trunk, and less certain about the back end of the trunk.

Secondly, given a blockchain tree, a miner can build a new block using any existing block as the parent block. If the miner chooses to point to a leaf-block, he extends the trunk or the branch on which that leaf-block lies; if he chooses to point to a non-leaf block, he introduces a new branch in the blockchain. Besides, there is no restriction on the transactions that a miner puts in a new block: he can either put many transactions into the block, hoping to earn more transaction fees, or start building the block without waiting for more pending transactions, hoping to complete the brute force search as soon as he can. This is up to his strategy. Statistics show that the average number of transactions per block is around 200-300. In addition, each miner must include in the Merkle tree a transaction that transfers the coinbase to his own address; this serves as the reward for the miner and is the source of new bitcoins.

[Figure: two slightly different versions of the blockchain, maintained by miner 1 (LHS) and miner 2 (RHS); the black solid line marks the trunk.]

Thirdly, each miner is looking for a different nonce value, for 3 reasons. Each miner builds the new block with (1) a different subset of pending transactions, (2) a different parent block and (3) a coinbase transaction to a different address. Due to the avalanche effect of hashing, any minor change in the transactions results in a drastic change in the Merkle tree and hence a completely different nonce value. Winning is thus a purely random event (probably uniformly distributed), and the chance of winning the next block is proportional to a miner’s computational power. For example, a miner having 10% of the computational power of the whole bitcoin network has a 10% chance of winning the next block. Therefore the chance of the same miner winning several blocks in a row is low, even if he is the most powerful one. This makes it hard for hackers to manipulate the blockchain.
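As a small worked example of the last point, assuming every block is an independent race that a miner wins with probability equal to his share of the computational power, the chance of winning n blocks in a row is share^n. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that a miner holding `share` of the network's computational
// power wins `n` consecutive blocks, assuming independent races.
double consecutive_win_probability(double share, unsigned n) {
    assert(share >= 0.0 && share <= 1.0);
    return std::pow(share, static_cast<double>(n));
}
```

With a 10% share, winning six blocks in a row has a probability of 0.1^6, i.e. one in a million.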

Finally, we can see how the whole thing works. There is no centralized blockchain. Miners do not coordinate with one another beyond broadcasting transactions and blocks. Each miner keeps his own version of the blockchain; although the versions differ, they overlap. The longest overlapping path is known as the trunk. The front end of the trunk is relatively stable, while the back end of the trunk is still fuzzy as the blockchains of all miners grow. We will see that when a block lies more than six blocks deep inside the trunk, it can be considered stable, and all transactions in that block (or prior to that block) can be considered confirmed, so the receivers of confirmed transactions can then spend their bitcoins (question : blocks are generated at a rate of 1 per 10 minutes, so receivers of bitcoins need to wait about an hour for their transactions to sink 6 blocks deep in the trunk before they can spend their bitcoins, is that right?). Please note that the trunk contains no leaf-block, except near the back end.
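The confirmation rule can be sketched as a walk along the parent links. This assumes a hypothetical map from each block to its parent, treats a depth of six or more as confirmed, and uses the 10-minute average block interval mentioned earlier; the waiting time is therefore an average, not a guarantee.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical storage layout: block hash -> hash of its parent block.
using ParentMap = std::map<std::string, std::string>;

// Number of blocks stacked on top of `block` along the path that ends at
// `tip`, or -1 if `block` is not an ancestor of `tip`.
int depth_below_tip(const ParentMap& parent_of, const std::string& tip,
                    const std::string& block) {
    int depth = 0;
    std::string cur = tip;
    while (cur != block) {
        auto it = parent_of.find(cur);
        if (it == parent_of.end()) return -1;  // reached the root without meeting `block`
        cur = it->second;
        ++depth;
    }
    return depth;
}

// A block is treated as confirmed once it lies `required_depth` blocks deep.
bool is_confirmed(const ParentMap& parent_of, const std::string& tip,
                  const std::string& block, int required_depth = 6) {
    assert(required_depth >= 0);
    return depth_below_tip(parent_of, tip, block) >= required_depth;
}

// Average waiting time until a transaction sinks `required_depth` blocks deep.
double expected_wait_minutes(int required_depth, double minutes_per_block = 10.0) {
    return required_depth * minutes_per_block;
}
```

Under these assumptions, expected_wait_minutes(6) gives 60 minutes, which matches the one-hour estimate in the question above.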

Besides, given an entire blockchain with multiple leaf-blocks, when we traverse the tree from the root block to each leaf-block along a different path, we need to update the pending transactions independently for each path. In other words, each leaf-block should own an individual map of pending transactions (while non-leaf blocks do not). However, as the blockchain grows, the number of leaf-blocks increases, and miners would need to manage an ever-increasing number of pending transaction maps, which is infeasible. Therefore miners should stop managing pending transaction maps for the confirmed portion of the blockchain.

Which pending transactions should a miner pick for his new block? How should he choose the parent block? He should choose in such a way that his completed block has a higher probability of falling into the trunk (in case he is the lucky one who wins the next block; as we now know, winning a block is a purely random event), so that he can earn both the coinbases and the transaction fees. A block becomes part of the trunk if it is followed by many later blocks: the more followers it has, the higher the probability that it is in the trunk. This is therefore a vote, a vote by computational power. If other miners trust your broadcast block, they will vote for it by investing their computational power in building new blocks behind yours (i.e. using your block as the parent block). Therefore, a miner should build his new block in the way that makes other miners vote for him: be an honest miner, put only true (verified) transactions into the new block, and use the most trustworthy leaf-block as the parent block. This is how the bitcoin protocol encourages miners to extend the trunk honestly and collectively.
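One simple heuristic for picking the parent, assumed here as a proxy for the most trustworthy leaf-block, is to extend the block that lies deepest in the tree, since it already has the most work stacked behind it. The sketch below uses a hypothetical map from block to parent, with an empty string marking the root; real miners are free to use other strategies.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical storage layout: block -> parent block ("" for the root).
using ParentMap = std::map<std::string, std::string>;

// Number of parent links between `block` and the root.
int height(const ParentMap& parent_of, const std::string& block) {
    int h = 0;
    auto it = parent_of.find(block);
    while (it != parent_of.end() && !it->second.empty()) {
        ++h;
        it = parent_of.find(it->second);
    }
    return h;
}

// Pick the block with the greatest height. Such a block is always a leaf.
// Ties are broken by alphabetical order of the block names.
std::string choose_parent(const ParentMap& parent_of) {
    assert(!parent_of.empty());
    std::string best;
    int best_height = -1;
    for (const auto& entry : parent_of) {
        int h = height(parent_of, entry.first);
        if (h > best_height) {
            best_height = h;
            best = entry.first;
        }
    }
    return best;
}
```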

The miner should be implemented with at least 3 threads : thread 1 receives broadcast transaction messages and updates the pending transaction maps of the leaf-blocks, thread 2 receives broadcast blocks, verifies them and inserts them into the blockchain, and thread 3 picks a leaf-block, according to the miner's own logic, and builds a new block after it.

How can the blockchain avoid missing transactions? Suppose Tx10 to Tx18 are pending transactions and some of them are missing from some blocks, while different miners broadcast new blocks containing pending transactions. The following example shows how the blockchain recovers the missing part. The trunk is drawn in red in the figure, and the parent of each block is given in brackets.

block      transactions
blk_A      Tx:10,14
blk_B(A)   Tx:11,13
blk_D(B)   Tx:12,15
blk_F(E)   Tx:17,18
blk_C(A)   Tx:15
blk_E(C)   Tx:16,13
blk_G(E)   Tx:17
blk_I(G)   Tx:18
blk_H(E)   Tx:11,12
blk_J(H)   Tx:17,18
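The example above can be checked mechanically. The sketch below walks from a leaf-block back to the root, collects the transactions recorded on that path, and reports which of Tx10 to Tx18 are still pending for that leaf. The data structure and function names are hypothetical; the block contents are copied from the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Block {
    std::string parent;    // empty for the root block
    std::vector<int> txs;  // transaction numbers, e.g. 10 for Tx10
};

using Chain = std::map<std::string, Block>;

// Transactions recorded on the path from the root to `leaf`.
std::set<int> recorded_on_path(const Chain& chain, const std::string& leaf) {
    std::set<int> seen;
    std::string cur = leaf;
    while (!cur.empty()) {
        auto it = chain.find(cur);
        assert(it != chain.end());  // every parent must be stored
        seen.insert(it->second.txs.begin(), it->second.txs.end());
        cur = it->second.parent;
    }
    return seen;
}

// Pending transactions of `leaf`: broadcast transactions not yet on its path.
std::set<int> pending_for_leaf(const Chain& chain, const std::string& leaf,
                               const std::set<int>& broadcast) {
    std::set<int> seen = recorded_on_path(chain, leaf);
    std::set<int> pending;
    for (int tx : broadcast)
        if (seen.count(tx) == 0) pending.insert(tx);
    return pending;
}

// The blocks of the example above.
Chain example_chain() {
    Chain c;
    c["A"] = Block{"", {10, 14}};
    c["B"] = Block{"A", {11, 13}};
    c["D"] = Block{"B", {12, 15}};
    c["F"] = Block{"E", {17, 18}};
    c["C"] = Block{"A", {15}};
    c["E"] = Block{"C", {16, 13}};
    c["G"] = Block{"E", {17}};
    c["I"] = Block{"G", {18}};
    c["H"] = Block{"E", {11, 12}};
    c["J"] = Block{"H", {17, 18}};
    return c;
}
```

For leaf J nothing is pending, whereas leaf I still misses Tx11 and Tx12, so a miner who extends J builds on the more complete history.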

How to prevent double spending?

Can a miner steal coinbases by copying an existing block in the blockchain, modifying only the coinbase transaction output so that all coinbases are transferred to his own address, and then broadcasting the block as if it were newly found, reusing the nonce found by others? The answer is no, because once any content of the block changes, he has to rework the nonce value by brute force again. Now we know that (1) a bitcoin miner cannot steal an existing block, and (2) a bitcoin user cannot steal a transaction. The remaining problem that bitcoin needs to address is double spending, which means that a bitcoin owner broadcasts two transaction messages sharing the same input source of bitcoins. There are three possible cases.

Case 1 : a miner (carelessly or deliberately) puts these two transactions into the same block and broadcasts the block. This block will not pass the verification by other miners, so they will not vote for this invalid block and will follow another branch instead. Case 2 : two miners, each of whom sees only one of the two transactions, broadcast their new valid blocks (each containing one of the conflicting transactions) to the network. Now suppose the two blocks share the same parent block, thus creating branches in the blockchain. Other miners will vote for one of the branches by following the one they favour. As the blockchain grows, the trunk will eventually pass through only one of them. As a result, double spending is avoided: miners pick one of the transactions through collective decision, while the other transaction is considered to be unconfirmed. Suppose Tx13 and Tx14 are double spending :

block      transactions
blk_A      Tx:10,12
blk_B(A)   Tx:11,13
blk_D(B)   Tx:15
blk_F(D)   Tx:17,18
blk_C(A)   Tx:14
blk_E(C)   Tx:16,13
blk_G(D)   Tx:17
blk_I(G)   Tx:16,18
blk_H(E)   Tx:11,15
blk_J(H)   Tx:17,18

Tx13 = A sends bitcoins to B.
Tx14 = A sends bitcoins to C.

In case 2, if A double spends, the network picks only one of the two transactions, so double spending is avoided. The chance of the bitcoins ending up in the hands of B or C is 50-50; as a result, A cannot control how he spends.

Returning to the earlier example of missing transactions : if you are a miner building a new block, which block would you like to follow, block I or block J? Block J of course, because the path through J records all of Tx10 to Tx18, while the path through I still misses Tx11 and Tx12. This is how the missing part is recovered! Besides, the order of transactions is decided by the trunk, rather than by the actual time at which users broadcast the transactions.

In the double-spending example above, an honest miner will not generate block E, as he should have detected the double spending (once in block C and once in block E); similarly, no honest miner will follow block E, as it is invalid. This brings us to case 3 : the only way a fraudulent user can double spend is to build the blocks C, E, H and J all by himself, which means he needs to mine all the nonce values and broadcast the whole fake path. However, winning a block is a random event, and the chance that the same hacker, with limited computational power, wins successive blocks is very low. By the time the hacker solves his first block, the network will probably have completed the next few blocks, and he is unlikely ever to catch up.
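The check that rejects block E can be sketched as follows, assuming each transaction names the input it spends and that a block is acceptable only if no input is spent twice on the path from the root to that block. The types, the helper functions and the input labels are hypothetical; the block contents follow the double-spending example, in which Tx13 and Tx14 spend the same coins.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Tx {
    int id;
    std::string input;  // label of the coins being spent
};

struct Block {
    std::string parent;  // empty for the root block
    std::vector<Tx> txs;
};

using Chain = std::map<std::string, Block>;

// Hypothetical helper: every transaction spends its own input, except that
// Tx13 and Tx14 both spend the same coins of user A.
Tx make_tx(int id) {
    bool from_a = (id == 13 || id == 14);
    return Tx{id, from_a ? std::string("coins-of-A") : "input-" + std::to_string(id)};
}

Block make_block(const std::string& parent, std::initializer_list<int> ids) {
    Block b{parent, {}};
    for (int id : ids) b.txs.push_back(make_tx(id));
    return b;
}

// True if no input is spent more than once on the path from the root to `tip`.
bool path_is_free_of_double_spend(const Chain& chain, const std::string& tip) {
    std::set<std::string> spent;
    std::string cur = tip;
    while (!cur.empty()) {
        auto it = chain.find(cur);
        assert(it != chain.end());  // every parent must be stored
        for (const Tx& tx : it->second.txs)
            if (!spent.insert(tx.input).second) return false;  // input already spent
        cur = it->second.parent;
    }
    return true;
}

// The blocks of the double-spending example.
Chain example_chain() {
    Chain c;
    c["A"] = make_block("", {10, 12});
    c["B"] = make_block("A", {11, 13});
    c["D"] = make_block("B", {15});
    c["F"] = make_block("D", {17, 18});
    c["C"] = make_block("A", {14});
    c["E"] = make_block("C", {16, 13});
    c["G"] = make_block("D", {17});
    c["I"] = make_block("G", {16, 18});
    c["H"] = make_block("E", {11, 15});
    c["J"] = make_block("H", {17, 18});
    return c;
}
```

Blocks B and C each pass the check on their own branch, while E fails because its ancestor C has already spent the same input; H and J inherit the failure.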

This is a race between the honest chain (BDGI) and the attacker chain (CEHJ). The block at which branching starts is treated as the reference point (block B or block C). Let the current progress of the honest miner and of the attacker be x and y respectively; then the difference in progress, m = x-y, can be modelled as a Bernoulli random walk.

honest chain                B  D  G  I
honest miner progress (x)   0  1  2  3

attacker chain              C  E  H  J
attacker progress (y)       0  1  2  3

This is analogous to the Gambler’s ruin problem. Let the probability that the honest miner wins the next block be p (m is then incremented by 1), and the probability that the attacker wins the next block be q = 1-p (m is then decremented by 1). If the honest miner is currently m blocks ahead of the attacker, the probability that the attacker will catch up from behind is given by equation 2 in “Gamblers ruin.doc” as :

$$
\mathrm{prob}(\text{catch up} \mid m) =
\begin{cases}
1 & \text{if } p \le q \text{ or } m \le 0 \\
(q/p)^{m} & \text{if } p > q \text{ and } m > 0
\end{cases}
$$
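As a small sketch of the catch-up probability, assuming the standard gambler's-ruin result that an attacker who is m blocks behind catches up with probability (q/p)^m when p > q, and with certainty when p <= q or when he is not behind at all (m <= 0). The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that the attacker ever catches up when the honest chain is
// m blocks ahead; p is the honest miners' chance of winning each block.
double catch_up_probability(double p, int m) {
    assert(p > 0.0 && p < 1.0);
    double q = 1.0 - p;
    if (m <= 0 || q >= p) return 1.0;  // already level, or attacker at least as strong
    return std::pow(q / p, static_cast<double>(m));
}
```

For p = 0.9 the ratio q/p is 1/9, so every extra block of lead cuts the attacker's chance by a factor of nine.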

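Suppose now that user B has received Tx13. Our objective is to find the x at which Tx13 can be considered confirmed, so that user B can safely spend the bitcoins. This is accomplished by solving for the x at which prob(unsafe|x) falls below a predefined threshold. Given no extra information, both x and y follow Poisson distributions.

$$ x \sim \mathrm{Poisson}(\lambda_x) $$

$$ y \sim \mathrm{Poisson}(\lambda_y) $$

Since the expected progress is directly proportional to the probability of success, we have :

$$ \frac{\lambda_y}{\lambda_x} = \frac{q}{p} $$

$$ \lambda = \lambda_y = x \, \frac{q}{p} \quad \text{where } \lambda_x \text{ is taken to be the observed } x $$

The attacker catches up with probability (q/p)^(x-y) when y < x, and with certainty when y >= x, so :

$$
\begin{aligned}
\mathrm{prob}(\text{unsafe} \mid x)
&= \sum_{y=0}^{\infty} \mathrm{prob}(y \mid x)\,\mathrm{prob}(\text{catch up} \mid x, y) && \text{by law of total probability} \\
&= \sum_{y=0}^{\infty} \frac{\lambda^{y} e^{-\lambda}}{y!}\,\mathrm{prob}(\text{catch up} \mid m = x - y) \\
&= \sum_{y=0}^{x-1} \frac{\lambda^{y} e^{-\lambda}}{y!} \left(\frac{q}{p}\right)^{x-y} + \sum_{y=x}^{\infty} \frac{\lambda^{y} e^{-\lambda}}{y!} && \text{always assume } p > q \\
&= \sum_{y=0}^{x-1} \frac{\lambda^{y} e^{-\lambda}}{y!} \left(\frac{q}{p}\right)^{x-y} + 1 - \sum_{y=0}^{x-1} \frac{\lambda^{y} e^{-\lambda}}{y!} && \text{avoid summation to infinity} \\
&= 1 + \sum_{y=0}^{x-1} \frac{\lambda^{y} e^{-\lambda}}{y!} \left( \left(\frac{q}{p}\right)^{x-y} - 1 \right)
\end{aligned}
$$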

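Let us recall the law of total probability. For events B_i that partition the sample space :

$$
\begin{aligned}
P(A) &= \sum_{i} P(A \cap B_i) \\
     &= \sum_{i} P(A \mid B_i)\,P(B_i)
\end{aligned}
$$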

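Here is an implementation in C++.

    #include <cmath>

    // Probability that the attacker ever catches up, given that the honest
    // chain has progressed x blocks. p is the probability that the honest
    // miner wins the next block (p > q is assumed).
    double unsafe_probability(double p, unsigned short x) {
        double q = 1 - p;
        double lambda = x * (q / p);  // expected progress of the attacker
        double sum = 1;
        for (unsigned short y = 0; y < x; ++y) {
            // Poisson probability that the attacker has progressed y blocks
            double poisson = std::exp(-lambda);
            for (unsigned short k = 1; k <= y; ++k) poisson *= (lambda / k);
            sum += (std::pow(q / p, x - y) - 1) * poisson;
        }
        return sum;
    }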

Running the function with p=0.9 (i.e. q=0.1), we can see that the unsafe probability drops off exponentially.

x     prob(unsafe|x)
0     1.0000000
1     0.2045873
2     0.0509779
3     0.0131722
4     0.0034552
5     0.0009137
6     0.0002428   (chance of successful attack is about 0.024% for x=6)
7     0.0000647
8     0.0000173
9     0.0000046
10    0.0000012
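The table can also be read in the other direction: given a threshold, find the smallest x whose unsafe probability falls below it. The sketch below repeats the formula so that it compiles on its own, with the Poisson term updated incrementally instead of being recomputed for every y. The function min_confirmations and its max_x cut-off are hypothetical additions.

```cpp
#include <cassert>
#include <cmath>

// Same formula as the implementation above, repeated for self-containment.
double unsafe_probability(double p, unsigned x) {
    double q = 1.0 - p;
    double lambda = x * (q / p);
    double sum = 1.0;
    double poisson = std::exp(-lambda);    // Poisson term for y = 0
    for (unsigned y = 0; y < x; ++y) {
        if (y > 0) poisson *= lambda / y;  // term for y from the term for y - 1
        sum += (std::pow(q / p, static_cast<double>(x - y)) - 1.0) * poisson;
    }
    return sum;
}

// Smallest x with unsafe_probability(p, x) < threshold.
// Returns max_x + 1 if no such x is found up to max_x.
unsigned min_confirmations(double p, double threshold, unsigned max_x = 1000) {
    assert(p > 0.5 && p < 1.0);  // the derivation assumes p > q
    assert(threshold > 0.0);
    for (unsigned x = 0; x <= max_x; ++x)
        if (unsafe_probability(p, x) < threshold) return x;
    return max_x + 1;
}
```

With p = 0.9, a threshold of 0.1% is first met at x = 5 and a threshold of 0.01% at x = 7, in line with the table.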

Conclusion

It has been known for decades, and there are scientific proofs, that it is impossible to coordinate exact information among multiple distant nodes in a network without a central authority (this is not limited to the context of currency). In 2008, Satoshi Nakamoto published a paper with a practical solution to this impossible problem. All new transactions are kept inside a block, which is periodically sealed and inserted into a blockchain. Every node in the network has its own version of the blockchain. The trunk emerges when the nodes reach consensus, which is a vote by computational power. This is why the blockchain is the most important invention in bitcoin.

Reference

Bitcoin : A Peer-to-Peer Electronic Cash System, Satoshi Nakamoto, 2008.
Bitcoin Mining Explained Like You’re Five, https://chrispacia.wordpress.com/2013/09/02/bitcoin-mining-explained-like-youre-five-part-1-incentives/
Bitcoin transaction fees explained, http://bitcoinfees.com/
