Blockchain is in fashion. Every month new systems of this type emerge, along with applications that use them and companies that base their business models on the opportunities this technology provides.
Despite this boom, there is still a lot of uncertainty surrounding the technology. There are doubts about its scalability, its efficiency, in some cases the robustness of its underlying consensus mechanisms, and also its privacy, the issue we will focus on here.
In INCIBE's study on Bitcoin we saw that this system does not provide anonymity but pseudo-anonymity. That is to say that even though the user can create as many accounts as they like, it is possible to interrelate them with a high degree of probability, unless the user takes significant steps (and, in all likelihood, assumes risks) to prevent it.
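One commonly described way of interrelating accounts is to assume that all the addresses spent together as inputs of a single transaction belong to the same user. The following is a minimal Python sketch of that idea under stated assumptions: the addresses and transactions are invented for illustration, each transaction is reduced to its list of input addresses, and nothing here is taken from the INCIBE study itself.

```python
def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of some transaction.

    `transactions` is a list of input-address lists (hypothetical data).
    Uses a small union-find structure to merge overlapping groups.
    """
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path compression
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # register single-input transactions too
        for other in inputs[1:]:
            union(first, other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical transactions: addr2 appears with addr1 and later with addr3,
# so all three end up attributed to the same (pseudonymous) user.
example = [["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]]
linked = cluster_addresses(example)
```

The heuristic is probabilistic, not a proof of ownership, which is why the linkage holds only "with a high degree of probability".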
Given that the majority of current blockchains are based to a greater or lesser extent on Bitcoin, many of them have inherited this pseudo-anonymity. If we add the immutability they provide, we are looking at a dangerous combination. The immutability of a blockchain implies that, once information has been written to it, it will stay there forever. There are dangerous examples, such as information relating to malware or paedophilic content, and curious ones that reflect the same problem, such as an ASCII image of Ben Bernanke. The immutability of Bitcoin (where this information is written) guarantees that, like it or not, this information cannot be deleted.
This combination of immutability and pseudo-anonymity allows for an interesting analysis of the information contained in a typical blockchain. But how easy or difficult is this task? As an inoffensive example, below we'll use an after-work session we organised at BEEVA to play Connect4 with Ethereum smart contracts. The aim of this analysis is to recover the movements we made at the time, even though some time has passed since that day.
But what is Ethereum? And a Smart Contract?
Ethereum is one of the main blockchains, excluding Bitcoin. It is also based on proof of work, that is to say, it requires network participants to spend computational capacity to collaborate (although they plan to change this in the future). But what has done most for the fame of this system is its Turing-complete smart contracts. In this context, a smart contract is no more than a program that is executed in a decentralized way. That is to say, all nodes of the network execute it and check that the result of the operations coincides with what has been written in the blockchain. The term "contract" comes from the fact that it was originally envisaged as a way to automate contracts (legal and financial), although smart contracts are valid for any type of processing.
In short, an Ethereum participant may invoke a smart contract, which will be executed by all the nodes of the system, and read the result of the invoked operation directly from the blockchain, knowing that the result is correct (under the Ethereum security model, of course).
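The idea that every node executes the contract and checks the result against the chain can be illustrated with a toy model. The sketch below is plain Python, not Ethereum code: the contract, its "add" method and the way results are hashed are all assumptions made for illustration.

```python
import hashlib
import json


def counter_contract(state, call):
    """Toy 'smart contract': a pure function from (state, call) to a new state."""
    new_state = dict(state)
    if call["method"] == "add":
        new_state["total"] = new_state.get("total", 0) + call["amount"]
    return new_state


def state_digest(state):
    """Canonical hash of a state, so that nodes can compare results cheaply."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def node_validates(prev_state, call, recorded_digest):
    """What each node does: re-execute the call and compare with the chain."""
    return state_digest(counter_contract(prev_state, call)) == recorded_digest


prev_state = {"total": 40}
call = {"method": "add", "amount": 2}

# Digest written to the (toy) chain by whoever produced the block.
recorded = state_digest(counter_contract(prev_state, call))

# Five independent nodes re-run the same call and all agree.
votes = [node_validates(prev_state, call, recorded) for _ in range(5)]

# A block claiming a different result is rejected by every honest node.
tampered = state_digest({"total": 1000})
rejected = not node_validates(prev_state, call, tampered)
```

Because the contract is deterministic, agreement between nodes needs no communication beyond the call and the recorded result.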
Crawling our Connect4 Ethereum
Ethereum has an important community of developers. Among other utilities, there are many modules available for NodeJS which allow for tracing the blockchain in a simple way, block by block and transaction by transaction, processing the information that these contain. Specifically, we can see the sender and receiver of each transaction and its parameters, and even execute it again. In Ethereum, the term "transaction" covers both operations that send the internal cryptocurrency, Ether, and operations that invoke a function of a smart contract. These smart contracts are executed independently in each node of the network by what is known as the Ethereum Virtual Machine (EVM).
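The block-by-block, transaction-by-transaction traversal can be sketched over an in-memory chain. This is a Python illustration with a made-up data model (the `Block` and `Transaction` classes, the addresses and the `data` encoding are all hypothetical), not the API of any NodeJS module.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Transaction:
    sender: str
    receiver: str
    value: int = 0   # Ether sent, in toy units
    data: str = ""   # encoded function call; empty for a plain transfer


@dataclass
class Block:
    number: int
    transactions: List[Transaction] = field(default_factory=list)


def crawl(chain: List[Block]) -> Iterator[Tuple[int, str, Transaction]]:
    """Walk the chain in order and classify every transaction found."""
    for block in chain:
        for tx in block.transactions:
            kind = "call" if tx.data else "transfer"
            yield block.number, kind, tx


# Hypothetical two-block chain: one Ether transfer, one contract invocation.
chain = [
    Block(0, [Transaction("0xalice", "0xbob", value=5)]),
    Block(1, [Transaction("0xalice", "0xplayer1", data="play(3)")]),
]

records = list(crawl(chain))
for number, kind, tx in records:
    print(number, kind, tx.sender, "->", tx.receiver, tx.data or tx.value)
```

Both kinds of operation come out of the same loop, which mirrors the fact that Ethereum uses the single term "transaction" for transfers and contract invocations.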
Nevertheless, in the case in question, there is a slight complication that also arises in the most interesting cases. In the scenario of the Connect4 game we built, both the board of the game and the players are modelled as smart contracts. This is to ensure that nobody can cheat at the game, not even the owner of the supposed company that allows Connect4 to be played from Ethereum. Once the game has started, the players send instructions to the board with their movements. These instructions are sent in the form of a contract-to-contract transaction, that is to say, the "player" smart contract decides the next move and sends the transaction to the "board" smart contract to apply the movement. This contract-to-contract call is not reflected directly in the Ethereum blockchain, as it is what is known as an internal message. Basically, in order to save space, the nodes that comprise the network and execute the code process the call as if it were all part of the same program (although in reality they are executing several smart contracts) and only store the final result. This is not a problem from the perspective of the coherence of the final result, as the processing is deterministic and all the nodes of the network will reach the same result, and therefore it is not necessary to "take note" of the intermediary transactions.
However, for our analysis, since what interests us is precisely the interactions between the players and the boards, this is a problem: it is no longer enough to simply process block by block, analysing the transactions of each block, as these internal transactions do not explicitly appear.
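A toy model makes the gap visible. In the Python sketch below, which is an illustration and not the contracts we actually deployed, the class names, addresses and record layout are hypothetical: a user sends one transaction to a "player" contract, the player calls the "board" contract, and the ledger keeps only the outer transaction.

```python
BOARD_ADDRESS = "0xboard"  # hypothetical address of the board contract


class Board:
    def __init__(self):
        self.moves = []  # persistent contract storage

    def move(self, player_address, column):
        self.moves.append((player_address, column))


class Player:
    def __init__(self, address, board):
        self.address = address
        self.board = board

    def play(self, column):
        # Contract-to-contract call: an internal message, not a transaction.
        self.board.move(self.address, column)


chain = []  # what the toy ledger stores


def send_transaction(sender, contract, column):
    """Execute the call and record only the external transaction."""
    contract.play(column)
    chain.append({"from": sender, "to": contract.address, "data": ("play", column)})


board = Board()
player1 = Player("0xplayer1", board)
send_transaction("0xuser", player1, 3)

# The ledger shows user -> player, but nothing addressed to the board...
visible_targets = [record["to"] for record in chain]
# ...even though the board's storage did change as a result.
final_state = board.moves
```

Reading `chain` alone, an observer sees that a player contract was invoked, but the call that actually reached the board has to be reconstructed by other means.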
Once again, Ethereum's community of developers comes to the rescue. Thanks to the freely available implementations of the Ethereum Virtual Machine, we can reproduce exactly the processing performed by the nodes of the network. The only requirement is to know the state of the network before executing each operation and the parameters of the operation itself. Considering that all this information is available directly from the blockchain, we can reproduce everything without any problems.
The flow is basically the one described above: tracing the chain, block by block, and recovering the transactions within each block. In this case, however, we load the state of the blockchain at the block in question into the EVM. Once it is loaded, we instruct the EVM to execute the code of each transaction we've seen in this block. Although the NodeJS modules for Ethereum save a lot of work, the code is still quite complex. But, basically, the flow is as follows:
Within “runCode()” we could capture the operations directly, but what interests us for our analysis is capturing step events (those that the EVM emits each time it advances to the next opcode to execute).
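The replay loop and the capture of step events can be sketched with a toy interpreter. This is a self-contained Python model under stated assumptions, not the NodeJS modules or a real EVM: the `ToyVM` class, its two opcodes, the addresses and the chain contents are all hypothetical, and the state is simply carried forward from the first block instead of being loaded per block.

```python
class ToyVM:
    """Tiny stand-in for an EVM: runs instruction lists and emits step events."""

    def __init__(self, contracts):
        self.contracts = contracts  # address -> list of (opcode, operand)
        self.storage = {}           # address -> {key: value}
        self.listeners = []

    def on_step(self, listener):
        self.listeners.append(listener)

    def run_code(self, address, caller, args, depth=0):
        for opcode, operand in self.contracts[address]:
            # Emit a step event before executing each opcode.
            event = {"address": address, "caller": caller, "depth": depth,
                     "opcode": opcode, "operand": operand, "args": args}
            for listener in self.listeners:
                listener(event)
            if opcode == "CALL":      # internal message to another contract
                self.run_code(operand, address, args, depth + 1)
            elif opcode == "SSTORE":  # write to the contract's own storage
                self.storage.setdefault(address, {})[operand] = args


# Hypothetical contracts: each player forwards its move to the board.
contracts = {
    "0xplayer1": [("CALL", "0xboard")],
    "0xplayer2": [("CALL", "0xboard")],
    "0xboard": [("SSTORE", "last_column")],
}
# Hypothetical chain: one external transaction per block.
chain = [
    [{"from": "0xuserA", "to": "0xplayer1", "args": 3}],
    [{"from": "0xuserB", "to": "0xplayer2", "args": 4}],
]

vm = ToyVM(contracts)
cursor = {"block": None}
internal_calls = []


def capture_calls(event):
    """Step listener: keep only the contract-to-contract calls."""
    if event["opcode"] == "CALL":
        internal_calls.append(
            (cursor["block"], event["address"], event["operand"], event["args"]))


vm.on_step(capture_calls)

# Replay: block by block, transaction by transaction.
for number, block in enumerate(chain):
    cursor["block"] = number
    for tx in block:
        vm.run_code(tx["to"], tx["from"], tx["args"])
```

After the replay, `internal_calls` holds the player-to-board messages that never appeared as transactions, each tagged with the block in which it was triggered.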
By way of example, using a crawler of the type mentioned, we capture the calls between contracts on the blockchain we used during the Connect4 tournament. Below we show a fragment of the calls that took place.
Returning to the implications for privacy, we can discover the final result with total precision: which player invoked which board, at what specific moment and with what parameters, so that we are able to reproduce completely the Connect4 games that we played at BEEVA some months ago. Extrapolating this didactic example to more delicate scenarios, such as financial transactions, asset management, etc., the result could be quite troublesome. For example, one of the proposals of Slock.it (who were significantly affected by the hack of TheDAO) is to control smart locks through smart contracts in Ethereum (allowing for opening and closure of the lock based on who has paid the rental fee, for example) to create a type of decentralized Airbnb. In this case, by crawling Ethereum just as we have done, we could find out who opened or closed the lock, or even whether there is someone in a house with these types of locks or the house is empty.
For this reason, blockchain proposals continue to pop up in which robust privacy is implemented from the start, such as ZCash, Enigma and Hawk. Nevertheless, no doubt more interesting issues lie ahead.
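Once the calls to a board have been recovered, rebuilding a game is mechanical. The Python sketch below assumes each captured call has already been reduced to a (player, column) pair on a standard 6x7 board. The moves listed are invented for illustration and are not the ones played at BEEVA.

```python
ROWS, COLS = 6, 7


def replay(moves):
    """Rebuild a Connect4 board from an ordered list of (player, column) moves."""
    board = [[None] * COLS for _ in range(ROWS)]  # row 0 is the bottom row
    for player, column in moves:
        for row in range(ROWS):
            if board[row][column] is None:
                board[row][column] = player  # the piece falls to the lowest free cell
                break
        else:
            raise ValueError(f"column {column} is already full")
    return board


def render(board):
    """Text view of the board, top row first, '.' for empty cells."""
    return "\n".join(
        "".join(cell or "." for cell in row) for row in reversed(board))


# Hypothetical sequence of recovered moves for players "A" and "B".
recovered_moves = [("A", 3), ("B", 3), ("A", 4), ("B", 2)]
board = replay(recovered_moves)
print(render(board))
```

The same reduction applies to any contract whose state is driven by its incoming calls: with the ordered calls in hand, every intermediate state can be reconstructed.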