Why private blockchains should not be executing code
I’m not a fan of the term “smart contracts”. For a start, it has been used by so many people for so many different things that we should probably just ban it completely. For example, the first known reference is from 1997, when Nick Szabo used it to describe physical objects that change their behavior based on some data. More recently, the term has been used for the exact opposite: to describe computation on a blockchain which is influenced by external events such as the weather. For now let’s put both of these meanings aside.
I want to focus here on “smart contracts” in the sense of general purpose computation that takes place on a blockchain. This meaning was popularized by Ethereum, whose white paper is subtitled “A Next-Generation Smart Contract and Decentralized Application Platform”. As a result of the attention that Ethereum has received, this meaning has become the dominant one, with banks (and others) working away on smart contract proofs-of-concept. Of course, since we’re talking about regulated financial institutions, this is mostly in the context of private or permissioned blockchains, which have a limited set of identified participants. For reasons that are now well understood, public blockchains, for all of their genius, are not yet suited for enterprise purposes.
So is the future bright for smart contracts in private blockchains? Well, kind of, but not really. You see:
Smart contracts make for slow and clunky blockchains.
If you know about the halting problem and understand how data dependencies prevent concurrency then you may already be convinced. But if not, make yourself a coffee, take a deep breath, and follow me down the rabbit hole…
Understanding smart contracts
In order to understand Ethereum-style smart contracts, we need to start with bitcoin, the first (and still most popular) public blockchain. The bitcoin blockchain was originally designed for one thing only: moving the bitcoin currency from one owner to another. But once it was up and running, people started embedding “metadata” in transactions to serve other purposes, such as digital assets and document notarization. While some bitcoiners fought these applications, an official mechanism for metadata was introduced in March 2014, with usage growing exponentially ever since.
As well as projects built on the bitcoin blockchain, many next-generation public blockchains were developed and launched, including Nxt, Bitshares, Ripple and Stellar. These were designed from the ground up to support a broader range of activities, such as user-created assets, decentralized exchange and collateralized borrowing. Each of these blockchains has a different set of features, as decided upon by its developers, and each must be upgraded by all of its users when a new feature is added. Things started to get rather messy.
Having been involved in some of these projects, Vitalik Buterin posed a simple but brilliant question: Instead of lots of application-specific blockchains, why not have a single public blockchain that can be programmed to do whatever we might want? This über-blockchain would be infinitely extendible, limited only by the imagination of those using it. The world of crypto-enthusiasts was almost unanimously convinced by this powerful idea. And so, with $18 million in crowd funding and to great excitement, Ethereum was born.
Ethereum is a new public blockchain with an associated cryptocurrency called “ether”, like hundreds which came before it. But unlike other blockchains, Ethereum enables anybody to create a “contract” inside the blockchain. A contract is a computer program with an associated miniature database, which can only be modified by the program that owns it. If a blockchain user wants to change a database, they must send a digitally signed message to its contract. The code in the contract examines this message to decide whether and how to react.
Ethereum contracts can be written in one of several new programming languages, such as Solidity and Serpent. Like most programming languages, these are Turing complete, meaning that they can express any general purpose computation. A key feature of Turing complete languages is the loop structure, which performs an operation repeatedly until some condition is fulfilled. For example, a loop might be used to print the numbers from one to a million, without requiring a million lines of code. For the sake of efficiency, programs written for Ethereum are compiled (i.e. converted) into more compact bytecode before being stored on the chain. Ethereum nodes then execute this bytecode within a virtual machine, which is essentially a simulated computer running inside a real one.
When an Ethereum contract is created on the blockchain, it sets up the initial state of its database. Then it stops, waiting politely until it’s called upon. When a user of the blockchain (or another contract) sends it a message in a transaction, the contract leaps into action. Depending on the code within, it can identify the source of the message, trigger other contracts, modify its database and/or send back a response to the caller. All of these steps are performed independently on every node in the network, with identical results.
To give an example, a simple Ethereum subcurrency contract maintains a database of user balances for a particular asset. If it receives a message to transfer funds from Alice to Bob, it will (a) check the message was signed by Alice, (b) check that Alice has sufficient funds, (c) transfer funds from Alice’s to Bob’s account in the database and (d) respond that the operation was successful. Of course, we don’t need Ethereum for that, because a simple bitcoin-style blockchain with native asset support can do the same thing. Ethereum really comes into its own for complex multi-stage business logic, such as crowdfunding, decentralized exchanges, and hierarchical governance structures. Or so, at least, the promise goes.
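To make steps (a) to (d) concrete, here is a minimal sketch in Python rather than a real contract language. The class and method names are hypothetical, and the signature check is reduced to an assumption: `sender` is taken to be the already-verified signer of the message.

```python
class Subcurrency:
    """Toy contract: a balance table that only this code may modify."""

    def __init__(self, initial_balances):
        # Initial database state, set once when the contract is created.
        self.balances = dict(initial_balances)

    def transfer(self, sender, recipient, amount):
        # (a) ASSUMPTION: `sender` is the verified signer of the message;
        #     real signature checking is outside the scope of this sketch.
        if amount <= 0:
            return False
        # (b) check that the sender has sufficient funds
        if self.balances.get(sender, 0) < amount:
            return False
        # (c) move the funds inside the contract's own database
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        # (d) tell the caller that the operation succeeded
        return True


contract = Subcurrency({"alice": 100})
ok = contract.transfer("alice", "bob", 30)        # accepted
rejected = contract.transfer("bob", "alice", 50)  # bob only holds 30
```

Nothing outside `transfer` touches `balances`, which is the sense in which the database is owned by its program.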
The trouble with computation
Since they are able to perform any sort of computation, smart contracts are a powerful thing. So what’s the problem? Well, it turns out that computation is unpredictable. However innocent a computer program may look, it can take a long time to run. And sometimes, it goes on running forever. Consider the following classic example (known as a linear congruential generator, or LCG):
1. Set x to a single-digit number of your choice
2. Set y to 123*x+567
3. Set x to the last two digits of y, i.e. y modulo 100
4. If x is more than 2 then go back to step 2
5. Otherwise stop and output the value of x
Simple enough, right? So here’s a question for you: Will this program ever finish? Or will it get stuck in an infinite loop? Not so sure? Well let me put you out of your misery: It depends on the initial value of x.
If x is 0, 1, 2, 5, 6, 7 or 8, the program stops fairly quickly. But if x is 3, 4 or 9, it continues indefinitely. Don’t believe me? Open up Excel and try for yourself (you’ll need the “MOD” function).
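If you would rather not open Excel, the steps above can be sketched in a few lines of Python. The function name `run` and the `max_steps` cap are my own additions. The repeated-state check is a shortcut that only works because this program's entire state is a single number below 100; it is not available for programs in general.

```python
def run(x, max_steps=1000):
    """Return (final_x, steps) if the program halts, or None if it loops."""
    seen = set()
    steps = 0
    while steps < max_steps:
        if x in seen:
            # A deterministic program that revisits a state repeats forever.
            return None
        seen.add(x)
        y = 123 * x + 567   # step 2
        x = y % 100         # step 3
        steps += 1
        if x <= 2:          # steps 4 and 5
            return x, steps
    # Never reached here, since there are at most 100 distinct states.
    return None


halting = [x for x in range(10) if run(x) is not None]
looping = [x for x in range(10) if run(x) is None]
# halting == [0, 1, 2, 5, 6, 7, 8], looping == [3, 4, 9]
# run(5) == (2, 5): five passes through the loop, ending with x = 2
```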
If you couldn’t predict that just by looking at the code, don’t feel too bad. Because not only is this hard for people, it’s impossible for computers. The problem of determining whether a given program will finish executing is called the halting problem. In 1936, Alan Turing, of “Turing complete” and The Imitation Game fame, proved that it cannot be solved for the general case. Barring trivial exceptions, the only way to find out if a program will finish running is to run it for as long as it takes, and that could be forever.
For those of us who’d prefer to live without blue screens of death and spinning beach balls, it’s all rather inconvenient. But live with it we do and, remarkably, most software works smoothly most of the time. And if not, modern operating systems like Windows protect us against runaway code by letting us terminate programs manually. However, the same thing can’t be done on a blockchain like Ethereum. If we allowed individual nodes to terminate computations at will, different nodes would have different opinions about the outcome of those computations. In other words, the network consensus would break down. So what’s a blockchain to do?
Ethereum’s answer is based on transaction fees, also known as gas. The sender of each transaction pays for the computations it triggers, and this payment is collected by the miner who confirms it in a block. To be more precise, every Ethereum transaction states up front how much of the sender’s “ether” can be spent on processing it. The fee is gradually spent as the contract executes, step-by-step, within the Ethereum Virtual Machine. If a transaction runs out of fees before it finishes executing, any database changes are reverted and the fee is not returned. If a transaction completes successfully, any remaining fee is returned to its sender. In this way, transactions can only burden the network to the extent that they’re willing to pay for it.
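The metering idea can be sketched as follows. This is a toy model, not the Ethereum Virtual Machine: it charges a flat one unit of gas per loop iteration, where the real thing prices individual operations, and the names `execute`, `lcg_program` and `OutOfGas` are hypothetical. The program is written as a Python generator that yields once before each unit of work, so the metering loop can decide whether to let it continue.

```python
class OutOfGas(Exception):
    pass


def lcg_program(db):
    """The looping program from earlier, yielding before each iteration."""
    while True:
        yield  # request one unit of gas before doing any work
        db["x"] = (123 * db["x"] + 567) % 100
        if db["x"] <= 2:
            return


def execute(state, program, gas_limit):
    """Run `program` on a copy of `state`. Returns (state, refund, succeeded)."""
    working = dict(state)  # changes go to a copy so they can be discarded
    gas = gas_limit
    try:
        for _ in program(working):
            if gas == 0:
                raise OutOfGas
            gas -= 1
    except OutOfGas:
        return state, 0, False  # revert all changes; the fee is not returned
    return working, gas, True   # keep changes; refund the unused gas


finished = execute({"x": 5}, lcg_program, 10)  # ({"x": 2}, 5, True)
aborted = execute({"x": 3}, lcg_program, 50)   # ({"x": 3}, 0, False)
```

Every node running `execute` with the same inputs stops at exactly the same point, which is what keeps consensus intact where manual termination would not.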
While this is undoubtedly a neat economic solution, it requires a native blockchain currency in order to work. But private blockchains tend not to have a native currency, because their consensus model is based on agreement between a closed set of participants, rather than anonymous proof of work. And in the absence of a native token, transactions cannot pay fees. Instead, in order to prevent runaway computation, we need some kind of fixed limit in terms of computational steps per transaction. This limit would need to be rather high to allow the completion of transactions which intentionally perform a lot of processing. As a result, the network could still end up wasting a lot of energy on unintended loops before finally shutting them down.
Smart contracts vs concurrency
If gas or hard-coded limits can prevent runaway computation, do smart contracts get the green light? Well, not so fast, because there’s another problem with smart contracts that we need to talk about:
Smart contract transactions can only be processed one at a time.
Concurrency is one of the most fundamental issues in computer architecture. A system has good concurrency if it allows several processes to happen simultaneously and in any order. Concurrent systems reduce delays and enable much higher throughput overall, by making optimal use of technologies such as process scheduling, parallel processing and data partitioning. That’s how Google searches 30 trillion web pages almost 100,000 times per second.
In any computer system, a set of transactions can only be processed simultaneously if they don’t depend on, or interfere with, each other. Otherwise, different processing orders might lead to completely different outcomes. Now recall that a smart contract has an associated database, and that it performs general-purpose computation including loops. This means that, in response to a particular message, a smart contract might read or write every single piece of information in its database. For example, if it is managing a sub-currency, it might decide to pay some interest to every holder of that currency. Of course, this won’t always be the case. But the problem is: before running the contract’s program for a particular message, a blockchain node cannot predict which subset of the contract’s database it’s going to use. Nor can it tell whether this subset might have been different under different circumstances. And if one contract can trigger any other, this problem extends to the entire content of every database of every contract. So every transaction must be treated as if it could interfere with every other. In database terms, each transaction requires a global lock.
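A small illustration of why ordering matters, using the interest example. The function names and the numbers are made up for the purpose. One transaction moves 100 units from Alice to Bob, the other pays 10% interest to every holder, and the final balances depend on which runs first.

```python
def transfer(balances, src, dst, amount):
    # Moves funds only if the source can cover the amount.
    if balances.get(src, 0) >= amount:
        balances[src] -= amount
        balances[dst] = balances.get(dst, 0) + amount


def pay_interest(balances, percent):
    # Reads and writes every entry in the contract's database.
    for holder in balances:
        balances[holder] += balances[holder] * percent // 100


def apply_in_order(initial, transactions):
    balances = dict(initial)
    for tx in transactions:
        tx(balances)
    return balances


initial = {"alice": 100, "bob": 0}
send = lambda b: transfer(b, "alice", "bob", 100)
interest = lambda b: pay_interest(b, 10)

send_first = apply_in_order(initial, [send, interest])
interest_first = apply_in_order(initial, [interest, send])
# send_first     == {"alice": 0,  "bob": 110}
# interest_first == {"alice": 10, "bob": 100}
```

A node cannot know that `interest` touches every balance without running it, so it has to assume the worst about every message.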
Now think about the world a blockchain node lives in. Transactions come in from different peers, in no particular order, since there is no centrally managed queue. In addition, at average intervals of between 12 seconds (Ethereum) and 10 minutes (bitcoin), a new block comes in, confirming a set of transactions in a specific order. A node will probably have seen most of a block’s transactions already, but some may be new. Either way, the order of the transactions in the block is unlikely to reflect the order in which they arrived individually. And since the order of transactions might affect the outcome, this means transactions cannot be processed until their order in the blockchain is confirmed.
Now, it’s true that an unconfirmed bitcoin transaction might need to be reversed because of a double spend. But an unconfirmed Ethereum transaction has no predictable outcome at all. Indeed, current implementations of Ethereum don’t even process unconfirmed transactions. But if an Ethereum node were to process transactions immediately, it would still need to rewind and replay them in the correct order when a block comes in. This reprocessing is a huge waste of effort, and prevents external processes from concurrently reading the Ethereum database while it goes on. (To be fair, it should be noted that bitcoin’s reference implementation also rewinds and replays transactions when a block comes in, but this is due only to a lack of optimization.)
So what is it about bitcoin’s transaction model that makes out-of-order execution possible? In bitcoin, each transaction explicitly states its relationship with other transactions. It has a set of inputs and outputs, in which each input is connected to the output of a previous transaction which it “spends”. There are no other dependencies to worry about. So long as (a) two bitcoin transactions don’t attempt to spend the same output, and (b) the output of one doesn’t lead to the input of another, a bitcoin node can be sure that the transactions are independent, and it can process them in any order. Their final positions in the blockchain don’t matter at all.
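Conditions (a) and (b) translate almost directly into code. The sketch below uses a simplified, hypothetical `Tx` type rather than bitcoin's actual transaction format, and it only checks direct links between two transactions. A dependency that runs through a third transaction would need a walk over the graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: frozenset  # (txid, output_index) pairs that this transaction spends
    n_outputs: int     # number of outputs this transaction creates


def outputs_of(tx):
    return {(tx.txid, i) for i in range(tx.n_outputs)}


def independent(a, b):
    # (a) a shared input would be a double spend
    if a.inputs & b.inputs:
        return False
    # (b) neither transaction may spend an output created by the other
    if a.inputs & outputs_of(b) or b.inputs & outputs_of(a):
        return False
    return True


t1 = Tx("t1", frozenset({("funding", 0)}), 2)
t2 = Tx("t2", frozenset({("other", 1)}), 1)
t3 = Tx("t3", frozenset({("t1", 0)}), 1)       # spends an output of t1
t4 = Tx("t4", frozenset({("funding", 0)}), 1)  # same input as t1

unrelated = independent(t1, t2)    # True
chained = independent(t1, t3)      # False
double_spend = independent(t1, t4) # False
```

The point is that the check needs only the transactions themselves. No code has to be run to discover what they touch.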
To use formal computer science terminology, Ethereum transactions must be strictly totally ordered, meaning that the relative order between every pair of transactions must be defined. By contrast, bitcoin transactions form a directed acyclic graph which is only partially ordered, meaning that some ambiguity in transaction ordering is allowed. When it comes to concurrency, this makes all the difference in the world.
To look at it in practical terms, there’s been a lot of talk about private blockchains in the enterprise. But a private blockchain is just a distributed database with some additional features. And compared to public blockchains, private chains are far more likely to see the sort of transaction volumes in which concurrency makes a real difference. In any event, if you tried selling an enterprise-class database today that does not support concurrency, you’d be laughed out of the room. Equally ludicrous would be the suggestion that an individual node has to wait 12 seconds before seeing the result of its own transactions. As Vitalik himself recently tweeted:
Key pt about dapp dev that I’ve underappreciated: main prob is not tx cost; ppl can handle $0.001. Prob is latency; ppl want 500ms, not 17s.
— Vitalik Buterin (@VitalikButerin) September 27, 2015
For us at Coin Sciences this is not just an academic issue, because we need to decide whether and how to incorporate smart contracts into MultiChain. Strangely enough, despite the hundreds of feature requests and questions we’ve received so far, only two have been related to smart contracts, and even then in a weaker form than Ethereum provides. So while we’re keeping an open mind, it may turn out that smart contracts don’t solve many real-world problems for our users.
Double decker blockchains
But what if smart contracts do have important uses, even while they make our blockchains slow? What type of blockchain will give us both the performance and flexibility we need? To be honest, I’m still thinking about it. But one answer might be: a blockchain with two tiers.
The lower tier would be built on bitcoin-style transactions which are processed instantly and concurrently, and don’t need to wait for block confirmations. These transactions could perform simple movements of assets, including safe atomic exchanges, without resorting to smart contracts. But this lower tier would also be used as a blind storage layer for the programs and messages that represent more complex business processes, embedded as transaction metadata. Because computation is deterministic, this data in the blockchain is enough to fix the outcome of running that program for its messages, even if that outcome is not immediately known. The lower tier ensures that every participant has an identical view of who did what and when, so participants cannot disagree over the final business outcome.
As for the upper tier, each network participant would choose which programs they want to actually run. Some might choose to run none at all, because they are only interested in simple asset movements. Others might execute a small group of programs that are relevant to their internal processes (with the knowledge that this group exchanges no messages with programs outside). A few might even opt for global execution, processing every message for every program, just like Ethereum. But the key thing would be that every node runs only the code it needs to. In computer science this technique is called lazy evaluation, because it entails doing as little work as possible, without omitting anything crucial. With lazy evaluation, if a smart contract program goes awry, only those nodes which actually execute that program will notice. The network itself won’t feel a thing.
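Here is one way the two tiers might fit together, as a rough sketch under several assumptions. The lower tier is modelled as a plain ordered list of (program, message) entries. The programs live in a local `PROGRAMS` registry, where in the design above they would themselves be stored on the chain. And the programs a node selects are assumed to exchange no messages with programs outside that selection. All of the names are hypothetical.

```python
# Lower tier: an ordered log that every participant sees identically.
log = [
    ("bond", {"op": "issue", "holder": "alice", "amount": 100}),
    ("escrow", {"op": "deposit", "who": "bob", "amount": 40}),
    ("bond", {"op": "move", "src": "alice", "dst": "carol", "amount": 25}),
]


def bond(state, msg):
    if msg["op"] == "issue":
        state[msg["holder"]] = state.get(msg["holder"], 0) + msg["amount"]
    elif msg["op"] == "move" and state.get(msg["src"], 0) >= msg["amount"]:
        state[msg["src"]] -= msg["amount"]
        state[msg["dst"]] = state.get(msg["dst"], 0) + msg["amount"]


def escrow(state, msg):
    if msg["op"] == "deposit":
        state[msg["who"]] = state.get(msg["who"], 0) + msg["amount"]


PROGRAMS = {"bond": bond, "escrow": escrow}


def evaluate(log, wanted):
    """Upper tier: replay only messages for the programs this node cares about."""
    states = {name: {} for name in wanted}
    for program_id, msg in log:
        if program_id in wanted:
            PROGRAMS[program_id](states[program_id], msg)
    return states


node_a = evaluate(log, {"bond"})            # runs one program
node_b = evaluate(log, {"bond", "escrow"})  # global execution
node_c = evaluate(log, set())               # simple asset movements only
```

Because the log is identical everywhere and the programs are deterministic, any two nodes that run `bond` arrive at the same state for it, whatever else they choose to run or skip.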
Smart contracts in public blockchains
When it comes to public blockchains like Ethereum, I believe there are good reasons for running smart contracts at the blockchain level. Doing so allows nodes to prevent transaction spam and provide compact data proofs, as explained in an (even!) longer version of this post. But putting that aside, here’s another question: what is the enterprise use case for these chains? Let’s imagine some future time when enterprises have sufficient confidence in public blockchains to use them for real business processes. If a group of companies wants to embed some computational logic in a public blockchain, they have two choices: (1) using an Ethereum-style blockchain which runs the code for them, or (2) using any blockchain as a simple storage layer and executing the code themselves. And given these options, why would they choose (1)?
The rational choice would be the public blockchain with (a) the cheapest price per storage byte, and/or (b) the highest level of security based on total mining power. Since computation is deterministic, the companies need only pay the network to store their contract and messages, not to actually process them. In addition, by using a blockchain for storage only, they can use any programming language they want. This might be Ethereum bytecode, JavaScript/ECMAScript for readability or even machine code for high performance. In fact, Ethereum contracts can already be stored using metadata on the bitcoin blockchain. This is exactly the two-tier approach I suggested.
This discussion is related to the notion of abstraction layers, made famous by the OSI networking model. For optimal reliability and flexibility, each layer of a system should be as abstracted (i.e. independent) from the other layers as possible. For example, we wouldn’t want our hard disk controllers to contain code for rendering JPEG images. So why would we want a blockchain to execute the programs that it stores? For the majority of use cases, we derive no benefit from this, and it comes at a significant cost.
Epilogue
If smart contracts make our blockchains slow, why are they so popular? I think this can be explained, at least in part, by a misunderstanding over what blockchains can do in the real world. You see, public blockchains like bitcoin can directly move a real asset (namely, their native currency), because the blockchain defines the ownership of that currency. This conflates two aspects of assets which are usually distinct: (a) a ledger which records who owns the asset, and (b) its actual physical location. Cryptocurrencies are the ultimate bearer instrument, creating a brave new world or a money launderers’ paradise, depending on who you ask.
But for other assets which exist independently of a blockchain, the only thing a chain can do is hold a record of who they should belong to. This will remain the case until we see the primary issuance of assets onto a blockchain, with legal ownership of that asset defined in terms of the chain’s database. For the institutional finance sector, I believe this day is still a long way off, not least because of the regulatory changes required. Until then, there will always be an extra step, contractual and procedural, between what the blockchain says and what happens in the real world. This step might as well include some Turing complete code, lazily executed at the last possible moment.
The problem is highlighted by the case of “smart bonds” that we’ve heard so much about. A smart bond is directly issued onto a blockchain, with the blockchain ensuring that coupon payments are made to the bond holders at the appropriate times. All well and good. But what happens if the bond issuer has insufficient funds in their blockchain account to cover a payment which is due? The blockchain can certainly set a flag to say that something is amiss, but it can’t do anything else. We still need an army of lawyers and accountants to sort the whole mess out, whether by a haircut, debt restructuring, forfeiture or outright bankruptcy. In short:
If smart contracts can’t deliver their promise, why are we paying their price?
Thank you for reading.