For shared immutable key-value and time series databases
Today we’re proud to release the latest version of MultiChain, which implements a crucial new set of functionality called “streams”. Streams provide a natural abstraction for blockchain use cases which focus on general data retrieval, timestamping and archiving, rather than the transfer of assets between participants. Streams can be used to implement three different types of databases on a chain:
- A key-value database or document store, in the style of NoSQL.
- A time series database, which focuses on the ordering of entries.
- An identity-driven database where entries are classified according to their author.
These can be considered as the ‘what’, ‘when’ and ‘who’ of a shared database.
Streams basics
Any number of streams can be created in a MultiChain blockchain, and each stream acts as an independent append-only collection of items. Each item in a stream has the following characteristics:
- One or more publishers who have digitally signed that item.
- An optional key for convenient later retrieval.
- Some data, which can range from a small piece of text to many megabytes of raw binary.
- A timestamp, which is taken from the header of the block in which the item is confirmed.
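To make these four characteristics concrete, here is a minimal Python sketch of a stream as an append-only list of items. The class and field names (StreamItem, Stream, blocktime) are hypothetical and do not mirror MultiChain's internal structures. The timestamp is modeled as None until the item is confirmed in a block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StreamItem:
    """Hypothetical in-memory model of one stream item (not MultiChain's own types)."""
    publishers: Tuple[str, ...]  # addresses that signed the item (at least one)
    key: Optional[str]           # optional key for later retrieval
    data: bytes                  # from a short text to megabytes of raw binary
    blocktime: Optional[int]     # block header timestamp; None while unconfirmed


class Stream:
    """An append-only collection of items: there is no update or delete method."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._items: List[StreamItem] = []

    def append(self, item: StreamItem) -> int:
        if not item.publishers:
            raise ValueError("an item needs at least one publisher")
        self._items.append(item)
        return len(self._items) - 1  # position of the new item in the stream

    def items(self) -> Tuple[StreamItem, ...]:
        return tuple(self._items)  # immutable snapshot for callers
```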
Behind the scenes, each item in a stream is represented by a blockchain transaction, but developers can read and write streams with no awareness of this underlying mechanism. (More advanced users can use raw transactions to write to multiple streams, issue or transfer assets and/or assign permissions in a single atomic transaction.)
Streams integrate with MultiChain’s permissions system in a number of ways. First, streams can only be created by those who have permission to do so, in the same way that assets can only be issued by certain addresses. When a stream is created, it is designated as either open or closed. Open streams are writeable by anybody who has permission to send a blockchain transaction, while closed streams are restricted to a changeable list of permitted addresses. In the latter case, each stream has one or more administrators who can change those write permissions over time.
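The open/closed rule can be sketched as a small in-memory permission check. This is an illustration under stated assumptions, not MultiChain's implementation: the class is hypothetical, we assume the stream's creator starts as its only administrator, and we assume that writers to a closed stream still need the general permission to send transactions.

```python
from typing import Set


class StreamPermissions:
    """Hypothetical model of the open/closed write rule for one stream."""

    def __init__(self, creator: str, open_stream: bool) -> None:
        self.open = open_stream            # fixed when the stream is created
        self.admins: Set[str] = {creator}  # assumption: creator starts as sole admin
        self.writers: Set[str] = set()     # only consulted for closed streams

    def can_write(self, address: str, can_send: bool) -> bool:
        # can_send: whether the address may send blockchain transactions at all
        if not can_send:
            return False
        return self.open or address in self.writers

    def _require_admin(self, admin: str) -> None:
        if admin not in self.admins:
            raise PermissionError(f"{admin} is not an administrator of this stream")

    def grant_write(self, admin: str, address: str) -> None:
        self._require_admin(admin)
        self.writers.add(address)

    def revoke_write(self, admin: str, address: str) -> None:
        self._require_admin(admin)
        self.writers.discard(address)
```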
Each blockchain has an optional ‘root’ stream, which is defined in its parameters and exists from the moment the chain is created. This enables a blockchain to be used immediately for storing and retrieving data, without waiting for a stream to be explicitly created.
As I’ve discussed previously, confidentiality is the biggest challenge in a large number of blockchain use cases. This is because each node in a blockchain sees a full copy of the entire chain’s contents. Streams provide a natural way to support encrypted data on a blockchain, as follows:
- One stream is used by participants to distribute their public keys for any public-key cryptography scheme.
- A second stream is used to publish data, where each piece of data is encrypted using symmetric cryptography with a unique key.
- A third stream provides data access. For each participant who should see a piece of data, a stream entry is created which contains that data’s secret key, encrypted using that participant’s public key.
This provides an efficient way to archive data on a blockchain, while making it visible only to certain participants.
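The three-stream scheme can be traced end to end in a short Python sketch. To stay self-contained with the standard library, it uses deliberately insecure toy primitives: textbook RSA with tiny primes for the public-key step, and XOR with a SHA-256 keystream for the symmetric step. A real deployment would use vetted cryptography. Each "stream" is modeled as a Python dict, and all function names are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, List, Tuple

# WARNING: toy primitives for illustration only. They are NOT secure.
Key = Tuple[int, int]


def toy_keypair(p: int, q: int, e: int = 17) -> Tuple[Key, Key]:
    """Textbook RSA with tiny primes: returns ((n, e), (n, d))."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))  # modular inverse needs Python 3.8+


def rsa_encrypt_bytes(pub: Key, data: bytes) -> List[int]:
    n, e = pub
    return [pow(b, e, n) for b in data]  # byte by byte; each byte is < n


def rsa_decrypt_bytes(priv: Key, blocks: List[int]) -> bytes:
    n, d = priv
    return bytes(pow(c, d, n) for c in blocks)


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream (same call decrypts)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(x ^ y for x, y in zip(data[i:i + 32], block))
    return bytes(out)


pubkeys: Dict[str, Key] = {}                   # stream 1: participant -> public key
data_stream: Dict[str, bytes] = {}             # stream 2: item key -> ciphertext
access: Dict[Tuple[str, str], List[int]] = {}  # stream 3: (item, reader) -> wrapped key


def publish_secret(item_key: str, plaintext: bytes, readers: List[str]) -> None:
    secret = secrets.token_bytes(32)  # unique symmetric key for this piece of data
    data_stream[item_key] = xor_stream(secret, plaintext)
    for reader in readers:            # one access entry per permitted participant
        access[(item_key, reader)] = rsa_encrypt_bytes(pubkeys[reader], secret)


def read_secret(item_key: str, reader: str, priv: Key) -> bytes:
    secret = rsa_decrypt_bytes(priv, access[(item_key, reader)])  # KeyError if no access
    return xor_stream(secret, data_stream[item_key])
```

Every node stores all three streams, but only participants with an entry in the access stream can recover the symmetric key for a given piece of data.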
Retrieving from streams
The core value of streams is in indexing and retrieval. Each node can choose which streams to subscribe to, with the blockchain guaranteeing that all nodes which subscribe to a particular stream will see the same items within. (A node can also be configured to automatically subscribe to every new stream created.)
If a node is subscribed to a stream, information can be retrieved from that stream in a number of ways:
- Retrieving items from the stream in order.
- Retrieving items with a particular key.
- Retrieving items signed by a particular publisher.
- Listing the keys used in a stream, with item counts for each key.
- Listing the publishers in a stream, with item counts.
As mentioned at the start, these methods of retrieval allow streams to be used for key-value databases, time series databases and identity-driven databases. All retrieval APIs offer start and count parameters, allowing subsections of long lists to be efficiently retrieved (like a LIMIT clause in SQL). Negative values for start allow the most recent items to be retrieved.
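The retrieval styles above, together with the start and count parameters, can be modeled with a hypothetical in-memory class. It uses linear scans for clarity where a node would maintain indexes, assumes a single publisher per item, and picks start=0 and count=10 as arbitrary defaults. The method names only loosely echo the real API commands listed in the technical addendum.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    publisher: str  # simplification: one publisher per item
    key: str
    data: str


def window(items: List[Item], start: int, count: int) -> List[Item]:
    """Apply start/count like a LIMIT clause; a negative start counts from the end."""
    if start < 0:
        start = max(len(items) + start, 0)
    return items[start:start + count]


class MemoryStream:
    """Hypothetical stream model supporting the five retrieval styles above."""

    def __init__(self) -> None:
        self.items: List[Item] = []

    def publish(self, publisher: str, key: str, data: str) -> None:
        self.items.append(Item(publisher, key, data))

    def list_items(self, start: int = 0, count: int = 10) -> List[Item]:
        return window(self.items, start, count)

    def list_key_items(self, key: str, start: int = 0, count: int = 10) -> List[Item]:
        return window([i for i in self.items if i.key == key], start, count)

    def list_publisher_items(self, publisher: str, start: int = 0,
                             count: int = 10) -> List[Item]:
        return window([i for i in self.items if i.publisher == publisher], start, count)

    def list_keys(self) -> Counter:
        return Counter(i.key for i in self.items)  # key -> number of items

    def list_publishers(self) -> Counter:
        return Counter(i.publisher for i in self.items)  # publisher -> number of items
```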
Streams can contain multiple items with the same key, and this naturally solves the tension between blockchain immutability and the need to update a database. Each effective database ‘entry’ should be assigned a unique key in your application, with each update to that entry represented by a new stream item with its key. MultiChain’s stream retrieval APIs can then be used to: (a) retrieve the first or last version of a given entry, (b) retrieve a full version history for an entry, (c) retrieve information about multiple entries, including the first and last versions of each.
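The versioning pattern can be sketched in a few lines of Python, assuming each stream item is reduced to a (key, value) pair held in stream order. The helper functions are hypothetical and correspond to retrievals (a), (b) and (c) above.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (key, value) in stream order: an assumed simplification


def history(items: List[Item], key: str) -> List[str]:
    """(b) Full version history of one entry, oldest first."""
    return [value for k, value in items if k == key]


def first_and_last(items: List[Item], key: str) -> Tuple[str, str]:
    """(a) First and latest versions of one entry."""
    versions = history(items, key)
    if not versions:
        raise KeyError(key)
    return versions[0], versions[-1]


def summarize(items: List[Item]) -> Dict[str, Tuple[str, str, int]]:
    """(c) For every entry: first version, last version and number of versions."""
    summary: Dict[str, Tuple[str, str, int]] = {}
    for k, value in items:
        if k in summary:
            first, _, n = summary[k]
            summary[k] = (first, value, n + 1)
        else:
            summary[k] = (value, value, 1)
    return summary
```

For example, publishing ("customer-42", "addr=London") and later ("customer-42", "addr=Leeds") leaves both versions on the chain; first_and_last returns the pair, while history returns the full sequence.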
Note that because of a blockchain’s peer-to-peer architecture, items in a stream may arrive at different nodes in different orders, and MultiChain allows items to be retrieved before they are ‘confirmed’ in a block. As a result, all retrieval APIs offer a choice between global (the default) or local ordering. Global ordering guarantees that, once the chain has reached consensus, all nodes receive the same responses from the same API calls. Local ordering guarantees that, for any particular node, the ordering of a stream’s items will never change between API calls. Each application can make the appropriate choice for its needs.
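The difference between the two orderings can be modeled as two sort orders over the items a node has seen. This is a simplified sketch: SeenItem and both functions are hypothetical, and placing unconfirmed items after confirmed ones (by arrival) in the global order is our assumption, not a documented MultiChain rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SeenItem:
    txid: str
    arrival: int                                    # when this node first saw the item
    confirmed_at: Optional[Tuple[int, int]] = None  # (block height, position in block)


def local_order(items: List[SeenItem]) -> List[str]:
    """Arrival order at this node: stable across calls, but may differ between nodes."""
    return [i.txid for i in sorted(items, key=lambda i: i.arrival)]


def global_order(items: List[SeenItem]) -> List[str]:
    """Confirmed items in block order, then unconfirmed items (assumed) by arrival."""
    confirmed = sorted((i for i in items if i.confirmed_at is not None),
                       key=lambda i: i.confirmed_at)
    pending = sorted((i for i in items if i.confirmed_at is None),
                     key=lambda i: i.arrival)
    return [i.txid for i in confirmed + pending]
```

If node A sees tx1 before tx2 and node B sees them the other way round, their local orders differ permanently, but once a block confirms both transactions, global_order returns the same list on both nodes.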
Streams and the MultiChain roadmap
With the release of streams, we’ve completed the last major piece of work for MultiChain 1.0, and are now firmly on the path to beta. We expect to spend the next few months expanding our internal test suite (already quite large!), finishing the Windows and Mac ports, adding some more useful APIs, updating the Explorer for streams, tweaking aspects of the consensus mechanism, releasing our web demo, and generally tidying up code and help messages. Most importantly, we’ll continue to fix any bugs as soon as they’re discovered, so that our mistakes don’t interrupt your work.
In the longer term, where do streams fit into the MultiChain roadmap? Taking a step back, MultiChain now offers three areas of high-level functionality:
- Permissions to control who can connect, transact, create assets/streams, mine/validate and administrate.
- Assets including issuance, reissuance, transfer, atomic exchange, escrow and destruction.
- Streams with APIs for creating streams, writing, subscribing, indexing and retrieving.
After the release of MultiChain 1.0 (and a premium version), what’s next in this list? If you look at the API command which is used to create streams, you’ll notice an apparently superfluous parameter, with a fixed value of stream. This parameter will allow MultiChain to support other types of high-level entity in future.
Possible future values for the parameter include evm (for an Ethereum-compatible virtual machine), sql (for an SQL-style database) or even wiki (for collaboratively edited text). Any shared entity whose state is determined by an ordered series of changes is a potential candidate. Each such entity will need: (a) APIs which provide the right abstraction for updating its state, (b) appropriate mechanisms for subscribed nodes to track that state, and (c) APIs for efficiently retrieving part or all of the state. We’re waiting to learn which other high-level entities would be most useful, to be implemented by us or by third parties via a plug-in architecture.
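One way to picture such a plug-in is an interface whose state is rebuilt by replaying changes in chain order. The Entity base class and the toy Wiki below are purely hypothetical and say nothing about how MultiChain would actually structure plug-ins. Here the (page, text) change format plays the role of (a), apply plays (b) and query plays (c).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable


class Entity(ABC):
    """Hypothetical plug-in interface: state is derived by replaying ordered changes."""

    @abstractmethod
    def apply(self, change: Any) -> None:
        """(b) Update the tracked state with the next change in chain order."""

    @abstractmethod
    def query(self, *args: Any) -> Any:
        """(c) Retrieve part or all of the current state."""

    def replay(self, changes: Iterable[Any]) -> "Entity":
        for change in changes:
            self.apply(change)
        return self


class Wiki(Entity):
    """Toy 'wiki' entity: each change is (page, new_text) and the last write wins."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}
        self.revisions: Dict[str, int] = {}

    def apply(self, change: Any) -> None:
        page, text = change
        self.pages[page] = text
        self.revisions[page] = self.revisions.get(page, 0) + 1

    def query(self, page: str) -> str:
        return self.pages[page]
```

Because every subscribed node replays the same ordered changes, each one arrives at the same state without the chain storing that state directly.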
What about smart contracts?
In a general sense, MultiChain takes the approach in which data is embedded immutably in a blockchain, but the code for interpreting that data is in the node or application layer. This is deliberately different from the “smart contracts” paradigm, as exemplified by Ethereum, in which code is embedded in the blockchain and runs in a virtual machine. In theory, because smart contracts are Turing complete, they can reproduce the behavior of MultiChain or any other blockchain platform. In practice, however, Ethereum-style smart contracts have many painful shortcomings:
- Every node has to perform every computation, whether it’s of interest or not. By contrast, in MultiChain each node decides which streams to subscribe to, and can ignore the data contained in others.
- The virtual machine used for smart contracts has drastically worse performance than code which has been natively compiled for a given computer architecture.
- Smart contract code is immutably embedded in a chain, preventing features from being added and bugs from being fixed. This was demonstrated forcefully in the demise of The DAO.
- Transactions sent to a smart contract cannot update a blockchain’s state until their final ordering is known, because of the nature of general purpose computation. This leads to delays (until a transaction is confirmed in a block) as well as possible reversals (in the event of a fork in the chain). By contrast, MultiChain can treat each type of unconfirmed transaction in the appropriate way: (a) incoming assets immediately update a node’s unconfirmed balance, (b) incoming stream items are instantly available, with their global ordering subsequently finalized, (c) permissions changes are applied immediately and then replayed in incoming blocks.
Nonetheless, as I’ve said before, we’re certainly not ruling out smart contracts as a useful paradigm for blockchain applications, if and when we see strong use cases. However, in MultiChain smart contracts would be implemented in a stream-like layer on top of the blockchain, rather than at the lowest transaction level. This will preserve MultiChain’s superior performance for simpler blockchain entities like assets and streams, while offering slower on-chain computation where it’s really needed. But there are fewer such cases than you might think.
Please post any comments on LinkedIn.
Technical addendum
All commands related to streams are documented in full on the MultiChain API page, but here is a brief summary:
- Create a stream using create stream or createfrom ... stream
- Add an item to a stream with publish or publishfrom
- Retrieve a list of streams using liststreams
- Start or stop tracking a stream with subscribe and unsubscribe
- Retrieve stream items using liststreamitems, liststreamkeyitems and liststreampublisheritems
- List stream keys and publishers with liststreamkeys and liststreampublishers
- For large stream items, retrieve the full data using gettxoutdata (see maxshowndata below)
- Control per-stream permissions with calls like grant [address] stream1.write
- View a stream’s permissions using listpermissions stream1.*
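For orientation, here is a sketch of how a few of these commands might be serialized as JSON-RPC request bodies from Python. The argument orders reflect our reading of the MultiChain 1.0 API page and should be checked there before use. The helper names are hypothetical, and authentication, the RPC port and the HTTP call itself are deliberately left out.

```python
import itertools
import json
from typing import Any

_request_ids = itertools.count(1)


def rpc_request(method: str, *params: Any) -> str:
    """Serialize a JSON-RPC request body; sending it to the node is left out."""
    return json.dumps({"method": method, "params": list(params), "id": next(_request_ids)})


def create_stream(name: str, open_stream: bool) -> str:
    # create stream <name> <open>: the fixed "stream" type argument comes first.
    return rpc_request("create", "stream", name, open_stream)


def subscribe(stream: str) -> str:
    return rpc_request("subscribe", stream)


def publish(stream: str, key: str, data: bytes) -> str:
    # Item data is passed to publish as a hex string.
    return rpc_request("publish", stream, key, data.hex())


def list_key_items(stream: str, key: str, count: int, start: int) -> str:
    # liststreamkeyitems <stream> <key> <verbose> <count> <start>;
    # a negative start such as -count returns the most recent items.
    return rpc_request("liststreamkeyitems", stream, key, False, count, start)
```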
Some other developer notes relating to streams:
- The create permission allows an address to create streams.
- Relevant per-stream permissions are write, admin and activate.
- New blockchain parameters: root-stream-name (leave empty for none), root-stream-open, anyone-can-create, admin-consensus-create, max-std-op-returns-count.
- New runtime parameters: autosubscribe to automatically subscribe to new streams created and maxshowndata to limit the amount of data in API responses (see gettxoutdata above).
- The maximum size of a stream item’s data is fixed by the max-std-op-return-size blockchain parameter, as well as the smaller of the maximum-block-size and max-std-tx-size values minus a few hundred bytes.
- Nodes using the old wallet format cannot subscribe to streams, and should be upgraded.
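The size rule above reduces to a simple minimum, sketched below. The function name is hypothetical, and overhead=500 merely stands in for the unspecified "few hundred bytes"; it is not a MultiChain constant.

```python
def max_item_data_size(max_std_op_return_size: int,
                       maximum_block_size: int,
                       max_std_tx_size: int,
                       overhead: int = 500) -> int:
    """Upper bound on a stream item's data under the rule stated above.

    `overhead` approximates the transaction bytes that are not item data;
    the default of 500 is an illustrative guess, not a MultiChain value.
    """
    tx_limit = min(maximum_block_size, max_std_tx_size) - overhead
    return max(0, min(max_std_op_return_size, tx_limit))
```

With a generous max-std-op-return-size, the transaction and block limits become the binding constraint.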