Achieving fault tolerance has always been one of the most important issues when building a trustless computing network, ever since the original Bitcoin whitepaper. Because no single party can be fully "trusted" to maintain the integrity of such a decentralized system, there must be a mechanism in place that allows anyone to verify whether an operation performed on the system is valid or not.

Bitcoin came up with a brilliant yet simple solution to this particular problem, in the form of Proof-of-Work (PoW). Based on Adam Back's Hashcash, PoW includes a variable known as a nonce alongside the block data. By competitively trying countless nonce values across a massive pool of miners, only one - or in some cases, very few - mining nodes find a block hash that falls below a certain target threshold (commonly expressed as the mining difficulty). The nodes that successfully solve this hash puzzle are rewarded with newly issued Bitcoins.
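To make the nonce search concrete, here is a minimal toy sketch in Python. It ignores Bitcoin's actual block header layout and difficulty encoding; the function name and the deliberately easy target are purely illustrative:

```python
import hashlib

def mine(block_data: bytes, target: int) -> int:
    """Search for a nonce whose double-SHA-256 block hash falls below the target."""
    nonce = 0
    while True:
        payload = block_data + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # found a valid proof of work
        nonce += 1

# A deliberately easy target so the loop finishes quickly;
# real Bitcoin targets are vastly smaller (i.e., much harder).
easy_target = 1 << 244
print(mine(b"example block header", easy_target))
```

The loop simply brute-forces nonces until the hash, interpreted as an integer, lands below the target - exactly the kind of work that scales with difficulty and burns energy.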

This mechanism provides three very important properties: it makes it incredibly difficult for malicious attackers to compromise the actual block data thanks to the longest chain rule (unless attackers control more than 50% of the network's hashing power), it controls the total supply of Bitcoins being issued, and it limits the amount of mining power that can meaningfully participate. Such a simple mechanism can maintain near-perfect network security and an entire economic system, tightly controlling the supply of both the Bitcoins and the available mining power. This is why Bitcoin's Proof-of-Work is often considered a near-perfect consensus algorithm, even more than 10 years after its announcement.

The downsides of PoW are obviously its very high energy consumption - relative to what it is trying to achieve in the end - and its limited scalability. As the mining difficulty increases (i.e., the target threshold shrinks), more mining compute power is spent on meaningless hash calculations while drawing massive amounts of power. What's worse, people are now creating special-purpose ASICs for this exact task, which is a massive waste of silicon as well.

Some attempts to solve this problem include ProgPoW (EIP-1057), which injects random general-purpose math operations into the mining process so that it becomes more "ASIC-resistant" and lets commodity GPUs "decentralize" mining. Even though people could technically build ASICs for this specific algorithm, the performance advantage would be minimal, since the mining process now includes general math operations - exactly what general-purpose hardware is good at in the first place. But as CoinDesk points out, this approach also seems flawed, as (i) limiting mining to commodity GPUs doesn't necessarily mean more decentralization, and (ii) it introduces additional overhead that makes ProgPoW favor high-end GPUs - centralizing mining toward large farms that have the resources to buy them in bulk. Ethash alone already includes enough ASIC resistance to make such ASICs economically unattractive in the first place (while it still cannot solve the fundamental issues with PoW).

This fundamental problem with Proof-of-Work led many blockchains - now including Ethereum - to move to a new mechanism known as Proof-of-Stake (PoS). Unlike PoW, PoS-based protocols rely on the fact that people who hold a lot of tokens and have them staked in a contract are unlikely to act maliciously - because their stake would then be slashed. This also keeps people without a direct connection to the network out of the consensus process, as parties holding enough tokens to stake some of them will probably not want the network to go wrong (it's their own money!).

Obviously, this new approach led to concerns about network governance being centralized around parties with enough funds to actually "stake" anything. Proponents of PoS argue that this is still much better than PoW because (i) PoW also allows centralization by favoring parties that can afford massive mining hardware, and (ii) PoS at least doesn't involve massive energy consumption. Different flavors of PoS use various methods to minimize the impact on governance, but it remains true that PoS in general cannot give new players a meaningful chance to enter the network governance game.

A new blockchain platform led by a group of MIT researchers, called Algorand, is attempting to solve this very problem by introducing the concept of verifiable randomness. I will rely on this concept throughout this article, as the title suggests.

Enter verifiable randomness

The idea of verifiable randomness - and of the verifiable random functions (VRFs) that enable it - isn't exactly new. Dr. Silvio Micali of MIT - who founded Algorand - proposed VRFs back in 1999, together with Michael Rabin and Salil Vadhan.

VRFs combine asymmetric cryptography and pseudorandom number generation to enable public verification of a generated pseudorandom number: they prove that the number was indeed generated by the party holding the secret key corresponding to a given public key. More formally, a standard VRF implementation - as described in section 3.2 of the original paper - involves the following (a toy sketch of this interface follows the list):

  • The function generator \(G\): A probabilistic algorithm that generates a pair of asymmetric keys, \(PK\) and \(SK\), from a security parameter (or random seed) \(k\). The Algorand paper writes this as \(Keygen(r) \rightarrow (VK, SK)\).
  • The function evaluator \(F = (F_1, F_2)\): A deterministic algorithm that takes a pair of binary strings, \(SK\) (the secret key) and \(x\) (an input message to the VRF), and returns a pair of binary strings, \(v = F_1(SK, x)\) (the generated pseudorandom number) and \(proof = F_2(SK, x)\) (the VRF proof corresponding to the generated number). The Algorand paper writes this as \(Evaluate(SK, X) \rightarrow (Y, \rho)\).
  • The function verifier \(V\): A probabilistic algorithm that takes four strings - \(PK\), \(x\), \(v\) and \(proof\) - and returns a boolean value stating whether VRF verification with the given parameters succeeded. The Algorand paper writes this as \(Verify(VK, X, Y, \rho) \rightarrow 0/1\).
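To make the interface concrete, here is a minimal, non-production sketch in Python. It only mirrors the shape of \(Keygen\)/\(Evaluate\)/\(Verify\): the proof is an Ed25519 signature over the input, and the output is a hash of that proof. This is not a standards-compliant (or even secure) VRF construction - real deployments use an ECVRF such as the one specified in RFC 9381 - and the function names are my own:

```python
# Toy (G, F, V) sketch only -- NOT a real VRF. Requires the `cryptography` package.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def keygen():
    """G: generate an asymmetric key pair (VK, SK)."""
    sk = Ed25519PrivateKey.generate()
    return sk.public_key(), sk


def evaluate(sk, x: bytes):
    """F: return (Y, proof) for input x under secret key SK."""
    proof = sk.sign(x)                  # F_2(SK, x): the VRF proof
    y = hashlib.sha256(proof).digest()  # F_1(SK, x): the output, derived from the proof
    return y, proof


def verify(vk, x: bytes, y: bytes, proof: bytes) -> bool:
    """V: check that (Y, proof) really came from the holder of SK for input x."""
    try:
        vk.verify(proof, x)
    except InvalidSignature:
        return False
    return y == hashlib.sha256(proof).digest()


vk, sk = keygen()
y, proof = evaluate(sk, b"hello VRF")
assert verify(vk, b"hello VRF", y, proof)
```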

One thing to note here is that once the associated proof \(\rho\) is leaked to a third party prior to verification, it becomes trivial to distinguish the generated pseudorandom output \(Y\) from other values by simply running \(Verify\) - which makes \(Y\) not so "random" anymore. This is inherent to VRFs, so it is important to keep \(\rho\) hidden from third parties until the verification stage commences. The secret key \(SK\) should, of course, be hidden at all times.

The Algorand blockchain uses VRFs to randomly select users to serve on a committee for a given block \(r\). More specifically, Algorand Consensus uses the above VRF implementation on top of pure Proof-of-Stake consensus in order to randomly select nodes to serve in the committee for a particular block, based on the amount of stake the node (or user) has on the network. Formally put:

  • For a particular consensus session, users agree on some random number \(Qr\). This is a magic seed available to anyone on the system, which is used to generate the pseudorandom number for each user.
  • Each user generates their own key pair: \(VK\) (public key) and \(SK\) (secret key).
  • Each user computes \(Evaluate(SK, Qr) \rightarrow (Y, \rho)\).
  • Check whether the generated pseudorandom number \(Y\) fits within a particular range, which is determined by the amount of tokens the user is staking on the network. If the user wins the lottery, they get to serve as part of the committee for the current block \(r\).
  • When the lottery is complete, anyone can run \(Verify(VK, Qr, Y, \rho)\) to verify the randomness of the numbers generated within the process. A rough sketch of one such round appears right after this list.
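Here is what one such round could look like, reusing the toy keygen/evaluate/verify functions sketched earlier. This is a simplified single-draw criterion - real Algorand sortition weights users via a binomial distribution over "sub-users" - and the helper name, seed value, and stake numbers are made up for illustration:

```python
# Assumes keygen/evaluate/verify from the toy VRF sketch above are in scope.
HASH_BITS = 256  # bit length of the toy VRF output (SHA-256)

def selected(y: bytes, my_stake: int, total_stake: int) -> bool:
    """Win the lottery iff the VRF output, mapped to [0, 1), falls below the stake share."""
    fraction = int.from_bytes(y, "big") / float(2 ** HASH_BITS)
    return fraction < my_stake / total_stake

# One round for a single user:
seed = b"Qr-for-round-42"   # the shared magic seed Qr (hypothetical value)
vk, sk = keygen()
y, proof = evaluate(sk, seed)
if selected(y, my_stake=50, total_stake=1000):
    print("serving on the committee for this round")

# Later, anyone can audit the draw:
assert verify(vk, seed, y, proof)
```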

Let's focus on the particular range part here. Because Algorand is a pure Proof-of-Stake blockchain, the range used in the lottery is determined by the stake the user holds on the network. This means that the more stake the user holds, the higher the chance of that user serving in the committee for a particular block. While this makes sense for a PoS-based protocol, it might not be practical for a system that attempts to minimize the role of economic incentives in maintaining network security, such as Gemer.
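In its simplest single-draw form - again, a simplification of Algorand's actual binomial sortition - such a stake-weighted range could look like this: a user \(i\) with stake \(s_i\) is selected when

\[ \frac{Y}{2^{\ell}} < \frac{s_i}{\sum_j s_j} \]

where \(\ell\) is the bit length of the VRF output \(Y\). The left-hand side maps \(Y\) into \([0, 1)\), so the probability of selection equals the user's share of the total stake.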

Looking at this differently: what if we integrate a Proof-of-Work style "difficulty" parameter here, while preserving full programmability for compute power?

Introducing ProgVRF

ProgVRF is essentially a fault tolerance mechanism included in Gemer for the purpose of controlling the compute power supply and ensuring the public validity of codexing node selection. Because Gemer by default does not have the concept of tokens, we need another mechanism to effectively control the compute pool without wasting too many resources.

To start, let's take the selection range defined in the original Algorand implementation and call it the selection target range \(T\). Let's also define the total population set of all nodes participating in the lottery as \(K\). Naturally, the probability of a node being selected in this lottery is \(\frac{n(T)}{n(K)}\), assuming that \(n(T) < n(K)\).

Now, let's consider the lottery results of the previous selection round. In this case, we have two known values: the probability computed before the selection round, and the actual proportion of nodes selected after the lottery. Let's define the set of nodes actually selected as \(S\). Thus, the resulting node proportion is \(\frac{n(S)}{n(K)}\), assuming that \(n(S) < n(K)\).

Obviously, because probability only describes the expected proportion over an infinite number of trials, there will be a difference between the expected probability and the actual proportion. Let's call this difference \(\Delta p\). Thus, \(\Delta p = \frac{n(T)}{n(K)} - \frac{n(S)}{n(K)}\).
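For a purely hypothetical illustration: with \(n(K) = 1000\) participating nodes and an expected proportion of \(\frac{n(T)}{n(K)} = 0.10\), suppose \(n(S) = 120\) nodes actually won the lottery, i.e. \(\frac{n(S)}{n(K)} = 0.12\). Then \(\Delta p = 0.10 - 0.12 = -0.02\): the round over-selected, so the target range for the next round should shrink by roughly that amount.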

\(\Delta p\) represents the difference between the supply and demand for computation nodes in the previous session. Thus, we need to apply this difference to the selection target range of the current lottery session.

As such, the revised algorithm could be as follows:

  • For a particular consensus session, users agree on some random number \(Qr\). This is a magic seed available to anyone on the system, which is used to generate the pseudorandom number for each user.
  • Each user generates their own key pair: \(VK\) (public key) and \(SK\) (secret key).
  • Each user computes \(Evaluate(SK, Qr) \rightarrow (Y, \rho)\).
  • Check whether the generated pseudorandom number \(Y\) fits within the selection target range \(T\).
  • On the next lottery session, update the target range - or probability space - \(T\) according to the value of \(\Delta p\).
  • The selected nodes execute the allocated compute task, and follow the rest of the consensus mechanism as described in Gemer's design.
  • When the lottery is complete, anyone can run \(Verify(VK, Qr, Y, \rho)\) to verify the randomness of the numbers generated within the process. A rough end-to-end simulation of this loop follows below.
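Here is a rough, self-contained simulation of that loop in Python. Random floats stand in for the normalized VRF outputs, and the additive update rule is only one possible way to "apply \(\Delta p\)" - the concrete rule is still to be pinned down:

```python
# Toy simulation of the ProgVRF target-update loop. Random floats stand in for
# normalized VRF outputs; the additive update is an assumed, illustrative rule.
import random

class ProgVRFTarget:
    def __init__(self, initial_fraction: float):
        # Expected proportion of nodes to select, i.e. n(T) / n(K).
        self.fraction = initial_fraction

    def run_round(self, num_nodes: int) -> int:
        # Each node draws a pseudorandom value; in the real protocol this would be
        # Evaluate(SK, Qr) followed by public verification with Verify().
        draws = [random.random() for _ in range(num_nodes)]
        chosen = sum(1 for y in draws if y < self.fraction)

        # Delta p = expected proportion - actual proportion (as defined above).
        delta_p = self.fraction - chosen / num_nodes

        # Apply the difference to next round's target range (assumed additive rule),
        # clamped so the target stays a valid probability.
        self.fraction = min(1.0, max(0.0, self.fraction + delta_p))
        return chosen

target = ProgVRFTarget(initial_fraction=0.10)
for r in range(5):
    chosen = target.run_round(num_nodes=1000)
    print(f"round {r}: selected {chosen} nodes, next target fraction {target.fraction:.4f}")
```

Running it, the number of selected nodes hovers around the target fraction, while the target itself drifts in response to each round's over- or under-selection.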

Thus, the updated probability space acts like the target threshold - or mining difficulty - in Proof-of-Work, except that it is fully programmable. I will be updating the algorithm to be more concrete as I work on integrating this into the Gemer main repo.

Conclusion

Even though ProgVRF was primarily designed with Gemer in mind as an extension of existing VRF-based consensus mechanisms, it may be useful for general-purpose blockchains as well to improve the computational efficiency of consensus. This is still in its early stages; I will be updating the PoC for this algorithm within the Gemer main GitHub repository under the consensus package.

This is still a very rough idea, which is why I didn't bother to write a separate paper on it. Please let me know if you have any suggestions to improve the algorithm itself, or if you would like to revise this idea further to incorporate it into your own projects.