Watermarking Deep Neural Networks

Recently, an IBM team presented at ASIA CCS’18 a framework for watermarking Deep Neural Networks (DNNs). Similarly to what we do in the multimedia space, if a competitor uses or modifies a watermarked model, it should be possible to extract the watermark from the model to prove ownership.

In a nutshell, the DNN model is trained with the normal set of data, to produce the results that everybody would expect, and with an additional set of data (the watermarks) that produces an “unexpected” result known solely to the owner. To prove ownership, the owner submits the watermarks to the allegedly “stolen” model and verifies whether the observed results are the expected ones.
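
To make the verification step concrete, here is a minimal Python sketch, assuming the suspect model can be queried as a black box that returns a label. The names `watermark_match_rate` and `claims_ownership`, and the 0.9 decision threshold, are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical black-box interface: a model maps an input to a predicted label.
Model = Callable[[object], Hashable]
# A trigger is a watermark input paired with the owner's secret label.
Trigger = Tuple[object, Hashable]


def watermark_match_rate(model: Model, triggers: Sequence[Trigger]) -> float:
    """Fraction of watermark inputs for which the model returns the secret label."""
    if not triggers:
        raise ValueError("at least one trigger is required")
    hits = sum(1 for x, secret_label in triggers if model(x) == secret_label)
    return hits / len(triggers)


def claims_ownership(model: Model, triggers: Sequence[Trigger],
                     threshold: float = 0.9) -> bool:
    """Claim ownership when enough watermarks produce the expected answer.
    The 0.9 threshold is an illustrative assumption."""
    return watermark_match_rate(model, triggers) >= threshold


if __name__ == "__main__":
    # Placeholder inputs; in practice these would be the owner's secret images.
    triggers = [("logo_img_1", "airplane"), ("logo_img_2", "airplane"),
                ("noise_img", "airplane")]

    def stolen_copy(x):          # behaves like the watermarked model
        return "airplane"

    def independent_model(x):    # never learned the watermarks
        return "cat"

    print(claims_ownership(stolen_copy, triggers))        # True
    print(claims_ownership(independent_model, triggers))  # False
```

A legitimate, independently trained model should almost never map the secret inputs to the secret label, which is why a high match rate is taken as evidence of copying.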

The authors explored three techniques in the field of image recognition:

  • Meaningful content: the watermarks are modified images, for instance, images to which a consistent visible mark has been added. The training ensures that any image bearing this mark is classified into a given “unrelated” category.
  • Unrelated content: the watermarks are images totally unrelated to the task of the model. Normally, such images would be rejected, but the training enforces a known output that serves for detection.
  • Noisy content: the watermarks are images that embed a consistent, specifically shaped noise and produce a given known answer.
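
The first and third techniques can be sketched as transformations applied to ordinary training images. This minimal Python sketch represents a grayscale image as a list of pixel rows; the helper names, the square mark, and the uniform integer noise are illustrative assumptions, not the paper's exact generators. For unrelated content, `build_trigger_set` can simply receive images from another dataset with an identity transform.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

Image = List[List[int]]  # grayscale image: rows of pixel values in [0, 255]


def stamp_mark(img: Image, top: int, left: int, size: int, value: int = 255) -> Image:
    """Meaningful content: return a copy of the image with a solid square mark."""
    out = [row[:] for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = value
    return out


def add_fixed_noise(img: Image, seed: int, amplitude: int = 20) -> Image:
    """Noisy content: add a noise pattern derived from a secret seed, so that
    every image of the same size receives exactly the same pattern."""
    rng = random.Random(seed)
    height, width = len(img), len(img[0])
    pattern = [[rng.randint(-amplitude, amplitude) for _ in range(width)]
               for _ in range(height)]
    # Clip to the valid pixel range after adding the pattern.
    return [[max(0, min(255, img[r][c] + pattern[r][c])) for c in range(width)]
            for r in range(height)]


def build_trigger_set(images: Sequence[Image],
                      make_trigger: Callable[[Image], Image],
                      secret_label: Hashable) -> List[Tuple[Image, Hashable]]:
    """Pair each transformed image with the owner's secret label; these pairs
    are mixed into the normal training data."""
    return [(make_trigger(img), secret_label) for img in images]
```

For example, `build_trigger_set(images, lambda im: stamp_mark(im, 0, 0, 3), "airplane")` would teach the model that any image carrying a 3 by 3 mark in its top-left corner is an airplane; the secret is the mark, its position, and the chosen label.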

The approach is interesting. Some remarks inherited from the multimedia space:

  • The method of creating the watermarks must remain secret. If the attacker guesses the method, for instance, that the system uses a given logo, then the attacker may be able to wash out the watermark. The attacker may “untrain” the model by further training the watermarked model with self-generated watermarks that output an answer different from the one expected by the original owner. As the attacker has uncontrolled, unlimited access to the detector, the attacker can fine-tune the model until the detection rate is too low.
  • The framework is most probably too expensive to be used for traitor tracing at a large scale. Nevertheless, I am not sure whether large-scale traitor tracing makes any sense.
  • The method is most probably robust against an oracle attack.
  • The described methods target image recognition but could be ported to other tasks.
  • It is possible to embed several successive orthogonal watermarks.

An interesting paper to read, as it is probably the beginning of a new field. ML/AI security will be key in the coming years.

Reference

Zhang, Jialong, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph. Stoecklin, Heqing Huang, and Ian Molloy. “Protecting Intellectual Property of Deep Neural Networks with Watermarking.” In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 159–172. ASIACCS ’18. New York, NY, USA: ACM, 2018. https://doi.org/10.1145/3196494.3196550.

Blockchain: Proof of Work

This post is the fourth in a series dedicated to demystifying blockchains. The third post introduced the major concept of consensus and classified consensus mechanisms into two categories: lottery-based and Byzantine Fault agreement-based protocols. This post presents the best-known lottery-based consensus: Proof of Work (PoW). Most permissionless blockchains, such as Bitcoin or Ethereum, use PoW.

The aim of a PoW algorithm is to validate a block in a distributed environment. The PoW is designed so that ill-behaved actors would need to possess the majority of the system’s computing power to validate a forged block, for instance, to perform double spending. The PoW is a piece of information, called the nonce in Bitcoin parlance, stored inside the block to be validated. Verifying the PoW is simple and fast, whereas constructing it is computationally difficult. The validator must solve a cryptographic puzzle: find a PoW such that Hash(Bi || PoW) < Target (|| denotes concatenation), where

  • Bi is the block to be validated.
  • Target is a value, set by the protocol rules (or, in some systems, by an authority), that adjusts the difficulty of the challenge.
  • Hash is a cryptographic one-way function. One characteristic of a cryptographic one-way function is preimage resistance: Hash is easy to calculate, but given a number y, it is computationally difficult to find x such that Hash(x)=y.
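
The puzzle, finding a nonce such that Hash(block || nonce) < Target, can be illustrated with a toy miner. This Python sketch is a simplification, not Bitcoin's real header format: the block is an arbitrary byte string, the nonce is appended as 8 big-endian bytes, Hash is double SHA-256 (as in Bitcoin), and Target is expressed as a number of leading hexadecimal zeros. The function names are hypothetical.

```python
import hashlib
from typing import Optional, Tuple

DIGEST_HEX_LEN = 64  # a SHA-256 digest is 64 hexadecimal characters


def pow_hash(block: bytes, nonce: int) -> str:
    """Double SHA-256 of the block bytes followed by the nonce (8 bytes, big-endian)."""
    data = block + nonce.to_bytes(8, "big")
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()


def target_for(zeros: int) -> int:
    """Any digest below 16**(64 - zeros) starts with `zeros` hexadecimal zeros."""
    return 16 ** (DIGEST_HEX_LEN - zeros)


def mine(block: bytes, zeros: int, max_nonce: int = 2**32) -> Optional[Tuple[int, str]]:
    """Brute force: try nonces in order until Hash(block || nonce) < Target."""
    target = target_for(zeros)
    for nonce in range(max_nonce):
        digest = pow_hash(block, nonce)
        if int(digest, 16) < target:
            return nonce, digest
    return None  # no solution in the explored range


def verify(block: bytes, nonce: int, zeros: int) -> bool:
    """Verification costs one hash; mining costs about 16**zeros hashes on average."""
    return int(pow_hash(block, nonce), 16) < target_for(zeros)


if __name__ == "__main__":
    block = b"block 42: Alice pays Bob 1 coin"
    result = mine(block, zeros=4)
    if result is not None:
        nonce, digest = result
        print(nonce, digest)
        print(verify(block, nonce, zeros=4))  # True
```

The asymmetry is the whole point: each additional leading zero multiplies the expected mining effort by 16, while verification always costs a single double hash.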

All validators (called miners in the Bitcoin ecosystem) try to solve this equation. Currently, for approved cryptographic one-way functions, such as SHA-256 (used by Bitcoin) or SHA-3, the only known method to solve this equation is brute force, i.e., systematically trying candidate values. The first miner to solve the equation, i.e., to find a suitable PoW, is elected the winning validator. Furthermore, the winner receives a reward (12.5 bitcoins in 2018; the reward halves every 210,000 blocks, roughly every four years). The parameter Target defines the difficulty of the challenge. In Bitcoin, it roughly translates into the minimum number of leading zeros that the hash must have. This number depends on the expected average solving time and the global mining computing power. In the Bitcoin ecosystem, the target is adjusted every 2,016 blocks (roughly every fortnight) to keep an average validation time of ten minutes. If the global processing power increases, the target decreases, i.e., the hash must have more leading zeros. Two interesting consequences:

  • For a given block, there is not one unique PoW, as the equation has many solutions. The winner is the first to find one solution. Two validators may find a solution “simultaneously”. The branch validation strategy addresses the corresponding problem. A future post will address branching.
  • The validation time of one block is not fixed. It depends on the “luck” of the miners.
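
The periodic adjustment of Target can be sketched as follows. This is a simplified model of Bitcoin's retargeting rule: every 2,016 blocks, the target is scaled by the ratio of the actual to the expected duration of those blocks, limited to a factor of four in either direction. It ignores details such as the maximum target and the compact encoding of the target; the function name is hypothetical. A lower target means a harder puzzle.

```python
EXPECTED_BLOCK_TIME = 600      # seconds: one block every ten minutes
RETARGET_INTERVAL = 2016       # blocks between adjustments (about two weeks)


def retarget(old_target: int, actual_timespan: int) -> int:
    """Simplified Bitcoin-style retargeting.

    actual_timespan is the number of seconds the last RETARGET_INTERVAL blocks
    took. If they came too fast, the target shrinks (harder puzzle); if too
    slow, it grows (easier puzzle). The change is limited to a factor of 4.
    """
    expected = RETARGET_INTERVAL * EXPECTED_BLOCK_TIME
    timespan = min(max(actual_timespan, expected // 4), expected * 4)
    # Integer arithmetic: real targets are 256-bit numbers.
    return old_target * timespan // expected
```

If the global hashing power doubles, the 2,016 blocks arrive in about one week instead of two, `retarget` halves the target, and the expected solving time returns to ten minutes. Individual block times still vary with the miners' “luck”.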

PoW works because it requires huge computing power. An ill-behaving actor controlling the majority of computing power would also control the system. The higher the total processing power of an entity, the higher the probability that this entity becomes the validator of a block. With the rise of mining pools, the assumption that no entity controls a significant share of computing power weakens. A mining pool is an association that federates many miners who share the rewards and transaction fees. Mining pools unbalance the odds. In small deployments, it is not unrealistic that a mining pool could control more than half of the total processing power. Therefore, PoW may only be suitable for large deployments.
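
The weight of the majority assumption can be quantified with the gambler's ruin calculation from Section 11 of Nakamoto's Bitcoin paper: an attacker holding a fraction q of the computing power, starting z blocks behind the honest chain, eventually catches up with probability (q/p)^z when q < p = 1 − q, and with certainty otherwise. The function name is hypothetical.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with a fraction q of the total computing
    power ever catches up with the honest chain from z blocks behind."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    p = 1.0 - q  # fraction held by honest miners
    if q >= p:
        return 1.0  # a majority attacker eventually wins for sure
    return (q / p) ** z
```

For q = 0.3 and z = 6, the probability is (3/7)^6, about 0.6%; for any q ≥ 0.5, it is 1. This is why a mining pool approaching half of the computing power is a concern.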

Finding the PoW is, by construction, computing intensive. To gain efficiency, mining usually employs either Graphics Processing Units (GPUs) or specialized hardware based on custom-developed ASICs. This calculation consumes a lot of energy. There is no authoritative estimate of the total energy consumed by mining. Nevertheless, it is commonly accepted that it is huge. Many mining farms are located in regions where energy is cheap.

PoW is certainly neither green nor fast.

Conclusions

PoW favors the entity that invests the most effort into validation.

PoW is a robust consensus protocol under the assumption that no entity controls more than 50% of the total mining power. At the time of Bitcoin’s creation, this assumption was valid. Currently, six mining pools each have more than 10% of the total mining power (see https://en.bitcoin.it/wiki/Comparison_of_mining_pools). This assumption may thus become less robust.

PoW is a hugely energy-consuming protocol.

 

Blockchain: consensus protocols

This post is the third post in a series dedicated to demystifying blockchains. The second post described the difference between permissionless and permissioned blockchains. It also introduced the concept of validator. This post will present the notion of consensus.

A consensus mechanism or protocol ensures that all nodes of the system agree on a shared, approved state. In the framework of a distributed ledger or blockchain, a consensus mechanism enforces that all participants use the same state or version of the ledger. In other words, it is the mechanism that ensures that every entity agrees on the same transactions and that all copies of the ledger are identical. Consensus protocols are not new. They are at the core of many distributed and mirroring systems. Nevertheless, permissionless blockchains introduced a new challenge: not all participants in a permissionless blockchain may behave properly.

Two kinds of consensus

There are mainly two categories of consensus mechanism:

  • Lottery-based
  • Byzantine Fault agreement-based

The first category is sometimes called Nakamoto consensus, in honor of Satoshi Nakamoto, the pseudonymous founder of Bitcoin. The consensus mechanism elects the validator, i.e., the node that decides which block is the next to be appended to the ledger. The election is a lottery draw. The winner is the validator. Each new block requires a new draw. Selection through a lottery reduces the likelihood that an ill-behaving node validates a forged block. The lottery does not necessarily follow an equiprobable distribution. Each mechanism has its own probability distribution favoring a given characteristic of the winner. Thus, each lottery-based consensus has a different trust model. Proof of Work (PoW), used by Bitcoin, is the best-known mechanism. There are many other types, such as Proof of Stake (PoS), Proof of Space, or Proof of Elapsed Time (PoET). Future posts will explore PoW and PoS in detail.
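
A non-equiprobable lottery can be modeled as a weighted random draw. In this simplified Python sketch, which is not the algorithm of any specific blockchain, each node wins with probability proportional to its weight: hashing power for PoW, or stake for a naive PoS. The node names and weights are hypothetical.

```python
import random
from typing import Dict, Optional


def draw_validator(weights: Dict[str, float],
                   rng: Optional[random.Random] = None) -> str:
    """One lottery draw: each node wins with probability weight / total weight."""
    if rng is None:
        rng = random.Random()
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(2018)
    weights = {"pool_a": 0.40, "pool_b": 0.35, "solo_miner": 0.25}
    draws = [draw_validator(weights, rng) for _ in range(10_000)]
    for node in weights:
        # Observed winning frequency; close to the node's weight.
        print(node, draws.count(node) / len(draws))
```

Changing what the weights represent changes who is favored, and hence the trust model: PoW rewards computing power, PoS rewards ownership of tokens.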

The second category is based on Byzantine Fault Tolerant (BFT) systems. BFT systems are designed to operate even if some participants in the protocol fail. Failure may be involuntary (for instance, a participating node is out of order) or voluntary (for instance, an attacker controls the failing node). BFT protocols employ voting mechanisms to reach consensus. The voting mechanism defines the trust model, which is usually well defined. Practical Byzantine Fault Tolerance (PBFT) is the best-known mechanism. A future post will explore PBFT.

Hybrid consensus mechanisms, mixing lottery with a pinch of BFT, seem to be appearing. Casper, the next consensus mechanism of Ethereum, is a PoS with some BFT in it.

Lottery or Byzantine Fault Tolerant?

Lottery-based mechanisms are more complex and slower than BFT mechanisms. Lottery-based consensus mechanisms are well suited for permissionless blockchains. There is no control over the validators; anybody may participate in the validation. Therefore, the lottery reduces the risk. In a permissioned blockchain, the validators are known. Thus, BFT-based consensus mechanisms are adequate. Depending on the mechanism, the designer knows how many validators must be compromised or must collude to validate a forged block successfully.
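
For PBFT-style protocols, this number follows from simple arithmetic: with n validators, the protocol tolerates at most f Byzantine validators, where n ≥ 3f + 1. The sketch below also computes a quorum size chosen so that any two quorums share at least f + 1 validators, hence at least one honest one. The function names are hypothetical, and real protocols add view changes, signatures, and message rounds that are omitted here.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f with n >= 3f + 1: faulty validators a PBFT-style protocol tolerates."""
    if n < 1:
        raise ValueError("at least one validator is required")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest number of matching votes such that any two quorums overlap in
    at least f + 1 validators, i.e., in at least one honest validator."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2  # ceil((n + f + 1) / 2) in integer arithmetic
```

With 4 validators, f = 1 and 3 matching votes are needed; with 5 validators, f is still 1 but the quorum grows to 4. Adding a fifth validator therefore does not raise the number of tolerated faults; beyond f faulty validators, the guarantees no longer hold.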

Research on consensus is currently an extremely active field. Unfortunately, many consensus mechanisms are (too) young and their security has not been studied enough. The following chart illustrates the relative age of major consensus mechanisms.

Conclusions

If you want to manage your blockchain, then you need to understand its consensus mechanism. It is part of the trust model of your solution.

 

Blockchain: Permissionless versus Permissioned

This post is the second one in a series dedicated to demystifying blockchains. The first post proposed a definition of blockchain. I intended this second post to be about consensus, the cornerstone of blockchain. When I started writing it, I discovered that I first needed to introduce a fundamental characteristic of blockchain: permission.

Some entities decide whether a block is valid and should be appended to the blockchain. They may be called blockchain nodes or validators. Validators are the pieces of software that determine which block is the new one on the chain. In Satoshi Nakamoto’s vision, everybody could/should be a validator. Thus, his blockchain has no central authority. It is claimed that the blockchain is ruled by everybody (or nobody, depending on your point of view). Bitcoin is a permissionless blockchain. This is the case for most cryptocurrencies and many other systems. Ethereum is another example of a permissionless blockchain. In a permissionless blockchain, users delegate their trust to uncontrolled, unknown validators under the assumption that the consensus mechanism does not allow a badly acting validator to cheat.

This delegation of trust is not always possible or desirable. Therefore, there is a second breed of blockchains that operate with a different configuration: permissioned blockchains. The validators are a finite set of known servers. A consortium manages this list following some defined governance rules. You may have noticed that the validators are not necessarily trusted. Depending on the chosen consensus mechanism, the level of expected trust may vary. The open source Hyperledger projects offer many such permissioned architectures.

Which one is the best?

The advantage of the permissionless blockchain is that there is no (at least claimed) central authority. There is no single point of failure that may be attacked. This advantage comes at a price: the consensus mechanism is complicated and/or extremely power consuming. It will have to be slow. Furthermore, it requires that the nodes have a robust method to validate a transaction. For financial ledgers, this is easy: checking that Alice currently has the number of tokens she asks to transfer to Bob is straightforward. With more complex transactions, it may be less obvious. Would you trust an unknown validator to check whether your land deed belongs to you and to register it on a land registry blockchain? Or a copyright? Smart contracts are not the golden answer to that issue.
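
For an account-based token ledger, the check on Alice's tokens fits in a few lines. This Python sketch is a simplification: Bitcoin actually tracks unspent transaction outputs rather than balances, and a real validator would also verify Alice's signature. `Transfer`, `is_valid`, and `apply_transfer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    amount: int  # number of tokens


def is_valid(tx: Transfer, balances: Dict[str, int]) -> bool:
    """The amount must be positive and the sender must currently hold it."""
    return tx.amount > 0 and balances.get(tx.sender, 0) >= tx.amount


def apply_transfer(tx: Transfer, balances: Dict[str, int]) -> Dict[str, int]:
    """Return the new balances, leaving the input unchanged; reject invalid transfers."""
    if not is_valid(tx, balances):
        raise ValueError(f"invalid transfer: {tx}")
    new_balances = dict(balances)
    new_balances[tx.sender] -= tx.amount
    new_balances[tx.recipient] = new_balances.get(tx.recipient, 0) + tx.amount
    return new_balances
```

Every input the rule needs is already in the ledger. A land deed offers no such self-contained rule: whether the deed really belongs to you is not something the ledger alone can tell the validator.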

The advantage of a permissioned blockchain is that a set of entities sharing a common interest in the fulfillment of the transactions can manage it efficiently. The validators have the authority and implement the complex validation rules that some use cases may require. The consensus mechanisms are simpler and faster than the ones used by permissionless blockchains.

Many “purists” claim that permissionless blockchains are more secure than permissioned ones due to the absence of a central authority, arguing that the management of the validators is a weak point. As usual, the answer is more balanced. It mainly depends on the use case. Some industrial use cases may benefit from permissioned blockchains. Personally, I would argue that the trust model of a permissioned blockchain can usually be defined more accurately than the trust model of a permissionless blockchain. I have not yet read a complete, convincing trust model of a permissionless blockchain.

Conclusion

Thus, a hyper-simplified definition: a permissionless blockchain neither trusts nor knows its validators, whereas a permissioned blockchain knows all its validators but does not need to trust all of them.

Blockchain: A Definition

This post is the first one of a series dedicated to the blockchain. In the coming weeks, I will discuss many aspects of the blockchain. As some of my views may be perceived as pessimistic, a cautionary note is mandatory: I am a skeptical blockchain enthusiast. Blockchain has great potential but also many pitfalls. I hope that these posts will shed some light on the blockchain.

The first step is to propose a definition for blockchain.

A blockchain is a secure, shared, distributed ledger.

Let us examine the four elements of this definition.

  • A blockchain is a ledger. It stores the complete chronological records of transactions. The transactions are combined in a data structure called a block. Each block is cryptographically bound to its predecessor, thus creating a chain of blocks. The blockchain is well suited for transactions and time series. For instance, Bitcoin records the exchange of bitcoins. Other types of information, for instance, graphs, are not necessarily well suited for blockchain. Nevertheless, much information can be transcribed into a set of transactions.
  • A blockchain is shared. Many entities use the same ledger. They may not all have the same access rights: Some entities may be allowed to submit transactions to the blockchain whereas other entities may only read these transactions. The use case defines the rules for access control. If the ledger is not to be shared, then probably a traditional database is more suitable than a blockchain.
  • A blockchain is distributed. No central server holds the blockchain. Every node has the same complete copy of the ledger, and the nodes are connected through a peer-to-peer network. Therefore, the blockchain offers high availability and resilience. There is no single point of failure in the system.
  • A blockchain is secure. Each issuer signs its transactions. Each node validates every transaction according to validation rules defined by the blockchain governance. For instance, for a cryptocurrency, the validation of a transaction verifies that Alice owned the coins she transfers to Bob. In the case of a land registry or supply chain tracking, the validation will most probably be more complicated.
    Once all the transactions of the block are validated, the nodes engage in a consensus protocol to decide whether the block is to be appended to the blockchain. Once consensus is reached, all nodes add the new block to their copy of the blockchain. The consensus is the most complex element of the blockchain. Many consensus protocols are available. The most famous one is the Proof of Work (PoW) designed by Satoshi Nakamoto for Bitcoin. In Bitcoin, mining is the process that establishes consensus. In a future post, we will study PoW and other types of consensus in detail. The consensus ensures that every node has the same ledger.
    Transactions are immutable. To alter an already recorded transaction, the attacker must modify the block containing it and also adjust all the subsequent blocks to maintain the cryptographic link. Furthermore, the attacker must trick the consensus protocol into accepting the forged fork as the valid one. Thus, it is reasonable to assume that the transactions are carved in stone. As a side note, immutability may become an issue in case of an error or if the “right to be forgotten” must be honored. What is in the blockchain stays in the blockchain.
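
The cryptographic binding between blocks, and why altering a recorded transaction forces the attacker to rewrite every later block, can be sketched with a toy hash chain. This Python sketch assumes blocks are JSON-serializable dictionaries hashed with SHA-256; it omits signatures, consensus, and Merkle trees, and its function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

Block = Dict[str, Any]


def block_hash(block: Block) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: List[Block], transactions: List[str]) -> None:
    """Bind the new block to its predecessor by storing the predecessor's hash."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash,
                  "transactions": list(transactions)})


def chain_is_consistent(chain: List[Block]) -> bool:
    """Check that every block still carries the hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


if __name__ == "__main__":
    chain: List[Block] = []
    for txs in (["genesis"], ["Alice pays Bob 5"], ["Bob pays Carol 2"]):
        append_block(chain, txs)
    print(chain_is_consistent(chain))  # True
    chain[1]["transactions"][0] = "Alice pays Mallory 5"
    print(chain_is_consistent(chain))  # False
```

Tampering with a block that already has a successor breaks the link stored in that successor, so the check fails unless the attacker recomputes every subsequent block. A change to the latest block is not caught by this check alone; there, the consensus protocol (in Bitcoin, the PoW) protects the chain.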

This first post provides a broad definition of the blockchain. The next posts will explore the technical elements of a blockchain.

Reference:

Nakamoto, Satoshi. “Bitcoin: A Peer-to-Peer Electronic Cash System,” 2008. http://www.cryptovest.co.uk/resources/Bitcoin%20paper%20Original.pdf.

Symposium on Foundations and Applications of Blockchain 2018

The University of Southern California (USC) will host the first Symposium on Foundations and Applications of Blockchain on Friday, March 9, 2018. Its program is available at https://scfab.github.io/2018/schedule.html. Note the presence of Leonard Adleman on the discussion panel! I hope to meet some of you there.

Full disclosure: I am a member of its program committee (PC).