Deep Learning: A Critical Appraisal (paper review)

Deep learning is becoming extremely popular. It is one of the fields of Machine Learning that is the most explored and exploited. AlphaGo, Natural Language Processing, image recognition, and many more topics are iconic examples of the success of deep learning. It is so successful that it seems to become the golden answer to all our problems.

Gary Marcus, a respected ML/AI researcher, published an excellent critical appraisal of this technique. For instance, he listed ten challenges that deep learning faces. He concludes that deep learning is only one of the tools needed and not necessarily a silver bullet for all problems.

From the security point of view, here are the challenges that seem relevant:

“Deep Learning thus far works well as an approximation, but its answers often cannot be fully trusted.”

Indeed, the approach is probabilistic rather than heuristic. Thus, we must be cautious. Currently, the systems are too easily fooled. This blog reported several such attacks. The Generative Adversarial Networks are promising attack tools.

“Deep learning presumes a largely stable world, in ways that may be problematic.”

Stability is not necessarily the prime characteristics of our environments.

“Deep learning thus far cannot inherently distinguish causation from correlation.”

This challenge is not related to security. Nevertheless, it is imperative to understand it. Deep learning detects a correlation. Too often, people assume that there is causation when seeing the correlation. This assertion is often false. Causation may be real if the parameters are independent. If they are linked/triggered by an undisclosed parameter, it is instead this undisclosed parameter that produces the causation.

In any case, this paper is fascinating to read to keep an open, sane view of this field.

Marcus, Gary. “Deep Learning: A Critical Appraisal.” ArXiv:1801.00631 [Cs, Stat], January 2, 2018.



Blockchain: a “supply chain” attack

A hacker, by the handle Right9ctrl, has injected malicious code that steals Bitcoin and Bitcoin Cash funds stored inside BitPay’s Copay wallet apps. The hacker injected the malicious code in the JavaScript library Event_Stream. This malicious code lays dormant until used inside the source code of Copay, a Bitcoin wallet. There, it steals much information such as the private key.

Right9ctrl inherited the access to the Event_Stream library because the initial author passed him/her the control. The initial author did not want to maintain anymore the open source library.

In other words, this is an example of a software supply chain attack. One element in the supply chain (here a library) has been compromised. Such an attack is not a surprise. Nevertheless, it raises a question about the security of open source components.

Many years ago, the motto was “Open source is more secure than proprietary solutions.” The primary rationale was that many eyes reviewed the code and we all know that code review is key for secure software. In the early days of open source, this motto may have been mostly true, under some specific trust models ( see, Chapter 12 of Securing Digital Video…). Is it still true in our days?

Currently, there are a plethora of available open source libraries. With the advent of many new programming languages, the number of libraries is exploding. Many projects have a few contributors, sometimes even only one individual. How many users of these libraries review the code before using it? I would guess very very few. Thus, the motto is becoming weaker. It is probably true for large, well-established projects, such as OpenSSL, but what for projects with one contributor? The success of the library is most probably not a valid indicator of trustfulness.

In this case, a warning signal would have been that the malicious code was heavily obfuscated. There is no legitimate reason why a JavaScript open source should be obfuscated. Open source is about transparency. Obfuscation is about obscurity.

Conclusion: be very careful when selecting an open source library, you may not necessarily trust it.

Conclusion2: if you are using a copy wallet, Sophos provides some advices.


Blockchain: why have permissionless blockchains to be slow?

This post is the seventh post in a series dedicated to demystifying blockchains. The previous post introduced the permissioned blockchains based on Byzantine Fault Tolerant (BFT) protocols.  BFT-consensus is faster than lottery-based consensus.  This post attempts to explain why lottery-based consensuses are inherently slow.

First, we need to explain the concept of a temporary fork.  As the consensus is a lottery, sometimes two nodes win at the same time.  The consequence is that at a given time there are two different instances of the blockchain, so-called forks.  Depending on the spreading speed of the information of the wining block and the number of validation nodes, the two forks may grow independently.  The consensus algorithm decides that the longest fork wins.  The shorter fork is abandoned.  The transactions of the “abandoned” fork that were not approved in the winning fork are returned to the pool of “not yet approved” transactions.  This strategy is the reason why Bitcoin approves a transaction finally only once there are at least five following blocks in the chain.

Temporary forks are inherent to lottery-based consensus protocols.  BFT-based consensus protocols do not have temporary forks because their consensus is deterministic.  All the faithful nodes take the same decision.

For the sake of completion, there are two other types of fork: hard forks and soft forks.  They are due to the evolution of the protocol and are independent of the consensus mechanism.  Thus, they are out of the scope of this post.  We will come back to these forks in a future post.

Let us use Bitcoin to illustrate the issue.  Bitcoin validates a block every ten minutes (average time).  The difficulty of the Proof of Work drives this duration.  (see Blockchain: Proof of Work).  As a block holds a maximum number of transactions, this average time of validation defines the blockchain’s throughput.  Currently, Bitcoin’s capacity is about seven transactions per second.

How to increase this throughput?  An “obvious” solution seems to decrease the average validation time by reducing the difficulty of the cryptographic puzzle.  For instance, should the average duration be halved, then the throughput would double, isn’t it?  Unfortunately, it is not so simple.  If we reduce the difficulty of the challenge at constant global calculation power, the duration is reduced.  Another consequence is that the probability to solve the puzzle rises.  The easier the challenge, the higher the likelihood to solve it.  Thus, the probability to create a temporary fork increases also.  The shorter the average validation time, the more temporary forks.  Should the duration decrease drastically, the system would end up with a pandemonium of temporary forks.  Unfortunately, abandoned temporary forks inject back many “seemingly-approved” transactions in the pool of “not-yet-approved” transactions.  In other words, the global throughput is decreasing, and the predictability of when a transaction may be approved finally is impacted.  Of course, Sato understood this crucial aspect of his protocol and chose ten minutes rather than one minute.

Conclusion: Lottery-based consensus protocols need to have “long” approval duration to avoid the proliferation of temporary forks.

Blockchain: Consensus based on Byzantine Fault Tolerance

This post is the sixth post in a series dedicated to demystifying blockchains. The last post introduced the Proof of Stake. The third post introduced the concept of consensus and classified them into two categories: lottery-based and Byzantine Fault agreement-based protocols. This post presents the most known Byzantine Fault agreement-based consensus: Proof of Work

Byzantine Fault Tolerance is known for about four decades. A Byzantine Fault is any failure of a system. The failure may be that the system is not working, or the system is behaving maliciously. An f-Byzantine Fault-Tolerant network is a network that operates correctly under the assumption that f nodes may be concurrently experiencing Byzantine faults. The nodes are either well-behaving or ill-behaving, i.e., suffering from Byzantine Faults. The network comprises n nodes with a maximum of f ill-behaving nodes.

The best implementation of Byzantine Fault Tolerant system for asynchronous protocols is the Practical Byzantine Fault Tolerance (PBFT).  PBFT assumes that the size of the network is at least 3 f + 1.

At any given moment, one node is the primary and the other nodes are replicas. The simplified algorithm operates as follows:

  • A client sends a request to the primary node for an operation
  • The primary node broadcasts this request to all replicas
  • Replicas perform the requested operation and return an answer to the client
  • The client waits for at least replicas answering the same result. This is the consensus outcome of the operation.

The figure illustrates the basic behavior for the case of a 1-node fault tolerant system. It has at least four nodes. The client sends its request to the primary node. The primary node broadcasts the request of operation to the three other nodes: replica 1, replica 2 and replica 3. As soon as the client receives an affirmative answer from two replicas (in the figure, replica 1 and replica 3), it considers the operation successful. If one replica is failing, the client will still receive proper answers from the two other replicas. If the primary node is failing, then the protocol switches the role of primary to one replica.

The PBFT is well adapted for fixed networks with known trusted entities. As such, it may be useful for blockchain validated by a set of known, approved authorities, i.e., permissioned blockchains. The number of authorities defines the system’s resilience. The minimal size is four which allows for one ill-behaving authority. Each authority has the same level of trust and shares trust with the other authorities.

PBFT is straightforward and robust. The theoretical background is well studied, known for a long time and used by the community. The trust is not linked to the ownership of resources or stakes. It is fast and computationally efficient. There is no notion of mining. It is deterministic. A permissioned blockchain using PBFT can have a very high through output of transactions.

Unfortunately, the trust is not flexible. PBFT requires a pre-established list of collaborating trusted participants. Any new participant requires new settings. This adjustment cannot be dynamic. The consensus of permissionless blockchains cannot employ PBFT. By participants, we mean the validation nodes and not the users of the blockchain. The system does not need to trust the users. It is the role of the validation and consensus to screen bad transactions.

A known weakness is that the system is prone to Sybil attacks. An entity may attempt to register several participants that it controls stealthily. If the cheating entity controls at least f sibyls, then it controls the full system. Therefore, the initial trust in the participants is paramount. In most permissioned blockchains, the likelihood of Sybil attacks is negligible.

Conclusion: PBFT is a well-known, well-studied protocol. For decades, distributed systems have taken their decisions using it. It is fast and computationally efficient. PBFT is an interesting consensus protocol for permissioned blockchain. Some variants of BFT are also available.


M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, USA, 1999

Blockchain: Proof of Stake

This post is the fifth post in a series dedicated to demystifying blockchains. The fourth post introduced the most famous lottery-based consensus: Proof of Work (PoW). This post presents another type of lottery-based consensus: Proof of Stake (PoS).

With PoW, the entity that has the highest calculation power has the highest probability to be the elected validator of the block. With PoS, the entity that has the highest stake in the system has the highest probability to become the elected validator of the block. Each potential validator, so called forger, puts some collaterals in escrow. Would the elected validator cheat, then thisvalidator would loose the escrowed collateral. Obviously, the collaterals cannot exceed the validator’s owned wealth.

The security assumption is that the more stakes you have in a system, the more you are expected to defend this system in order not to lose your stake. Thus, the underlying hypothesis is that the highest stakeholders are acting rationally. Furthermore, like with PoW, there is an incentive to forge. The forger who validated a block is rewarded with some coins (and the transaction fees).

PPCoin implemented the first PoS. It uses the concept of coin age. The coin age is the number of coins multiplied by the holding period of these coins. If Alice holds ten coins for 20 days, then her coin age is 200 The collateral is a number of coin ages bidded for validating the block. Indeed, Alice bids n coin ages by paying herself these n coins. The forger must solve a kind of PoW, but the challenge is inversely proportional to the collateral, i.e., the space of search decreases when the collateral increases. Thus, the winning forger is most probably the forger with the highest collateral.

Buterin Vitalik, the founder of Ethereum, introduced Casper: a PoS based on Byzantine Fault Tolerance. Casper has a set of volunteer validators. Each validator commits a number of ethers as a deposit. Casper defines an epoch as a set of 100 consecutive blocks. Roughly, Casper validates an epoch once a number of validators whose aggregated deposit represents more than 2/3 of the total deposit validated the epoch. The assumption is that if less than 1/3 of the validators cheated, they would be detected and identified. In that case, the community would remove their escrowed collateral through a hard fork.

The main advantage of PoS is that it consumes far less computing power than a PoW-based system. Usually, PoS has also less latency than PoW. The main latency is for the minimal time that collaterals are escrowed.

According to me, the trust model of PoS is weaker than the trust model of PoW. An ill-behaving majority stakeholder may attempt to use her advantage to take an even more significant stake. Furthermore, the stakeholders are expected to act rationally. This assumption may be weak in case of a state-funded attack which may have objectives other than monetary ones.

Even with a rationale player, there may be some weakness. If the gain of a misapproved transaction exceeds the loss of the escrowed collaterals, it may be a rationale decision to cheat. For instance, PoS is prone to “Nothing at Stake” attack. Alice may have a significant stake at time T0. She cashes all her stake at time T1. At T1, she has no more stake but has gained some benefits. As she has nothing to lose anymore, she may attempt to attack the blockchain at T0 when she had the majority of the stake. She attempts to start a new branch from T0, using all her then available-stake as collateral. Having a higher stake, she has good chance to win and thus begin a new branch where she would own again the already spent stakes. Would she not win, she would have lost nothing. Of course, T0 and T1 must be close.


PoS is faster and less power consuming than PoW. As such, this type of consensus may overcome some of the issues of PoW. Nevertheless, its trust model seems weaker. I foresee that we may see new PoS protocols in the coming years.

Watermarking Deep Neural Networks

Recently, an IBM team presented at ASIA CCS’18 a framework implementing watermark in a Deep Neural Network (DNN) network. Similarly, to what we do in the multimedia space, if a competitor uses or modifies a watermarked model, it should be possible to extract the watermark from the model to prove the ownership.

In a nutshell, the DNN model is trained with the normal set of data to produce the results that everybody would expect and an additional set of data (the watermarks) that produces an “unexpected” result that is known solely to the owner. To prove the ownership, the owner injects in the allegedly “stolen” model the watermarks and verifies whether the observed result is what it expected.

The authors explored thee techniques in the field of image recognition:

  • Meaningful content: the watermarks are modified images, for instance by adding a consistently visible mark. The training enforces that the presentation of such visible mark results in a given “unrelated” category.
  • Unrelated content: the watermarks are images that are totally unrelated to the task of the model; normally they should be rejected, but the training will enforce a known output for the detection
  • Noisy content: the watermarks are images that embed a consistent shaped noise and produce a given known answer.

The approach is interesting. Some remarks inherited from the multimedia space:

  • The method of creating the watermarks must remain secret. If the attacker guesses the method, for instance that the system uses a given logo, then the attacker may perhaps wash the watermark. The attacker may untrain the model, by supertraining the watermarked model with generated watermarks that will output an answer different from the one expected by the original owner. As the attacker has uncontrolled, unlimited access to the detector, the attacker can fine tune the model until the detection rate is too low.
  • The framework is most probably too expensive to be used for making traitor tracing at a large scale. Nevertheless, I am not sure whether traitor tracing at large scale makes any sense.
  • The method is most probably robust against an oracle attack.
  • Some of the described methods were related to image recognition but could be ported to other tasks.
  • It is possible to embed several successive orthogonal watermarks.

A paper interesting to read as it is probably the beginning of a new field. ML/AI security will be key in the coming years.


Zhang, Jialong, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph. Stoecklin, Heqing Huang, and Ian Molloy. “Protecting Intellectual Property of Deep Neural Networks with Watermarking.” In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 159–172. ASIACCS ’18. New York, NY, USA: ACM, 2018.