Invisible Image Watermarks Are Provably Removable Using Generative AI

Generative AI is the current hot topic. Of course, one of the newest challenges is to discriminate a genuine image from a generative-AI-produced one. Many papers propose systematically watermarking the generative AI outputs.

This approach makes several assumptions. The first one is that the generator is actually adding an invisible watermark. The second assumption is that the watermark survives most transformations.

In the content protection field, we know about the validity of the second assumption. Zhao et al., from the University of California Santa Barbara and Carnegie Mellon University, published a paper. The system adds Gaussian noise to the watermarked image and reconstructs the same image using the noise image. After several iterations, the watermark disappears. They conclude that any watermark can be defeated.

This is a well known fact in the watermark community. The Break Our Watermark System (BOWS) in 2006 and the BOWS2 in 2010 demonstrated this reality. These contests aimed to demonstrate that attackers can defeat the watermark if they have access to an oracle watermark detector.

Thus, this paper illustrates this fact. Their contribution adds generative AI to the attacker’s toolset. As a countermeasure, they propose to use a semantic watermark. The semantic watermark changes the image but keeps its semantic information (or at least some). This approach is clearly not usable for content protection.


Zhao, Xuandong, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, and Lei Li. “Invisible Image Watermarks Are Provably Removable Using Generative AI.” arXiv, August 6, 2023.

Craver, Scott, Idris Atakli, and Jun Yu. “How We Broke the BOWS Watermark.” In Proceedings of the SPIE, 6505:46. San Jose, CA, USA: SPIE, 2007.

“BOWS2 Break Our Watermarking System 2nd Ed.”

Black Hat 2023 Day 2

  1. Keynote: Acting national cyber director  discusses the national cybersecurity strategy  and workforce efforts (K. WALDEN)

A new team at the White House of about 100 people dedicated to this task. No comment

The people ḍeciding which features require security reviews are not security experts. Can AI help?

The first issue is that engineering language is different than the normal language.   There is a lot of jargon and acronyms.  Thus, standard LLM may fail.

They explored several strategies of ML.

They used unsupervised training to define vector size (300 dimensions).  Then, they used convolution network with these vectors to make their decision.

The presentation is a good high-level introduction to basic techniques and the journey.

Missed 2% and false 5%.

The standard does not forbid JWE and JWS with asymmetric keys.  By changing the header, it was able to confuse the default behavior.

The second attack uses applications that use two different libraries, crypto and claims.  Each library handles different JSON parsing.   It is then possible to create inconsistency.

The third attack is a DOS by putting the PBKDF2 iteration value extremely high.

My Conclusion

As a developer, ensure at the validation the use of limited known algorithms and parameters.

ChatGPT demonstrates the vulnerability of humans to being bad at testing

When demonstrating a model, are we sure they are not using trained data as input to the demonstration.  This trick ensures PREDICTABILITY.

Train yourself in ML as you will need it.

Very manual methodology using traditional reverse engineering techniques

Laion5B is THE dataset of 5T images.   It is a list of URLs.  But registered domains expire and can be bought.  Thus, they may be poisoned.  It is not a targeted attack, as the attacker does not control who uses it.

0.01% may be sufficient to poison.

It shows the risk of untrusted Internet data.  Curated data may be untrustworthy.

The attack is to use Java polymorphism to override the normal deserialization.  The purpose is to detect this chain.

Their approach uses tainted data analysis and then fuzz.

Black Hat 2023 Day 1

  1. Introduction (J. MOSS)

Jeff MOSS (Founder of DefCon and Black Hat) highlighted some points:

  • AI is about using predictions. 
  • AI brings new issues with Intellectual Properties.   He cited the example of Zoom™ that just decided that all our interactions could be used for their ML training.
  • Need for authentic data.

The current ML models are insecure, but people trust them.  Labs had LLMs available for many years but kept them.  With OpenAI going public, it started the race.

She presents trends for enterprise:

  • Enterprise’s answer to ChatGPT is Machine Learning as a Service (MLaaS).  But these services are not secure.
  • The next generation should be multi-modal models (using audio, image, video, text…).  More potent than monomodal ones such as LLMs.
  • Autonomous agent mixes the data collection of LLM and takes decisions and actions.  These models will need secure authorized access to enterprise data.  Unfortunately, their actions are non-deterministic.
  • Data security for training is critical.  It is even more challenging when using real-time data.

She pointed to an interesting paper about poisoning multi-modal data via image or sound.

Often, the power LED is more or less at the entry of the power supply circuit.  Thus, intensity is correlated to the consumption.

They recorded only the image of the LED to see the effect of the rolling shutter.  Thus, they increase the sampling rate on the LED with the same video frequency.  This is a clever, “cheap” trick.

To attack ECDSA, they used the Minerva attack (2020)

Conclusion: They turned timing attacks into a power attack.  The attacks need two conditions:

  1. The implementation must be prone to some side-channel timing attack.
  2. The target must have a power LED in a simple setting, such as a smart card reader, or USB speakers. 

Despite these limitations, it is clever.

Once more, users trust AI blindly.

The global environment is complex and extends further than ML code.

All traditional security issues are still present, such as dependency injection.

The current systems are not secure against adversarial examples.  They may not even present the same robustness of all data points.

Explainability is insufficient if it is not trustworthy.  Furthermore, the fairness and trustworthiness of the entity using the explanation are essential.

The Multi-Party Computation (MPC) Lindel17 specifies that all further interactions should be blocked when a finalized signature fails.  In other words, the wallet should be blocked.  They found a way to exfiltrate the part key if the wallet is not blocked (it was the case for several wallets)

In the case of GG18 and GG20, they gained the full key by zeroing the ZKP using the CRT (Chinese Remainder Theorem) and choosing a small factor prime.

Conclusion: Adding ZKP in protocols to ensure that some design hypotheses are enforced.

They created H26forge to create vulnerable H264 content.  They attack the semantics out of its specified range.  Decoders may not test all of them.  The tool helps with handling the creation of forged H264. 


This may be devastating if combined with fuzzing.

Enforce the limits in the code.

If the EKU (extended key use) is not properly verified for its purpose, bingo.

Some tested implementations failed the verification.  The speaker forged the signing tools to accept domain-validated certificates for signing code.

Politically correct but not really informative.

Deep Learning: A Critical Appraisal (paper review)

Deep learning is becoming extremely popular. It is one of the fields of Machine Learning that is the most explored and exploited. AlphaGo, Natural Language Processing, image recognition, and many more topics are iconic examples of the success of deep learning. It is so successful that it seems to become the golden answer to all our problems.

Gary Marcus, a respected ML/AI researcher, published an excellent critical appraisal of this technique. For instance, he listed ten challenges that deep learning faces. He concludes that deep learning is only one of the tools needed and not necessarily a silver bullet for all problems.

From the security point of view, here are the challenges that seem relevant:

“Deep Learning thus far works well as an approximation, but its answers often cannot be fully trusted.”

Indeed, the approach is probabilistic rather than heuristic. Thus, we must be cautious. Currently, the systems are too easily fooled. This blog reported several such attacks. The Generative Adversarial Networks are promising attack tools.

“Deep learning presumes a largely stable world, in ways that may be problematic.”

Stability is not necessarily the prime characteristics of our environments.

“Deep learning thus far cannot inherently distinguish causation from correlation.”

This challenge is not related to security. Nevertheless, it is imperative to understand it. Deep learning detects a correlation. Too often, people assume that there is causation when seeing the correlation. This assertion is often false. Causation may be real if the parameters are independent. If they are linked/triggered by an undisclosed parameter, it is instead this undisclosed parameter that produces the causation.

In any case, this paper is fascinating to read to keep an open, sane view of this field.

Marcus, Gary. “Deep Learning: A Critical Appraisal.” ArXiv:1801.00631 [Cs, Stat], January 2, 2018.



Watermarking Deep Neural Networks

Recently, an IBM team presented at ASIA CCS’18 a framework implementing watermark in a Deep Neural Network (DNN) network. Similarly, to what we do in the multimedia space, if a competitor uses or modifies a watermarked model, it should be possible to extract the watermark from the model to prove the ownership.

In a nutshell, the DNN model is trained with the normal set of data to produce the results that everybody would expect and an additional set of data (the watermarks) that produces an “unexpected” result that is known solely to the owner. To prove the ownership, the owner injects in the allegedly “stolen” model the watermarks and verifies whether the observed result is what it expected.

The authors explored thee techniques in the field of image recognition:

  • Meaningful content: the watermarks are modified images, for instance by adding a consistently visible mark. The training enforces that the presentation of such visible mark results in a given “unrelated” category.
  • Unrelated content: the watermarks are images that are totally unrelated to the task of the model; normally they should be rejected, but the training will enforce a known output for the detection
  • Noisy content: the watermarks are images that embed a consistent shaped noise and produce a given known answer.

The approach is interesting. Some remarks inherited from the multimedia space:

  • The method of creating the watermarks must remain secret. If the attacker guesses the method, for instance that the system uses a given logo, then the attacker may perhaps wash the watermark. The attacker may untrain the model, by supertraining the watermarked model with generated watermarks that will output an answer different from the one expected by the original owner. As the attacker has uncontrolled, unlimited access to the detector, the attacker can fine tune the model until the detection rate is too low.
  • The framework is most probably too expensive to be used for making traitor tracing at a large scale. Nevertheless, I am not sure whether traitor tracing at large scale makes any sense.
  • The method is most probably robust against an oracle attack.
  • Some of the described methods were related to image recognition but could be ported to other tasks.
  • It is possible to embed several successive orthogonal watermarks.

A paper interesting to read as it is probably the beginning of a new field. ML/AI security will be key in the coming years.


Zhang, Jialong, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph. Stoecklin, Heqing Huang, and Ian Molloy. “Protecting Intellectual Property of Deep Neural Networks with Watermarking.” In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 159–172. ASIACCS ’18. New York, NY, USA: ACM, 2018.

DolphinAttack or How To Stealthily Fool Voice Assistants

Six researchers from the Zhejiang University published an excellent paper describing DolphinAttack: a new attack against voice-based assistants such as Siri or Alexa. As usual, the objective is to force the assistant to accept a command that the owner of the assistant did not issue. The attack is more powerful if the owner does not detect its occurrence (excepted, of course, the potential consequences of the accepted command). The owner should not hear a recognizable command or even better hear nothing.

Many attacks try to fool the Speech Recognition system by finding characteristics that may fool the machine learning system that powers the recognition without using actual phonemes. The proposed approach is different. The objective is to fool the audio capturing system rather than the speech recognition.

Humans do not hear ultrasounds, i.e., frequencies greater than 20 kHz. Speech is usually in the range of a few 100 HZ up to 5 kHz. The researchers’ great idea is to exploit the characteristics of the acquisition system.

  1. The acquisition system is a microphone, an amplifier, a low-pass filter (LPF), and an analog to digital converter (ADC), regardless of the Speech Recognition system in use. The LPF filters out the frequencies over 20 kHz and the ADC samples at 44.1 kHz.
  2. Any electronic system creates harmonics due to non-linearity. Thus, if you modulate a signal of fm
    with a carrier at fc, in the Fourier domain, many harmonics will appear such as fC – fm, fC + fm¸ and
    fC as well as their multiples.

You may have guessed the trick. If the attacker modulates the command (fm) with an ultrasound carrier fc, then the resulting signal is inaudible. However, the LPF will remove the carrier frequency before sending it to the ADC. The residual command will be present in the filtered signal and may be understood by the speech recognition system. Of course, the commands are more complicated than a mono-frequency, but the system stays valid.

They modulated the amplitude of a frequency carrier with a vocal command. The carrier was in the range 20 kHz to 25 kHz. They experimented with many hardware and speech recognition. As we may guess, the system is highly hardware dependent. There is an optimal frequency carrier that is device dependent (due to various microphones). Nevertheless, with the right parameters for a given device, they seemed to have fooled most devices. Of course, the optimal equipment requires an ultrasound speaker and adapted amplifier. Usually, speakers have a response curve that cut before 20 kHz.

I love this attack because it thinks out of the box and exploits “characteristics” of the hardware. It is also a good illustration of Law N°6: Security is not stronger than its weakest link.

A good paper to read.


Zhang, Guoming, Chen Yan, Xiaoyu Ji, Taimin Zhang, Tianchen Zhang, and Wenyuan Xu. “DolphinAttack: Inaudible Voice Commands.” In ArXiv:1708.09537 [Cs], 103–17. Dallas, Texas, USA: ACM, 2017.


Picture by http: //



Subliminal images deceive Machine Learning video labeling

Current machine learning systems are not designed to defend against malevolent adversaries. Some teams succeeded already to fool image recognition or voice recognition. Three researchers from the University of Washington fooled the Google’s video tagging system. Google offers a Cloud Video Intelligence API that returns the most relevant tags of a video sequence. It is based on deep learning.

The idea of the researchers is simple. They insert the same image every 50 frames in the video. The API returns the tags related to the added image (with an over 90% high confidence) rather than the tags related to the forged video sequence. Thus, rather than returning tiger for the video (98% of the video time), the API returns Audi.

It has never been demonstrated that subliminal images are effective on people. This team demonstrated that subliminal images can be effective on Machine Learning. This attack has many interesting uses.

This short paper is interesting to read. Testing this attack on other APIs would be interesting.

Hossein, Hosseini, Baicen Xiao, and Radha Poovendran. “Deceiving Google’s Cloud Video Intelligence API Built for Summarizing Videos.” arXiv, March 31, 2017.