Protect your Machine Learning Model with IBM’s Watermark Algorithm

Pranav Dar 07 May, 2019 • 2 min read


  • IBM researchers have developed a technique that embeds watermarks into machine learning models
  • It applies the two stages of digital watermarking, embedding and detection, to identify stolen models
  • The approach was tested on MNIST and CIFAR10 datasets but does have a few limitations to iron out



Books, documents, images and videos are rightly treated as intellectual property, and their creators routinely watermark them. So why shouldn’t the same be done with machine learning models? Businesses spend a great deal of time and effort building them, so it makes sense to protect them, especially in commercial applications.

This is the idea behind IBM’s latest study. The aim is to help businesses and data scientists protect their work, especially complex deep learning models. Protect it from what, you might be wondering. Well, there are unfortunately too many people who use technology for the wrong purposes, and machine learning has not been spared (fake images and videos come to mind).

We recently published an article on how deep learning can be used to fight off adversaries and bolster cyber security, and IBM’s research adds a different perspective to that. IBM has already applied to patent this approach.


So how does this all work?

When you apply a watermark to an image or video, there are essentially two stages: embedding and detection. In the embedding stage, the developer overlays the watermark on the image. If the image is later stolen, the detection stage comes into play: the developer extracts the embedded watermark to prove ownership. IBM uses this same idea to protect deep neural networks.
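To make the detection stage concrete for a classifier, here is a minimal sketch. It assumes that detection means sending the owner's watermark inputs to a suspect model and counting how often the owner's chosen label comes back. The function names, the toy models, the corner pattern and the 0.9 threshold are all hypothetical and are not taken from IBM's paper.

```python
import numpy as np

def watermark_match_rate(predict, watermark_set):
    """Fraction of watermark inputs for which the model returns the owner's label."""
    hits = sum(1 for x, label in watermark_set if predict(x) == label)
    return hits / len(watermark_set)

def looks_watermarked(predict, watermark_set, threshold=0.9):
    """Flag a model whose match rate reaches the (assumed) threshold."""
    return watermark_match_rate(predict, watermark_set) >= threshold

# Toy stand-ins for real models: each maps a 28x28 array to a class label.
def marked_model(x):
    # Pretends training taught it: bright top-left corner -> label 7.
    return 7 if np.all(x[:4, :4] == 1.0) else 0

def independent_model(x):
    # Never saw the watermark, so it ignores the corner.
    return 0

# Build a small watermark set: random images stamped with a hypothetical pattern.
rng = np.random.default_rng(0)
watermark_set = []
for _ in range(20):
    x = rng.random((28, 28))
    x[:4, :4] = 1.0
    watermark_set.append((x, 7))

print(looks_watermarked(marked_model, watermark_set))
print(looks_watermarked(independent_model, watermark_set))
```

In a real setting, `predict` would wrap a call to the suspect model's prediction service, and the threshold would need to be chosen so that an unrelated model is very unlikely to pass by chance.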

The researchers developed three different algorithms to generate watermarks for these neural networks. As described by them in their blog post:

  • Embedding meaningful content together with the original training data as watermarks into the protected DNNs,
  • Embedding irrelevant data samples as watermarks into the protected DNNs, and
  • Embedding noise as watermarks into the protected DNNs.
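The three kinds of watermark samples listed above could be built roughly as follows. This is only a sketch with NumPy: the helper names, the 4x4 "logo" patch, the noise scale and the target label 7 are assumptions for illustration, and pairing each sample with an owner-chosen label is my reading of the approach rather than code from IBM.

```python
import numpy as np

def content_watermark(image, patch, top=0, left=0):
    """Stamp a meaningful patch (e.g. a logo) onto a copy of a training image."""
    marked = image.copy()
    h, w = patch.shape
    marked[top:top + h, left:left + w] = patch
    return marked

def noise_watermark(image, noise):
    """Add a fixed noise pattern to a training image, clipped to [0, 1]."""
    return np.clip(image + noise, 0.0, 1.0)

def build_watermark_set(samples, target_label):
    """Pair every watermark sample with the owner's chosen target label."""
    return [(s, target_label) for s in samples]

rng = np.random.default_rng(0)
train_image = rng.random((28, 28))       # stand-in for an MNIST-sized image
logo = np.ones((4, 4))                   # hypothetical "meaningful content"
unrelated_image = rng.random((28, 28))   # stand-in for an irrelevant data sample
fixed_noise = 0.1 * rng.standard_normal((28, 28))

samples = [
    content_watermark(train_image, logo),       # 1. meaningful content
    unrelated_image,                            # 2. irrelevant data sample
    noise_watermark(train_image, fixed_noise),  # 3. noise
]
watermark_set = build_watermark_set(samples, target_label=7)
```

These labelled samples would then be mixed into the training data, so that the protected model learns to return the chosen label whenever it sees a watermark input.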

These algorithms were then tested and verified on two popular datasets: MNIST and CIFAR10. You can read more about IBM’s efforts with AI watermarking in their blog post and their full research paper.


Our take on this

An interesting piece of research from IBM, and certainly an idea I hadn’t thought of or read about before. I appreciate the aim behind the study, though it remains to be seen how adversaries and attackers will find a way around it. The approach currently has a few limitations. For example, if a stolen model is deployed as an internal service rather than online, this approach will not work, since the owner has no way to query the model for the watermark.




Pranav Dar 07 May 2019

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
