Amazon has filed an interesting patent titled “System and method for marking content.” The idea is rather simple: create a dictionary of synonyms, then, to uniquely mark a piece of textual content, replace a set of predefined words with selected synonyms. Of course, the patent explores all the alternatives, but in a nutshell, this is the main idea.
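As a rough illustration of the core idea, here is a minimal Python sketch that hides a binary mark in a text by choosing, for each markable word, one variant from a small synonym list. The dictionary `SYNONYMS`, the functions `embed` and `extract`, and the one-bit-per-word encoding are all hypothetical. The patent does not prescribe this scheme, and the sketch assumes lowercase text without punctuation.

```python
# Hypothetical synonym dictionary: each key word maps to its interchangeable variants.
# The mark is encoded by the index of the variant chosen (0 or 1 here, so one bit per word).
SYNONYMS = {
    "big": ["big", "large"],
    "fast": ["fast", "quick"],
    "begin": ["begin", "start"],
    "help": ["help", "assist"],
}

# Reverse lookup: any variant -> (key word, index of that variant).
VARIANT_INDEX = {
    variant: (key, i)
    for key, variants in SYNONYMS.items()
    for i, variant in enumerate(variants)
}


def embed(text: str, bits: str) -> str:
    """Replace each markable word with the variant selected by the next bit.

    Assumptions: whitespace tokenisation, lowercase text, no punctuation,
    and exactly two variants per key word.
    """
    out = []
    remaining = iter(bits)
    for word in text.split():
        if word in VARIANT_INDEX:
            key, _ = VARIANT_INDEX[word]
            bit = next(remaining, "0")  # pad with 0 once the mark is exhausted
            out.append(SYNONYMS[key][int(bit)])
        else:
            out.append(word)
    return " ".join(out)


def extract(text: str) -> str:
    """Read the mark back from which variant appears at each markable word."""
    return "".join(
        str(VARIANT_INDEX[word][1]) for word in text.split() if word in VARIANT_INDEX
    )
```

For example, `embed("we begin with a big and fast model to help users", "1011")` yields `"we start with a big and quick model to assist users"`, and `extract` applied to that result returns `"1011"`.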
For fun, here is the first claim:
1. A system, comprising: a processor; and a memory comprising program instructions, wherein the program instructions are executable by the processor to: receive a request for particular content; extract a copy of the requested particular content from a content collection, wherein the particular content includes textual data; substitute a synonym for each of one or more selected words in the textual data of the copy, wherein to substitute a synonym for each of one or more selected words, the program instructions are further executable by the processor to: access a synonym database comprising a plurality of key words, wherein each key word is associated with one or more synonyms in the synonym database; and select a particular synonym to substitute for a particular selected word in the textual data of the copy from one or more synonyms associated with a key word in the database that matches the particular selected word in the textual data of the copy; and return the copy with the substituted synonyms in response to the request.
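The steps recited in the claim can be sketched as a small request handler. This is a hedged illustration, not Amazon's implementation: `CONTENT_COLLECTION`, `SYNONYM_DB`, `SECRET_KEY`, and the HMAC-based choice in `select_synonym` are assumptions, since the claim only says that a particular synonym is selected, not how.

```python
import hashlib
import hmac

# Hypothetical content collection and synonym database (illustrative names only).
CONTENT_COLLECTION = {
    "doc-42": "we begin with a big and fast model to help users",
}
SYNONYM_DB = {
    "begin": ["start", "commence"],
    "big": ["large", "huge"],
    "fast": ["quick", "rapid"],
    "help": ["assist", "aid"],
}
SECRET_KEY = b"hypothetical-server-secret"


def select_synonym(key_word: str, position: int, requester_id: str) -> str:
    """Select one synonym of the key word.

    Assumption: the choice is derived from an HMAC-SHA256 over the requester
    and the word position, so each requester gets a reproducible copy.
    """
    candidates = SYNONYM_DB[key_word]
    msg = f"{requester_id}:{position}".encode()
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return candidates[digest[0] % len(candidates)]


def handle_request(content_id: str, requester_id: str) -> str:
    # 1. Receive the request and extract a copy of the requested content.
    copy = CONTENT_COLLECTION[content_id].split()
    # 2. Substitute a synonym for each selected word that matches a key word.
    for position, word in enumerate(copy):
        if word in SYNONYM_DB:
            copy[position] = select_synonym(word, position, requester_id)
    # 3. Return the copy with the substituted synonyms.
    return " ".join(copy)
```

Deriving the choice from a keyed hash makes each copy reproducible, so a server could regenerate a given requester's copy on demand instead of storing it; whether the patent intends per-requester marks in this way is an assumption.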
Does it work? For watermarking, there are typically three parameters to examine:
- Transparency: There are some issues. First, the approach is probably not applicable to literature: synonyms are rarely perfect, and authors may not accept modifications to their text. Nevertheless, for many texts, and for non-purists, the marking may be rather transparent, although I would not rule out some noticeable artifacts.
- Robustness: Some substitutions are obviously easy to detect. If the content's integrity is not protected, it is rather easy to wash the mark or to forge a new marked copy. If the purpose is to fight piracy (such as illegal redistribution), it will not work: the hacker will simply remove the integrity protection and substitute the words.
- Payload: This depends on the length of the text and the variety of its vocabulary.
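To make the payload and robustness points more concrete, the following sketch estimates the payload of a text as the sum of log2(group size) over its markable words, and shows a washing attack that maps every word back to a canonical member of its synonym group. `GROUPS`, `payload_bits`, and `wash` are hypothetical, and the log2 figure is only an upper bound on what a practical encoder would reach.

```python
import math

# Hypothetical synonym groups; all words in a group are treated as interchangeable.
GROUPS = [
    ["begin", "start", "commence"],
    ["big", "large"],
    ["fast", "quick", "rapid", "swift"],
    ["help", "assist"],
]
GROUP_OF = {word: group for group in GROUPS for word in group}


def payload_bits(text: str) -> float:
    """Upper bound on embeddable bits: log2(group size) per markable word."""
    return sum(math.log2(len(GROUP_OF[w])) for w in text.split() if w in GROUP_OF)


def wash(text: str) -> str:
    """Washing attack: replace each markable word by the first member of its group.

    Anyone holding a comparable synonym dictionary can do this, erasing the
    mark while keeping the meaning.
    """
    return " ".join(GROUP_OF[w][0] if w in GROUP_OF else w for w in text.split())
```

For instance, `payload_bits("we start with a large and quick model to assist users")` gives log2(3) + 4, about 5.6 bits, and `wash` turns both that sentence and `"we commence with a big and swift model to help users"` into the same unmarked text, `"we begin with a big and fast model to help users"`.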
It is an interesting approach, although not a robust one. It may nevertheless be useful in some specific contexts.
Thanks to JJQ for pointing me to this patent.