Canvas fingerprinting of browsers

In 2012, Keaton Mowery and Hovav Shacham proposed an original method to fingerprint a browser using HTML5: “Pixel Perfect: Fingerprinting Canvas in HTML5.”  It uses the <canvas> element, a new feature of HTML5.  <canvas> defines an area of the screen that can be drawn on using primitives.  The idea is to write a text, ideally a pangram, into a canvas, to retrieve the rendered bitmap of the canvas area (using the toDataURL method), and to calculate a digest from this image.  The expectation was that rendering would differ slightly depending on the operating system, the version of the browser, the graphics card, and the version of the corresponding driver.  Canvas fingerprinting did indeed differentiate users.  Furthermore, all modern browsers support HTML5.
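To make the digest step concrete, here is a minimal Python sketch of what a tracking server could do with the string returned by toDataURL.  It is only an illustration: the function name `canvas_digest`, the choice of SHA-256, and the two fake "renderings" are my assumptions, not details taken from the paper.

```python
import base64
import hashlib


def canvas_digest(data_url: str) -> str:
    """Return a SHA-256 hex digest of the image bytes carried by a data URL.

    Assumes the 'data:image/png;base64,<payload>' form that toDataURL
    produces by default. The hash function is an arbitrary choice here.
    """
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    image_bytes = base64.b64decode(payload)
    return hashlib.sha256(image_bytes).hexdigest()


# Made-up byte strings standing in for the PNG data that two different
# machines might produce when rendering the same text.
rendering_a = "data:image/png;base64," + base64.b64encode(b"pixels-from-machine-A").decode()
rendering_b = "data:image/png;base64," + base64.b64encode(b"pixels-from-machine-B").decode()

print(canvas_digest(rendering_a))
print(canvas_digest(rendering_b))
```

The same rendering always gives the same digest, while the slightest rendering difference gives a completely different one.  That is all the technique needs.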

Canvas fingerprinting is transparent to the user.  It bypasses any cookie protection, any private browsing mode…  If combined with other fingerprinting parameters such as, for instance, the HTTP user agent or font detection, the uniqueness of the fingerprint is high.  The site http://www.browserleaks.com/ demonstrates the differentiation.  Do not hesitate to test it with your configuration.

This paper was a nice academic study.  This month, Gunes Acar et al. published a paper, “The Web never forgets: Persistent tracking mechanisms in the wild.”  They studied different tracking methods used by the top 100,000 web sites (as ranked by Alexa).  They discovered that 5.5% of these sites used canvas fingerprinting!  It is mainly used by the “AddThis.com” system.  Furthermore, by reverse engineering the AddThis code, they highlighted that AddThis improved the technique described in the seminal paper.  For instance, the developers used a perfect pangram, or drew two rectangles and checked whether a specific point was part of the path…

User tracking is an arms race, and tracking software uses the latest academic research results.

Note 1: you can opt out from AddThis at http://www.addthis.com/privacy/opt-out.  They put a cookie on the computer to signal the opt-out  🙁

Note 2: a pangram is a sentence that uses every letter of the alphabet at least once.  A perfect pangram is a sentence that uses every letter of the alphabet exactly once.
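For the curious, the two definitions of Note 2 can be written as a small Python sketch.  The function names are mine, and I assume the 26-letter English alphabet and ignore case, spaces, and punctuation.

```python
import string


def is_pangram(sentence: str) -> bool:
    """True if every letter a-z appears at least once."""
    letters = {c for c in sentence.lower() if c in string.ascii_lowercase}
    return len(letters) == 26


def is_perfect_pangram(sentence: str) -> bool:
    """True if every letter a-z appears exactly once."""
    letters = [c for c in sentence.lower() if c in string.ascii_lowercase]
    return len(letters) == 26 and len(set(letters)) == 26


print(is_pangram("The quick brown fox jumps over the lazy dog"))
print(is_perfect_pangram("The quick brown fox jumps over the lazy dog"))
print(is_perfect_pangram("Cwm fjord bank glyphs vext quiz"))
```

The classic "quick brown fox" sentence is a pangram but not a perfect one, as several letters repeat.  "Cwm fjord bank glyphs vext quiz" is a commonly cited perfect pangram.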

 

Facebook would like to listen to what you listen to or watch

Last week, Facebook announced a new feature for its status updates.  If switched on, this feature will identify the song or TV program that it hears through the microphone of the mobile device.  It will then propose to share this information with your community (and offer a 30-second free sample of the song or a synopsis of the TV program).

This is a new example of the use of audio fingerprinting.  By default, the feature is switched off.  Furthermore, the user decides when to share the information and with whom.  Thus, in theory, there are no associated privacy issues.  The user remains in control.

Facebook claims that it will not share the information if you do not want it to.  Unfortunately, Facebook does not specify whether it will collect the information for its own profiling even if the user refuses to share it with friends.

As I’m paranoid and as there is no free lunch…  I don’t care, as I do not have a Facebook account.  Will you use it?

ReDigi.com the resale locker

I must confess that I became aware of this interesting initiative only this summer, although ReDigi has been operating since October 2011.

ReDigi is a site that allows you either to resell songs that you no longer want, or to purchase songs that other people no longer want.  In other words, a second-hand market for music.

How does it work, from the user’s point of view?

  1. Alice, the user, subscribes to the service.
  2. ReDigi locates the songs Alice may resell (purchased either from iTunes or from ReDigi).
  3. Alice selects the songs to sell, and ReDigi stores them in the cloud while wiping out the copies on her computers.
  4. As long as the song is not yet sold, Alice can stream it.
  5. Once Bob has purchased it, she can no longer listen to it.
  6. If ever a copy of the sold song appears again on Alice’s device(s), she is notified.

 

How does it work behind the scenes?  (This is partly based on the details provided by ReDigi in a court trial and an interview, and partly on my guesses.)

  1. Alice has to install software called Music Manager.
  2. Music Manager explores the directories and spots the iTunes and ReDigi songs.  It most probably jumps directly to the FairPlay protected directory to find the licenses.  It checks whether the song is legal (in other words, whether it can access the key, which means that the song was bound to the device).
  3. It uploads the file (and probably the license) to the cloud and erases the accessible song.  At the next sync, all iTunes copies should disappear.
  4. The uploaded copy is marked as Alice’s until it is sold.
  5. Once sold, the copy is marked for somebody else.  I would like to know whether they build their own license or a new iTunes license.
  6. During step 3, it extracts a fingerprint of the song.  Music Manager then scouts the hard drive to find copies.  I was not able to find out whether the fingerprint is a basic crypto hash (MD5) or a real audio fingerprint.  If it is the latter, then funny things may happen.
    Alice purchased Song1 on iTunes.  Later she purchases the full album on CD.  She resells the iTunes Song1 and rips her CD.  A legitimate copy of Song1 will reappear on her drive.  Music Manager will complain (ReDigi claims that if numerous complaints are not obeyed, i.e., the song is not erased, the subscription is cancelled).
    Obviously, if it is just a hash, then the system can be easily bypassed.
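Why would a plain hash be so weak?  The following Python sketch illustrates it.  Everything in it is hypothetical: I do not know how Music Manager actually works, and the names `md5_hex`, `sold_hashes`, and `is_flagged` as well as the fake "song" bytes are invented for the illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a file's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for the bytes of a sold song file (not a real audio file).
original = b"ID3" + bytes(range(256)) * 4
# Same content with a single padding byte appended, e.g. by editing a tag.
modified = original + b"\x00"

# Hypothetical list of hashes of the songs Alice has already sold.
sold_hashes = {md5_hex(original)}


def is_flagged(data: bytes) -> bool:
    """True if the file matches a sold song under exact-hash matching."""
    return md5_hex(data) in sold_hashes


print(is_flagged(original))
print(is_flagged(modified))
```

The untouched copy is flagged, but the copy with one extra byte is not, although it would sound exactly the same.  A real audio fingerprint is computed from the sound itself and would survive such a change, which is also why it would flag the legitimate CD rip.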

 

The interesting question is not if the system can be bypassed.  I am sure that the readers of this blog have already guessed at least one or two ways to hack it.  It is not complex, and I will not elaborate on it.

 

The interesting question is whether it is legal to resell a digital song.  The US first-sale doctrine allows you to resell goods that you own; nevertheless, the answer may not be so trivial.  See this article.  We will soon have a (first) answer.  In January 2012, Capitol Records filed a suit against ReDigi.  In February 2012, the district court denied the preliminary injunction.  Oral arguments should start on October 5.  This article gives a good summary of the legal case.

“Securing Digital Video” is now available!

My book, “Securing Digital Video: Techniques for DRM and Content Protection,” is now on sale.  It can be found directly at Springer (about one week’s delay), at the US Amazon (2-4 weeks’ delay), and at the French Amazon (available only in August).

This is the last step of a long process.  I hope that the reader will enjoy it and that it will be useful to the community.   More details on the book are available here.

I would be glad to hear your suggestions and opinions (even negative ones), and to answer any questions.  For that, preferably use the address book@eric-diehl.com.  I will always answer.

HADOPI: a little insight

In May 2011, the French HADOPI mandated an expert, David Znaty, to evaluate the robustness of the system that tracks infringers on P2P networks.  The objectives were to:

  1. Analyze the method used to generate fingerprints
  2. Analyze the method used to compare sample candidates with these fingerprints
  3. Analyze the process that collects the IP addresses
  4. Analyze the workflow

On January 16, 2012, Mr. Znaty delivered his report.  A version without the annexes was published on the HADOPI site for public dissemination.  The report concluded that the system was reliable.

Conclusion : en l’état, le processus actuel autour du système TMG est FIABLE.  Les documents constitués du procès verbal (saisine), et si nécessaire du fichier complet de l’oeuvre (stockée chez TMG) associé au segment de 16Ko constituent une preuve ROBUSTE.

Le mode opératoire utilisé permet donc l’identification sans équivoque d’une oeuvre et de l’adresse IP ayant mis à disposition cette oeuvre.

An approximate translation of this conclusion is

Conclusion: as it stands, the current process around the TMG system is RELIABLE.  The documents, consisting of the official report (referral) and, if necessary, the complete file of the work (stored at TMG) associated with the 16 KB segment, constitute ROBUST evidence.

The workflow allows unambiguous identification of a piece of content and the IP address that made it available.

Quickly, content owners complained that sensitive information might leak from this report.  Therefore, it was interesting to have a look at this report.

The report is no longer available on the HADOPI site.  The links are present, but there is no actual download.  Sniffing around, you may easily find copies of the original report (for instance here).  Once we have it, what is leaking out?

Most probably, for the experts, nothing really interesting.  We learn a lot about the process of identifying the rights owners of a piece of content.  This part is well described in the document.  When we look at the technical side, there are no details.  The expert was always told that the technology providers would not give any details on the algorithms.  Therefore, to validate the false positive rate, the expert checked whether any two pieces of content inside the reference database shared the same fingerprint.  The answer is no (except for one case where they fed the same master twice).  Conclusion: no false positive!  I let you draw your own conclusion.
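To see what such a check does and does not prove, here is a small Python sketch of a collision search inside a reference database.  It is my own reconstruction of the idea, not the expert’s procedure: the function name, the titles, and the fingerprint strings are all invented.

```python
from collections import defaultdict


def find_internal_collisions(reference_db: dict[str, str]) -> dict[str, list[str]]:
    """Group reference titles by fingerprint and keep the groups of size > 1."""
    titles_by_fingerprint = defaultdict(list)
    for title, fingerprint in reference_db.items():
        titles_by_fingerprint[fingerprint].append(title)
    return {
        fingerprint: sorted(titles)
        for fingerprint, titles in titles_by_fingerprint.items()
        if len(titles) > 1
    }


# Invented reference database; "Movie A (re-ingested)" is the same master fed twice.
reference_db = {
    "Movie A": "fp-001",
    "Movie B": "fp-002",
    "Song C": "fp-003",
    "Movie A (re-ingested)": "fp-001",
}

print(find_internal_collisions(reference_db))
```

The only collision found is the master that was fed twice.  Note what the test covers: it compares reference content with reference content.  It says nothing about how often content that is not in the database would wrongly match an entry, which is what a false positive rate usually measures.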

The annexes, which may contain some details, were not published.  I have not found a copy on the net.  What bits of information could we glean?

  • There are two technology providers for the fingerprint.  They are “anonymized” in the document for confidentiality (sigh!).  We can guess that the audio fingerprint provider is not French, as a quote from one of its answers was in English.  This is not a surprise since, to the best of my knowledge, there is no French technology commercially available.
  • They look for copyrighted content on P2P networks using keywords.  Once a piece of content is spotted, its fingerprint is extracted and compared to the master database.  If the content matches, its hash code is recorded (most probably the MD5 code).  Then, TMG can look for this MD5 sample and record the IP address.
  • The content is recognized if there is an ordered sequence of fingerprints.  The length of the sequence seems to depend on the type of content and the rights owner.  For audio, 80% of the duration.  For video, in the case of ALPA, 35 minutes…
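The report gives no algorithm, so the following Python sketch is only my guess at what “an ordered sequence of fingerprints” covering 80% of the duration could mean.  I assume that each piece of content is cut into segments of equal length, that each segment yields one fingerprint value, and that matching uses a longest common subsequence.  The function names and the integer fingerprints are invented.

```python
def ordered_match_ratio(reference: list[int], candidate: list[int]) -> float:
    """Fraction of the reference covered by fingerprints that appear,
    in the same order, in the candidate (longest common subsequence)."""
    if not reference:
        return 0.0
    previous_row = [0] * (len(candidate) + 1)
    for ref_fp in reference:
        current_row = [0]
        for j, cand_fp in enumerate(candidate, start=1):
            if ref_fp == cand_fp:
                current_row.append(previous_row[j - 1] + 1)
            else:
                current_row.append(max(previous_row[j], current_row[j - 1]))
        previous_row = current_row
    return previous_row[-1] / len(reference)


def is_recognized(reference: list[int], candidate: list[int], threshold: float = 0.8) -> bool:
    """True if the ordered match covers at least `threshold` of the reference."""
    return ordered_match_ratio(reference, candidate) >= threshold


master = list(range(10))                       # ten segments of the master
damaged_copy = [0, 1, 2, 3, 99, 5, 6, 7, 8]    # one segment altered, one missing
shuffled_copy = list(reversed(master))         # same segments, wrong order

print(is_recognized(master, damaged_copy))
print(is_recognized(master, shuffled_copy))
```

The damaged copy still matches eight of the ten segments in order and is recognized at the 80% threshold.  The shuffled copy contains all the segments but not in order, so it is not.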

In conclusion, not a great deal…

 

INA versus YouTube

A French court has ordered YouTube to pay INA €150,000 because YouTube did not put in place any filtering system that would deter the posting of INA’s copyrighted content.  INA is the French National Institute of Audiovisual.  Its mission is to archive all broadcast content from French TV and radio stations.

Interestingly, INA hopes that YouTube will install an efficient fingerprint system to detect INA’s content. INA has developed its own fingerprinting technology: Signature. YouTube uses its own fingerprint technology: ContentID.

Thanks to OC for the pointer.

Digital Future Symposium (DFS)

This event, organized by the Center for Content Protection, was held with Asia TV in Singapore.  Thus, the audience was rather large (140 people) and encompassed broadcasters, producers, and press.
The best presentations were:

  • Brad HUNT (former CTO of MPAA, and now consultant at Digital Media Directions) presented his four major trends in content protection
    • Use of fingerprinting to monetize content
    • Digital copy and managed copy for optical media
    • Domain based DRM
    • DECE with some emphasis on Marlin
  • Fabrice Moscheni (Fastcom) presented an impressive demonstration of DVB-CPCM. The demonstration raised a lot of interest.
  • Yangbin Wang (Vobile) explained how Vobile protected Olympic Games for CCTV

Conax, BayTSP, Verimatrix, Microsoft and Viaccess presented their products. Intertrust made a dull presentation of Marlin. I made two presentations:

  • A global approach to security, explaining that using only fingerprinting or watermarking is insufficient, at least for tightly controlled distribution.  The distinction between tightly controlled distribution and loosely controlled distribution was appreciated.
  • An introduction to DVB-CPCM before Fastcom’s demonstration.

Two main messages were conveyed during this symposium. Content Identification Techniques may allow monetization of content. Domain is the next paradigm in DRM.