In May 2011, the French HADOPI commissioned an expert, David Znaty, to evaluate the robustness of the system that tracks infringers on P2P networks. The objectives were:
- Analyze the method used to generate fingerprints
- Analyze the method used to compare candidate samples with these fingerprints
- Analyze the process that collects the IP addresses
- Analyze the workflow
On January 16, 2012, Mr Znaty delivered his report. A version without the annexes was published on the HADOPI site for public dissemination. The report concluded that the system was secure.
Conclusion : en l’état, le processus actuel autour du système TMG est FIABLE. Les documents constitués du procès verbal (saisine), et si nécessaire du fichier complet de l’oeuvre (stockée chez TMG) associé au segment de 16Ko constituent une preuve ROBUSTE.
Le mode opératoire utilisé permet donc l’identification sans équivoque d’une oeuvre et de l’adresse IP ayant mis à disposition cette oeuvre.
An approximate translation of this conclusion:
Conclusion: As it stands, the current process around TMG’s system is RELIABLE. The documents, namely the minutes (referral) and, if necessary, the complete file of the work (stored by TMG) associated with the 16 KB segment, constitute ROBUST evidence.
The procedure used therefore allows the unambiguous identification of a work and of the IP address that made this work available.
Content owners quickly complained that the report might leak sensitive information. That made it worth a closer look.
The report is no longer available on the HADOPI site: the links are still there, but they do not lead to an actual download. With a bit of searching, you can easily find copies of the original report (for instance, here). So, once we have it, what leaks out?
For experts, most probably nothing really interesting. We learn a lot about the process of identifying the rights owners of a piece of content; this part is well described in the document. On the technical side, however, there are no details: each time, the expert was told that the technology providers would not disclose any details about their algorithms. Therefore, to validate the false positive rate, the expert checked whether any items in the reference database shared the same fingerprint. The answer was no (except for one case where the same master had been fed in twice). Conclusion: no false positives! I let you draw your own conclusions.
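To make the expert's test concrete, here is a minimal Python sketch of it, assuming the reference database can be read as a mapping from title to fingerprint. The titles, the string fingerprints and the function name `find_collisions` are hypothetical; the real fingerprints are proprietary and were not disclosed.

```python
from collections import defaultdict


def find_collisions(reference_db):
    """Return groups of reference titles that share an identical fingerprint."""
    by_fingerprint = defaultdict(list)
    for title, fingerprint in reference_db.items():
        by_fingerprint[fingerprint].append(title)
    # Keep only fingerprints claimed by more than one reference item.
    return [sorted(titles) for titles in by_fingerprint.values() if len(titles) > 1]


# Hypothetical reference database; strings stand in for proprietary fingerprints.
reference_db = {
    "Song A (master 1)": "fp-0001",
    "Song A (master 2)": "fp-0001",  # the same master ingested twice
    "Song B": "fp-0002",
    "Film C": "fp-0003",
}

print(find_collisions(reference_db))
# [['Song A (master 1)', 'Song A (master 2)']]
```

Note what such a check measures: collisions among reference items only. It does not test whether an unrelated file collected on a P2P network could match one of those references, which is what a false positive rate is about.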
The annexes, which might contain some details, were not published, and I have not found a copy online. What bits of information can we nevertheless glean?
- There are two fingerprinting technology providers. They are “anonymized” in the document for confidentiality (sigh!). We can guess that the audio fingerprint provider is not French, as one of its quoted answers is in English. This is not a surprise since, to the best of my knowledge, no French audio fingerprinting technology is commercially available.
- They search P2P networks for copyrighted content using keywords. Once a piece of content is spotted, its fingerprint is extracted and compared against the master database. If it matches, its hash is recorded (most probably the MD5 hash). TMG can then search for this hash and record the IP address.
- A piece of content is recognized if an ordered sequence of its fingerprints matches. The required length of the sequence seems to depend on the type of content and on the rights owner: for audio, 80% of the duration; for video, in the case of ALPA, 35 minutes…
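The matching and recording steps described in the last two items can be sketched as follows. This is a minimal Python illustration under loose assumptions, not the providers' undisclosed algorithm: a work is represented as a hypothetical list of per-second fingerprints, an "ordered sequence" is interpreted as a run of consecutive matching fingerprints, and MD5 follows the author's guess. Only the 80% (audio) and 35-minute (ALPA video) thresholds come from the report; every function and constant name is hypothetical.

```python
import hashlib

# Hypothetical granularity: one fingerprint per second of content.
SEGMENT_SECONDS = 1

# Thresholds as reported; the constants themselves are hypothetical names.
AUDIO_MIN_FRACTION = 0.80    # audio: 80% of the duration
VIDEO_MIN_SECONDS = 35 * 60  # video, ALPA case: 35 minutes


def longest_ordered_match(candidate, reference):
    """Length of the longest run of consecutive segment fingerprints that
    appears in the same order in both sequences (longest common substring)."""
    best = 0
    prev = [0] * (len(reference) + 1)
    for c in candidate:
        curr = [0] * (len(reference) + 1)
        for j, r in enumerate(reference, start=1):
            if c == r:
                curr[j] = prev[j - 1] + 1  # extend the run ending one step earlier
                best = max(best, curr[j])
        prev = curr
    return best


def is_recognized(candidate, reference, content_type):
    """Apply a duration threshold that depends on the type of content."""
    matched = longest_ordered_match(candidate, reference) * SEGMENT_SECONDS
    if content_type == "audio":
        return matched >= AUDIO_MIN_FRACTION * len(reference) * SEGMENT_SECONDS
    if content_type == "video":
        return matched >= VIDEO_MIN_SECONDS
    raise ValueError(f"unknown content type: {content_type}")


def process_candidate(file_bytes, candidate_fps, reference_db, recorded_hashes):
    """If the candidate matches a reference work, record the file's MD5 so that
    peers sharing that hash can later be looked up."""
    for title, (content_type, reference_fps) in reference_db.items():
        if is_recognized(candidate_fps, reference_fps, content_type):
            recorded_hashes[hashlib.md5(file_bytes).hexdigest()] = title
            return title
    return None


# Toy usage: a 10-second audio reference with integer "fingerprints".
reference_db = {"Song B": ("audio", list(range(10)))}
recorded = {}
print(process_candidate(b"p2p file", list(range(9)), reference_db, recorded))  # Song B
print(process_candidate(b"other file", [0, 1, 2, 99, 4, 5, 6, 7], reference_db, recorded))  # None
```

In the second call the longest in-order run is only 4 seconds out of 10, below the 80% audio threshold, so nothing is recorded for that file.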
In conclusion, not a great deal…