The Fuzzy Hashing Patent
May 15th, 2008, 06:40 am
It appears that somebody has patented fuzzy hashing. Specifically, US Patent 7,272,602, System and method for unorchestrated determination of data sequences using "sticky byte" factoring to determine breakpoints in digital sequences, was issued to Gregory Hagan Moulton of the EMC Corporation on 18 Sep 2007.
When I published my fuzzy hashing paper, I expected (and hoped) other researchers would improve the algorithm. A search this morning revealed two papers that appear to do so: "An Efficient Piecewise Hashing Method for Computer Forensics" and "Improving Disk Sector Integrity Using 3-dimension Hashing Scheme." Excellent! I can't wait to read them.
But the same search also revealed the patent. Although the application was submitted in 2004, the patent examiner apparently cited my fuzzy hashing paper, which was published in 2006. Please don't ever let anybody tell you I "invented" fuzzy hashing. I had the idea of using an existing (bad) spam detector, spamsum, for computer forensics. I combined the spamsum engine with the md5deep interface to create ssdeep and wrote the paper to explain it.
What does the existence of this patent mean? Should I stop working on fuzzy hashing? Do I need to pay for a license? Do spamsum and the algorithm it was based on (rsync) count as prior art? Does this patent cover what I think it covers? More? Less?