| Fuzzy Hashing version 2.2 |
[Jul. 21st, 2009|09:58 pm] |
Good news everybody! I've published a new version of the ssdeep program for fuzzy hashing. The new version adds a long-requested feature: the capability to compare files of previously generated signatures. That is, let's say you compute some lists of fuzzy hashes like this:
C:\> ssdeep -r C: > list1.txt
C:\> ssdeep -r "D:\Malware Samples" > list2.txt
C:\> ssdeep -r "E:\Temp\New Malware" > list3.txt
You can now find any similar files in those lists like this:
C:\> ssdeep -x list1.txt list2.txt list3.txt
list1:C:\Windows\System32\ntoskrn1.exe matches list2:D:\Malware Samples\VIRUS.EXE (83)
Notice the filename of the known hashes is given in the output along with the matching filenames. |
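As a rough illustration of how a cross-list comparison like this can be organized, here is a minimal Python sketch. It is not ssdeep's code. The function names are hypothetical, the scoring function is a stand-in built on difflib rather than ssdeep's real fuzzy comparison, and it assumes only pairs from different lists are of interest. The input is assumed to be already parsed into (filename, signature) pairs.

```python
from difflib import SequenceMatcher


def stand_in_score(sig_a, sig_b):
    """Placeholder similarity from 0 to 100. This is NOT ssdeep's scoring."""
    return round(100 * SequenceMatcher(None, sig_a, sig_b).ratio())


def cross_compare(lists, score=stand_in_score, threshold=1):
    """Compare every signature in each list against every signature in the
    lists that follow it.

    `lists` maps a list name to [(filename, signature), ...]. Each match is
    reported with the name of the list each file came from.
    """
    names = list(lists)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            for file_a, sig_a in lists[left]:
                for file_b, sig_b in lists[right]:
                    s = score(sig_a, sig_b)
                    if s >= threshold:
                        yield f"{left}:{file_a} matches {right}:{file_b} ({s})"
```

To use real fuzzy hash scores, pass a different `score` callable that wraps whatever comparison routine you have available.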
|
|
| md5deep version 3.4 |
[Jun. 10th, 2009|05:40 pm] |
I've released version 3.4 of md5deep. This is a bug-fix release and addresses two serious problems. First, there was a memory leak while processing directories on Windows. Second, the -n mode (unused hashes mode) has been fixed. My apologies for the errors. |
|
|
| md5deep version 3.3 |
[Apr. 4th, 2009|09:54 am] |
This morning I posted md5deep version 3.3. This is a bug-fix release intended to address two issues on Microsoft Windows. First, the program can now handle 64-bit timestamps, which previously could have caused a crash. Second, the program now skips all reparse points (e.g. junction points, symbolic links, etc). These come up often on Windows Vista and Windows 7 and can cause a lot of extra work for the program. The resulting code is not perfect; someday the user should be able to control the recursion process, but it's better this way than before. Enjoy! |
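For readers curious what skipping reparse points during recursion might look like, here is a minimal Python sketch. It is not md5deep's C code, and the helper names are hypothetical. It assumes the Windows reparse-point attribute bit (0x400) is exposed through `st_file_attributes`, which Python only provides on Windows; elsewhere only symbolic links are detected.

```python
import os
import stat

# Windows attribute bit for reparse points (junctions, symlinks, and so on).
REPARSE_POINT = getattr(stat, "FILE_ATTRIBUTE_REPARSE_POINT", 0x400)


def is_reparse_or_link(path):
    """Return True for symbolic links and, on Windows, any reparse point."""
    st = os.lstat(path)  # lstat inspects the link itself, not its target
    if stat.S_ISLNK(st.st_mode):
        return True
    # st_file_attributes exists only on Windows; default to 0 elsewhere.
    attrs = getattr(st, "st_file_attributes", 0)
    return bool(attrs & REPARSE_POINT)


def walk_files(root):
    """Yield files under root without descending into or reporting links."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Prune in place so os.walk never descends into a reparse point.
        dirnames[:] = [
            d for d in dirnames
            if not is_reparse_or_link(os.path.join(dirpath, d))
        ]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not is_reparse_or_link(full):
                yield full
```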
|
|
| ssdeep version 2.1 is out |
[Jan. 1st, 2009|09:57 am] |
Happy New Year! Now you can prove 2008 is a lot like 2009 using the latest version of ssdeep, your favorite fuzzy hashing program and API. This is mostly a bug fix release, but you can now use the API to hash a file without having to open it yourself. Enjoy! |
|
|
| Audit paper published |
[Dec. 12th, 2008|08:03 am] |
This morning the latest issue of Digital Forensic Practice was published. Among the articles in this issue is my piece Auditing Hash Sets: Lessons Learned from Jurassic Park. The abstract is below, but this paper is important because it highlights where traditional hash matching techniques fall down during incident response or tool testing. We have good tools to find matches to known files and good tools to find files that aren't known matches. Hashdeep is the first tool I know of that does both and provides a complete picture of known files as compared to the current filesystem. It's intended for forensics geeks and system administrators, but all are welcome to try it out.
Auditing Hash Sets: Lessons Learned from Jurassic Park
Auditing a set of cryptographic hashes allows a forensic examiner to determine the state of a target directory as compared to those hashes. Unlike traditional hash comparison methods, an audit takes into account all of the files in the target directory and their relative paths. Not taking these data into account can impair examinations and tool certifications. An audit examines each file in the target directory, computes its hash, and compares it to a file containing the known hash values. Any file not in the set of known hashes is flagged as being inserted. When all of the files in the target directory have been examined, any known hashes that have not been matched are flagged as being missing. The result is a complete picture comparing the set of known hashes and the target directory. |
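The audit procedure in the abstract can be sketched in a few lines of Python. This is a simplified illustration, not hashdeep's implementation. It assumes the known set is a dictionary from relative path to MD5 hex digest, and that a file counts as matched only when both its relative path and its hash agree. The function names are hypothetical.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def audit(target_dir, known):
    """Audit target_dir against `known` (relative path -> hex digest).

    Returns three sorted lists of relative paths: matched, inserted, missing.
    """
    unmatched = dict(known)  # known hashes not yet seen in the target
    matched, inserted = [], []
    for dirpath, _dirnames, filenames in os.walk(target_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, target_dir)
            if unmatched.get(rel) == md5_of(full):
                matched.append(rel)
                del unmatched[rel]
            else:
                # Not in the known set (or its content changed): inserted.
                inserted.append(rel)
    # Whatever is left over was never matched, so it is missing.
    return sorted(matched), sorted(inserted), sorted(unmatched)
```

Note that a file whose content has changed shows up twice in this sketch: once as inserted (its new hash is unknown) and once as missing (its old hash was never matched).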
|
|
| Fuzzy Hashing on Virus Total |
[Nov. 24th, 2008|01:39 pm] |
Fuzzy hashes have been incorporated into the Virus Total automated analysis. In their words, "VirusTotal is a service that analyzes suspicious files and facilitates the quick detection of viruses, worms, trojans, and all kinds of malware detected by antivirus engines." An executable submitted to the web site is scanned with several anti-virus engines and matched against a set of suspicious files. As of today they are computing the fuzzy hashes of incoming files! I don't know if they are comparing them against a set of known files, but this is a great step forward. |
|
|
| Introducing hashdeep and faster md5deep |
[Jul. 29th, 2008|06:27 am] |
I am pleased to announce the release of md5deep version 3.1 along with a new program, hashdeep. Along with some cosmetic bug fixes, this version of md5deep should be about 10-15% faster than version 3.0 thanks to the removal of some redundant code. The new hashdeep has two primary features, multihashing and hash set auditing. Multihashing is the ability to compute more than one hash algorithm simultaneously. Technically this feature isn't really "new", per se; it's been a part of programs like FSUM and Dan Mares' hash for years. The real magic is in the hash set auditing.
Auditing Hash Sets
The benefits of hash set auditing will be fully described in the paper Auditing Hash Sets, hopefully to be published soon. Here's the abstract:
Auditing a set of cryptographic hashes allows a forensic examiner to determine the state of a target directory as compared to those hashes. Unlike traditional hash comparison methods, an audit takes into account all of the files in the target directory and their relative paths. Not taking these data into account can impair examinations and tool certifications. An audit examines each file in the target directory, computes its hash, and compares it to a file containing the known hash values. Any file not in the set of known hashes is flagged as being inserted. When all of the files in the target directory have been examined, any known hashes that have not been matched are flagged as being missing. The result is a complete picture comparing the set of known hashes and the target directory.
I'll post more details on the paper as they become available. In the meantime, here's the complete list of changes in this version of md5deep:
New Features
- Added hashdeep program to support multihashing and hash set auditing
- Streamlined file size computation process, which makes the programs about 15% faster.
- Added size threshold modes to only process files smaller than a given size.
- Added a timestamp mode that records the creation time for each file on Win32, the change time on all other operating systems.
- Added support for new iLook style hashes
Bug Fixes
- Corrected time estimates for large files (e.g. files which require more than one day).
- Fixed obscure bug that caused a crash (double free) when attempting to check a very small file for EnCase hashes
|
|
|
| md5deep and Cygwin Ports |
[Jul. 19th, 2008|01:07 pm] |
Thanks to a blog post by Mark Stam about using md5deep, I've discovered that md5deep has been added to the Cygwin Ports project. The project "provides Cygwin binary and source packages for a large variety of programs and libraries, including the GNOME and KDE desktop environments." This means that Cygwin users can download a binary package of md5deep and its associated tools.
Because I'm not a Cygwin user it's hard for me to test out the automatic installation method, but it appears that you should be able to use Cygwin's Setup program to get those ports by adding ftp://sunsite.dk/projects/cygwinports to the server list.
And yes, the screenshot in Mark's post does look a little odd. I'm looking into it. |
|
|
| The Fuzzy Hashing Patent |
[May. 15th, 2008|06:40 am] |
It appears that somebody has patented fuzzy hashing. Specifically, US Patent 7,272,602, System and method for unorchestrated determination of data sequences using "sticky byte" factoring to determine breakpoints in digital sequences, was issued to Gregory Hagan Moulton of the EMC Corporation on 18 Sep 2007.
When I published my fuzzy hashing paper, I expected (and hoped) other researchers would improve the algorithm. A search this morning revealed two papers that appear to do so, An Efficient Piecewise Hashing Method for Computer Forensics and Improving Disk Sector Integrity Using 3-dimension Hashing Scheme. Excellent! I can't wait to read these.
But the same search also revealed the patent. Although the application was submitted in 2004, the patent examiner apparently cited my fuzzy hashing paper, published in 2006. Please don't ever let anybody tell you I "invented" fuzzy hashing. I had the idea of using the existing (bad) spam detector, spamsum, for computer forensics. I combined the existing spamsum engine with the md5deep interface to create ssdeep and wrote the paper to explain it.
What does the existence of this patent mean? Should I no longer be working on fuzzy hashing? Do I need to pay a license? Does the existence of spamsum and what it was based on (rsync) count as prior art? Does this patent cover what I think it covers? More? Less? |
|
|
| ssdeep version 2.0 published |
[Apr. 4th, 2008|06:45 am] |
After several false starts, I've published version two of my fuzzy hashing program ssdeep. The new version separates the hashing code into an API which developers are free to use in their own code. The library is GPL'ed (hey, that's how I got it), so only Free software developers need apply. I've also added a user-contributed threshold mode to only report results above a certain level of similarity, a CSV output mode, and support for filenames with Unicode characters. Enjoy! |
|
|
| md5deep version 3.0 alpha1 |
[Mar. 12th, 2008|11:42 pm] |
I have published an alpha version of md5deep 3.0. Although not much has changed for our friends md5deep, sha1deep, etc, I have created a new program, hashdeep. This program supports multihashing, or computing more than one hash algorithm at a time. I'll post more details later, including describing the new audit mode, but in the meantime please remember this is alpha quality code.
By default the program computes both MD5 and SHA-256 hashes:
$ hashdeep foo bar
%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename
## Invoked from: /Users/jessekornblum
## $ hashdeep foo bar
29,69a3a1f6e6f671a1a158ee09c7016ec7,f9650a0cf19e246a158318399d35e3d1a27697ceea2ac4abdc6a4ca2b6b6b75c,/Users/jessekornblum/foo
29,30290eea368926965343ce8ff30a458e,26d7b73c6ffd2fa09c0e30d947c776f229d1d6315dd4ab7e012f484c1bad2ed0,/Users/jessekornblum/bar
You can specify more (or fewer) hashes to compute with the -c flag.
$ hashdeep -c md5,tiger,whirlpool,sha256 foo bar
%%%% HASHDEEP-1.0
%%%% size,md5,sha256,tiger,whirlpool,filename
## Invoked from: /Users/jessekornblum
## $ hashdeep -c md5 foo bar
29,69a3a1f6e6f671a1a158ee09c7016ec7,f9650a0cf19e246a158318399d35e3d1a27697ceea2ac4abdc6a4ca2b6b6b75c,7c29873518894c1c6bd793f2f22d2f766fd4cebe4580782e,70f541f3b09a8fbea0f0b5cb4dc4ce86ca2dfe1f50f6e6e6328bb00451ecaad62afbc44ac2d3872c3610f2f540a2027f6f930cbad32b38480d4a05bb70da8ec2,/Users/jessekornblum/foo
29,30290eea368926965343ce8ff30a458e,26d7b73c6ffd2fa09c0e30d947c776f229d1d6315dd4ab7e012f484c1bad2ed0,035d6097b7d7ec26cca39a843949ed7c0d789b4d1a5c0def,556d60677e47cf3b1befbdf4595cef6c1e7aaea8a32255d2039d5d5c6b710503d7576fdf76c81482ea6b29e13f75ca33661fb12bb1b8f75ea2d931afa77e3054,/Users/jessekornblum/bar
Note that the output records the command line arguments and indicates which kinds of hashes a file contains. |
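The idea behind multihashing, reading each file once and feeding every block to several hash algorithms, can be sketched in Python as follows. This is an illustration only, not hashdeep's code. The function names are hypothetical, and the example sticks to MD5 and SHA-256 because Tiger and Whirlpool are not guaranteed to be available in Python's hashlib.

```python
import hashlib


def multihash(path, algorithms=("md5", "sha256"), chunk_size=1 << 20):
    """Read the file once and update every requested hash with each block.

    Returns (size_in_bytes, [hex digest for each algorithm, in order]).
    """
    hashers = [hashlib.new(name) for name in algorithms]
    size = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            size += len(block)
            for h in hashers:
                h.update(block)
    return size, [h.hexdigest() for h in hashers]


def format_record(path, algorithms=("md5", "sha256")):
    """Build a comma-separated record: size, each digest, then the filename."""
    size, digests = multihash(path, algorithms)
    return ",".join([str(size), *digests, path])
```

The payoff is that the file is read from disk only once no matter how many algorithms are requested.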
|
|
| Beta version of ssdeep |
[Feb. 22nd, 2008|03:01 pm] |
After a long wait I have updated my fuzzy hashing program, ssdeep. The new version is in beta now and is available for download. The changes in this version:
- Created a fuzzy hashing API/DLL - After many requests I have separated the hashing code into a separate library, fuzzy.dll. I've also set up an API, described in fuzzy.h that should allow you to add fuzzy hashing to your own programs. Please note that this code is licensed under the General Public License. As such, any programs you write using the library will most likely be required to be GPL'ed as well.
- Added support for filenames with Unicode characters on Win32 - Previously such files would be ignored with an error message, "No such file or directory."
- Added threshold mode - Allows the user to only display matches above a certain match score.
- Added CSV mode - The output is displayed in comma separated value format.
- Fixed extra characters appearing during verbose mode - This was a minor bug, but it should be gone now.
Enjoy! |
|
|
| Lots of forensics bits |
[Jan. 19th, 2008|09:58 am] |
Here are the latest and greatest things I've seen in the wild and wooly world of computer forensics:
The people who organized and ran the DoD Cyber Crime Conference are amazing. There were so many moving parts that it made my head hurt just thinking about it. To all of the staff: Thank you! Now go get some sleep.
For next year's conference (and probably all other conferences): Publishing a "geek meter" for each talk to describe its technical level is a great idea. Granted, because each speaker rates her own material it's a subjective measurement, but it's still handy. It would be better, however, to publish the geek meter ratings in the schedule grid so that attendees can see at a glance what might be the best talk for them at any given time.
The development of Hashdeep, part of the md5deep project, has stalled. I now expect a beta version to be released on 15 Mar 2008. Sorry, but this can happen with Free software.
During the CEIC Conference in April 2008 there is a three hour block on "Detecting Malicious Code: The Next Generation of Physical Memory Analysis" by two people from Guidance. It's possible that Guidance/EnCase is getting into the memory analysis game!
The 2008 Digital Forensic Research Workshop (DFRWS) Challenge is out. This year the organizers want us to gather, analyze, and correlate data from a Linux memory image, a file system image, and a network capture. Submissions are due 10 July 2008. The 2008 DFRWS Conference will be held from 11-13 August in Baltimore, MD. |
|
|
| md5deep and Windows 9x support |
[Dec. 10th, 2007|12:35 pm] |
An md5deep user wrote to me last week and reported that md5deep version 2.0 does not run on Windows 98. After my initial surprise that anybody was still using Windows 9x, I'm now left with a dilemma. I'd bet anything that the code I added in version 2.0 to support Unicode filenames, a feature requested since 2003, broke the support for Windows 9x. Although it should be possible to work around the problem, doing so would make the resulting program more complicated and more difficult to maintain.
How many people out there in userland are still using Windows 9x? (You can post anonymously if you'd like...) Do you think it's worth maintaining support for these operating systems? Yes, Microsoft stopped officially supporting them years ago, but that doesn't mean that ordinary users gave them up. Thoughts? |
|
|
| Hash Algorithm Contest |
[Nov. 13th, 2007|07:00 pm] |
The National Institute of Standards and Technology (NIST) is holding a cryptographic hash algorithm contest. Details from their web site:
NIST has opened a public competition to develop a new cryptographic hash algorithm, which converts a variable length message into a short “message digest” that can be used for digital signatures, message authentication and other applications. The competition is NIST’s response to recent advances in the cryptanalysis of hash functions. The new hash algorithm will be called “SHA-3” and will augment the hash algorithms currently specified in FIPS 180-2, Secure Hash Standard. Entries for the competition must be received by October 31, 2008. The competition is announced in the Federal Register Notice published on November 2, 2007.
There are many more details available on the competition homepage. I'm hopeful that md5deep will support the winning SHA-3 algorithm. |
|
|
| md5deep 2.0 released |
[Oct. 15th, 2007|06:51 pm] |
Tonight I have published the official release of md5deep version 2.0. As always, you can download either a Windows version or the source code. Here are the changes in the new version:
New Features
- Using GNU Autotools for configuration and compilation. This should help avoid platform specific issues such as SHA-1 problems previously found on 64-bit versions of AIX.
- Added support for files with Unicode characters in their filenames on Microsoft Windows.
- Added support for EnCase hash sets (.hash files).
- Updated web site and quick start guide
- Slightly reduced the size of all of the executables by removing duplicated code.
Bug Fixes
- Fixed time estimation mode for block devices on OS X and Linux
- Fixed cosmetic error where estimated time remaining mode is being used in conjunction with piecewise hashing. Time estimates are now based on the whole file, not just each piece.
- Clarified licensing issues in COPYING for tiger.c
- Changed some data types in hashing functions to C99 standard. Whirlpool seems to be working well enough without changes.
- Wrapped all of the global variables into the state structure
|
|
|
| md5deep version 2.0 release candidate 1 |
[Sep. 26th, 2007|10:04 pm] |
On Thursday I'm publishing the first release candidate* of md5deep version 2.0. You can download either a Windows binary or the source code. The list of new features and bug fixes is below. Note that there have been some minor improvements since the beta:
- Moved to GNU autotools for configuration and compilation
- Added Windows support for files with Unicode characters in their name
- Added support for EnCase hash sets
- Fixed time estimation mode for block devices on both Linux and OS X
- Fixed cosmetic error where estimated time remaining mode is being used in conjunction with piecewise hashing. Time estimates are now based on the whole file, not just each piece.
- Removed references to BYTE_ORDER in SHA-1 code in favor of WORDS_BIGENDIAN. This should fix the AIX problem.
- Slightly reduced the size of all of the executables by removing duplicated code. This is part of a long-term plan to have a single application that does all of the hashing algorithms.
- Clarified licensing issues in COPYING for tiger.c
- Changed some data types in hashing functions to C99 standard. Whirlpool seems to be working well enough without changes.
- Wrapped all of the global variables into the state structure
* If no problems are found in this version, it will become the official version 2.0 release on 15 October. |
|
|