You can use fuzzy hashing to find source code reuse. For example, let's say you suspect I reused source code from one of my earlier projects, md5deep, when writing the fuzzy hashing program ssdeep. (Computer scientist types just love when you use a program to analyze itself. Somewhere there are 6.001 geeks dancing with metasyntactic glee as they read this.)
Let's say we have two folders, ssdeep-1.1 and md5deep-1.12. First we record the fuzzy hashes of one folder to a file, using relative filenames (the -l switch) and recursing into subdirectories (the -r switch):
C:\> ssdeep -lr md5deep-1.12 > hashes.txt
Then we match the files in the other directory against those saved hashes (the -m switch):
C:\> ssdeep -lrm hashes.txt ssdeep-1.1
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)

Aha! Those matches indicate source code reuse! The number in parentheses is the match score, from 0 to 100, where a higher score means the two files are more similar. A manual examination of the files in question is still required to tell exactly what kind of copying occurred, but we've saved ourselves a lot of work!
For the really geeky, there's a way to do this with a single command (the -d switch), but the output will also include all of the matches internal to each directory. Like this:
C:\> ssdeep -lrd md5deep-1.12 ssdeep-1.1
md5deep-1.12\md5.h matches md5deep-1.12\cycles.c (27)
md5deep-1.12\sha1.h matches md5deep-1.12\cycles.c (25)
md5deep-1.12\sha1.h matches md5deep-1.12\md5.h (58)
md5deep-1.12\sha256.h matches md5deep-1.12\cycles.c (25)
md5deep-1.12\sha256.h matches md5deep-1.12\md5.h (61)
md5deep-1.12\sha256.h matches md5deep-1.12\sha1.h (57)
md5deep-1.12\tiger.h matches md5deep-1.12\cycles.c (29)
md5deep-1.12\tiger.h matches md5deep-1.12\md5.h (65)
md5deep-1.12\tiger.h matches md5deep-1.12\sha1.h (63)
md5deep-1.12\tiger.h matches md5deep-1.12\sha256.h (61)
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)
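If the internal matches get in the way, you can filter them out after the fact. What follows is only a rough sketch, not part of ssdeep: it assumes each match line has the shape "A matches B (score)" exactly as in the output above (other ssdeep versions may print it differently), and the names MATCH_LINE, top_dir and cross_directory_matches are made up for this illustration. It keeps only the pairs whose two files live in different top-level folders and prints them with the highest score first.

```python
import re
from pathlib import PureWindowsPath

# Assumed line shape, taken from the sample output: "A matches B (score)".
MATCH_LINE = re.compile(r"^(?P<left>.+?) matches (?P<right>.+) \((?P<score>\d+)\)$")


def top_dir(path):
    """Return the first component of a relative Windows-style path ("" if none)."""
    parts = PureWindowsPath(path).parts
    return parts[0] if parts else ""


def cross_directory_matches(output):
    """Yield (left, right, score) for matches that span two top-level folders."""
    for line in output.splitlines():
        m = MATCH_LINE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything that is not a match line
        left, right = m["left"], m["right"]
        if top_dir(left) != top_dir(right):
            yield left, right, int(m["score"])


# A few lines copied from the -d output, including one internal match.
sample = r"""md5deep-1.12\sha1.h matches md5deep-1.12\md5.h (58)
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)"""

# Highest scores first, since those are the most promising to inspect by hand.
for left, right, score in sorted(cross_directory_matches(sample), key=lambda t: -t[2]):
    print(f"{score:3d}  {left}  <->  {right}")
```

To use it on real output, redirect the ssdeep results to a file and pass the file's contents to cross_directory_matches instead of the sample string.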
If you'd like to see the matches in both directions (i.e. for two files A and B that match, see that A matches B and B matches A), use the -p flag instead of -d.