Thursday, October 25, 2012

Incremental code review - MD5 hashes

This one is a reasonably obvious tip, to be frank. I was recently doing a code review retest for a Java app. I don't know much Java, so I was finding it hard to step through the code.

One thing I did know, though, is that if you have both the previous code base and the current one and you're "re-testing", it helps to see which files have changed. Anyone with a little Linux or programming experience will immediately say 'diff'. And yeah, diff is fine, but it gives you a very detailed listing in a slightly cryptic form. It's readable enough if you go slow, but it's not the first step IMO. I'd like a higher-level view first. That's how I stumbled upon md5deep.

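So I go into the root directory of the old code and run md5deep -rl * > ~/old_list there (-r recurses into subdirectories, -l prints relative paths; md5deep needs something to hash, otherwise it sits waiting on standard input). Then I change directories to the new code and run md5deep -rl -X ~/old_list * > ~/changes over here. The -X switch is negative matching: it prints the hash and name of every file whose hash is NOT in the old list. Basically all it means is that I compared the new tree against the old list. The match is on hash alone, so the output covers files that were modified or added; deleted files won't show up. Once I have my changes, I know that any fixes they made MUST be in some of those files.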
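
If md5deep isn't installed, roughly the same check can be sketched with find, md5sum and grep. This is a stand-in for the md5deep commands, not a replacement for them: it assumes GNU md5sum is available, and the two tiny trees, the hash_tree helper and all the file names are made up for the demo. Like md5deep's negative matching, it compares on hash alone.

```shell
#!/bin/sh
# Hypothetical demo: two tiny trees standing in for the old and new code drops.
work=$(mktemp -d)
mkdir -p "$work/old/src" "$work/new/src"
printf 'class A {}\n' > "$work/old/src/A.java"
printf 'class B {}\n' > "$work/old/src/B.java"
printf 'class A {}\n' > "$work/new/src/A.java"          # unchanged
printf 'class B { int x; }\n' > "$work/new/src/B.java"  # modified
printf 'class C {}\n' > "$work/new/src/C.java"          # added

# Hash every file under a directory, printing "hash  ./relative/path" lines.
hash_tree() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

hash_tree "$work/old" > "$work/old_list"
hash_tree "$work/new" > "$work/new_list"

# Keep only the new-tree lines whose hash never appears in the old list,
# then strip the hash so just the path is left.
cut -d ' ' -f 1 "$work/old_list" > "$work/old_hashes"
changes=$(grep -v -F -f "$work/old_hashes" "$work/new_list" | sed 's/^[0-9a-f]*  //')
printf '%s\n' "$changes"
```

On the demo trees this should list src/B.java and src/C.java but not src/A.java, since A's content is identical in both trees.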

Now I can run diff on each old/new pair of those files and go through the changes more slowly. It's the same thing really, but it gives me a little more control over the process.
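
Running diff over each changed file can be scripted along these lines. It's only a sketch: diff_changed is a hypothetical helper, and it assumes the list holds bare relative paths, one per line, with no hash column (as far as I understand the md5deep man page, lowercase -x prints names only, whereas -X adds the hash). The demo files are made up.

```shell
#!/bin/sh
# diff_changed OLD_ROOT NEW_ROOT LIST
# Print a unified diff for every path named in LIST (one relative path per line).
diff_changed() {
    while IFS= read -r path; do
        if [ -f "$1/$path" ]; then
            # diff exits with status 1 when the files differ, which is expected here
            diff -u "$1/$path" "$2/$path" || true
        else
            # no counterpart in the old tree, so there is nothing to diff against
            echo "NEW FILE: $path"
        fi
    done < "$3"
}

# Tiny demo with made-up files.
demo=$(mktemp -d)
mkdir -p "$demo/old" "$demo/new"
printf 'int x = 1;\n' > "$demo/old/A.java"
printf 'int x = 2;\n' > "$demo/new/A.java"
printf 'int y;\n' > "$demo/new/B.java"
printf 'A.java\nB.java\n' > "$demo/changes"

report=$(diff_changed "$demo/old" "$demo/new" "$demo/changes")
printf '%s\n' "$report"
```

Piping the report through a pager such as less keeps the slow, file-by-file reading manageable.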

A tip to cut the list down further is to pipe md5deep's output through grep. I knew I was looking only for Java file changes, so I could do either of these:

md5deep -rl -X ~/old_list * | egrep -i '\.java$'

OR

md5deep -rl -X ~/list_md5_old * | egrep -v '\.doc$|\.DS_Store|\.js|xml|changed_files|hibernate' > changed_files; wc -l changed_files

I used the second one :) because I knew exactly what I didn't want. If you don't know that, start with what you do know and filter on from there.
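
One thing to watch in filters like these: an unescaped dot in a regex matches any character, so a pattern meant for a file extension can let extra names through. A small sketch with made-up file names (not real md5deep output) shows the difference:

```shell
#!/bin/sh
# Made-up file names standing in for a list of changed files.
names='src/Main.java
docs/notes_java
web/app.js
src/util/Helper.JAVA'

# Unescaped dot: "any character, then java" at the end of the line.
loose=$(printf '%s\n' "$names" | grep -E -i '.java$')
# Escaped dot: a literal ".java" extension only.
strict=$(printf '%s\n' "$names" | grep -E -i '\.java$')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern also picks up docs/notes_java, while the strict one keeps just the two Java source files.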

There are a million ways of doing this, of course; I just wrote down what I did :). Here's the blog post that saved me some time:

http://linhost.info/2010/02/checksum-a-directory-with-md5deep/
