Sunday, July 17, 2011

9 - Static Malware Analysis

We briefly looked at static analysis in Parts 3, 4 & 5, and that's what you should read first if you want an introduction. However, that was with Linux as the base OS, and the programs we chose were just meant to give us a feel for how high level code (mainly control structures) looks in a debugger. In this series we're trying to study malware, which primarily affects Windows systems (for whatever reason ;)), and understand all its functionality - even those bits which aren't necessarily exposed when we initially run it. For example, there might be some malware which performs keylogging in the background, but it wouldn't have a button called "Start Keylogger after 5 minutes".. so dynamic analysis might miss it. Proper static analysis won't.. it'll take longer to identify stuff, of course, but you will eventually find it.

Now we're almost always only going to have a single executable to analyze. We will almost certainly NOT have any source code available. If we did, the process of reverse engineering would be much simpler; you would only have to read the code carefully in whatever language it was written in, and you would reach a conclusion about what that malware did much, much quicker. Since we have only the EXE and no code - we'll have to try and reconstruct the code in some way and then draw our conclusions.

That's where a disassembler comes in. There's plenty of stuff available about disassemblers online; you can read up on them in detail if you're interested. For the purpose of this article, though, it's enough to mention that a disassembler (DA) just takes the EXE, opens it, and converts the machine code inside it into assembly language so it's easier for you to read. After all, it's easier to understand an assembly language instruction than a lot of hex bytes joined together :). Have a look at an example: What would you say if you saw something like "EB 27" and were told that it had some meaning? What would you then say if you were told there were 100,000 such lines and you needed to find out the meaning of each of those, and find out what the malware did at the end of all of it? Not too much fun if you ask me. What a disassembler does is simply convert "EB 27" into an assembly instruction which says "JMP 0x27 (39 in decimal) bytes forward", counted from the end of the current instruction. It'll do that for all 100,000 lines. So you get to look at assembly instructions instead of plain hex gibberish. Clearer? But yes, it's still 100,000 lines. If you think it's no big deal, just wait till you start reversing :)
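To make the "EB 27" example a little more concrete, here is a minimal sketch in Python of what a disassembler does for this one instruction. It only handles the x86 short jump (opcode EB followed by a signed 8-bit displacement); the function name decode_short_jmp and the load address 0x00401000 are made up for illustration and are not taken from any real tool or sample.

```python
import struct


def decode_short_jmp(code: bytes, address: int) -> str:
    """Decode a 2-byte x86 short jump (opcode 0xEB + signed 8-bit displacement).

    `address` is a hypothetical address at which the instruction is located.
    """
    if len(code) < 2 or code[0] != 0xEB:
        raise ValueError("not a short JMP (expected opcode 0xEB)")
    # The second byte is a signed displacement in BYTES, measured from the
    # address of the next instruction (current address + 2).
    (displacement,) = struct.unpack("b", code[1:2])
    target = address + 2 + displacement
    return f"JMP SHORT 0x{target:08X}"


# "EB 27" placed at an assumed address of 0x00401000
print(decode_short_jmp(bytes.fromhex("EB27"), 0x00401000))
# prints: JMP SHORT 0x00401029
```

A real disassembler does the same kind of lookup for every opcode in the instruction set, which is why you would much rather let a tool do it for 100,000 lines than do it by hand.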

So once an EXE is loaded into a disassembler and the assembly code is completely generated, it's time to start trying to understand it. "Understand" in malware analysis means - 'It's time to DEBUG it'. Remember gdb in Linux? For Windows we're going to use a tool called OllyDBG, which will both disassemble and debug the EXE. Using a debugger you can control exactly how much of the malware runs. You could, for example, say 'Run only the first 10 lines of assembly code. I'll study what changes were made and then you can continue.' We're doing this to understand exactly how the program works. Makes sense? Excellent :)

So to sum up, if we want to improve at static reverse engineering, we need to understand assembly language (at least enough that we can read code), learn how disassemblers and debuggers work, and understand how to use our tools in the best possible way. As we move on it will become necessary to understand more and more about Windows functions, and when and why they are called. MSDN will quickly become your friend :). This is what I recommend you read, in the order that I've mentioned:

a) Malware Forensics: Investigating and Analyzing Malicious Code
This is very basic and focuses primarily on Dynamic Analysis. So if you're clear about Dynamic Analysis and understand systems reasonably well at a high level you can skip this. If not, spend some time browsing through this.

b) Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code
This is an excellent book but it is a little too advanced and won't help you immediately. Reading it in detail after around 6 months or so will help. However, it's a good idea to read Chapters 1, 6 and 7 from this book right away. Again the focus is on dynamic analysis at this point, but spend a day or 2 on just these chapters.

c) Reversing: Secrets of Reverse Engineering
This book by Eldad Eilam is probably a must read for anyone starting off in static reversing. Yes, there are plenty of tutorials and plenty of videos and plenty of more advanced books available.. but for me this was the book which helped me through those initial days when I was struggling to learn. It gives you just about enough.. just the right amount that you need to know about reversing before you move forward. Again, there are places where you will get stuck even in this book, but as long as you understand about 50% of it and learn how to use Olly better by the time you're through - you're good to go. Don't read beyond Chapter 6 when you're starting out.. that's all you need. Chapter 6 was, for me, the most helpful chapter.. I could work my way through it very slowly and gain confidence.

d) The IDA Pro Book
At some point in your reversing journey you will start using IDA Pro, and there's a book completely dedicated to it. IDA is 'rumoredly' the best disassembler on the market, and as you get better you will want to use it at some stage. That said, it's better to understand GDB, Olly and other slightly simpler tools first.. get your basics clear and then learn IDA. Don't START your career with IDA.

e) Windows Internals - 5th edition
The book contains everything about Windows. The better you are at understanding Windows internals the better you will get at analyzing malware. However, this is NOT a newbie book. I struggled for over 2 weeks reading its content, reading a course by F-Secure based on this book, reading various presentations and PDFs which had this as a reference...but it was just far far too complex for me. I was very frustrated at my lack of progress :(. That's not to say the book is bad, of course it is not but for someone who is starting out... I would not recommend it. Maybe a year or 2 later, when you're much clearer about the basics.. it will be a good reference.

Well... that is it for an introduction to static reversing. I recommend you do a little bit of reading [from any of these books or even better sources] before proceeding. It's not compulsory but a few basics are always helpful :)

In the next article I will try and explain the key features of OllyDBG - just the ones that you need to know before you start out. There is documentation available of course, but a gentle introduction to "just what you need" will help you for sure.

Happy reading :)


Anonymous said...

what about practical malware analysis and reverse engineering for beginners?

Arvind said...

@Anon: Honestly, I haven't yet read that one but I'm sure it has some useful stuff in it. This is a bit of an old post though :)