I've talked quite a bit about which tools to use and when, in general. While all of that is correct in principle, I recently, after a lot of painful 'research' (mostly already out there somewhere), came up with a process for using the right tools at the right time. Here is a short summary:
1) Do your dynamic analysis and document what you found.
2) To learn more you now have to do static analysis. Don't start off with IDA Pro; it overwhelms you very quickly, especially if you are new.
3) Put the EXE through Olly or Immunity and start identifying what each function does, step by step. Don't worry initially about understanding everything about the malware. If you can confirm via static analysis what you found in dynamic analysis, and say "I know what these 5 functions do", that's good enough for a start.
4) Once you know what these 5 functions do, open the EXE in IDA Pro and rename the 'known' functions from sub_4012345 to something meaningful, like sub_malware_connect_irc. Repeat for each function you know. Now go back to Olly.
5) Now take each known function, say sub_malware_connect_irc, and identify all of its system function calls: connect(), send(), GetCommandLine() and so on.
6) Look at MSDN and understand the arguments that are passed to each call. See where these arguments are stored in the disassembly you have. Are they stored on the stack or in a variable?
7) If it's in a variable, go to IDA and give that variable a meaningful name. For example, rename something like dword_401324 to malware_irc_host_name. Every single place where that variable is accessed will then show the new name. So dword_401324 will no longer exist; it will be referred to as malware_irc_host_name. Repeat this process for all known functions and all known variables.
8) Once you have a few functions and all corresponding variables renamed, use the Group nodes feature in IDA (right-click any block and select Group nodes) to collapse blocks you have already analyzed. Give each group a name that you will recognize instantly, without having to look at the disassembly again. Repeat this for all blocks that you have analyzed. This reduces the amount of disassembly you have to look at when you move on to parts of the program you have not yet analyzed.
9) So, to summarize: for all known functions you have renamed the functions, renamed the variables, and grouped and named the blocks of code that you have already analyzed. This should give you a nice 'pseudo-codish' flowchart in IDA for quite a few functions :)
10) Now open the 'Functions' window in IDA and sort by the first column to see how many functions are pending; count the ones that still have a default name, i.e. sub_ followed by an address.
11) Go to each pending function and see where it is called from. You can do this first in Olly 1.10 by highlighting the first line of the function (usually PUSH EBP) and looking in the middle pane, which lists everywhere it is called from. Visit each of the calls in the middle pane and see where they in turn were called from, and so on. Do this till you get to the root of the call chain. See if you can now understand 'when' the function was triggered and 'what' behavior triggered it. Do the renaming and grouping as before. If you can't understand it, at least give the function some name, like dummy_notunderstood_1, dummy_notunderstood_2 and so on. That is still better than sub_401237.
12) At the end of all of this you should have an .IDB file that is fully named (as much as possible) and fully grouped. Now, and only now, should you start drilling down into HOW exactly each function works: the exact algorithm behind each function and so on. Repeat this for as many interesting functions as you want.
13) Once this is also done, if you WANT, try rewriting the code in a high-level language, or at least pseudo code, so you can quickly refer back to it later.
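If you like keeping score outside IDA, steps 10 and 11 can be roughed out in a few lines of Python. This is only a sketch under assumptions: it expects that you have written down the function names and a "who calls whom" map by hand (or exported them somehow), and the helper names `pending_functions` and `root_callers`, as well as the sample data, are made up for illustration. None of this is an IDA or Olly API.

```python
import re
from collections import deque

# Default IDA-style name: "sub_" followed only by hex digits (an address).
DEFAULT_NAME = re.compile(r"sub_[0-9A-Fa-f]+")


def pending_functions(names):
    """Return the function names that still look like unrenamed defaults."""
    return sorted(n for n in names if DEFAULT_NAME.fullmatch(n))


def root_callers(callers, func):
    """Walk the caller map upward from func; return functions nobody calls."""
    seen = {func}
    queue = deque([func])
    roots = set()
    while queue:
        current = queue.popleft()
        parents = callers.get(current, [])
        if not parents:
            roots.add(current)  # top of this call chain
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(roots)


# Hypothetical sample data, loosely following the names used in this post.
names = [
    "start",
    "sub_malware_connect_irc",
    "sub_401237",
    "sub_401500",
    "dummy_notunderstood_1",
]
callers = {
    "sub_401237": ["sub_malware_connect_irc", "sub_401500"],
    "sub_malware_connect_irc": ["start"],
    "sub_401500": ["start"],
}

print(pending_functions(names))
print(root_callers(callers, "sub_401237"))
```

Note that a renamed function like sub_malware_connect_irc does not count as pending here, because what follows sub_ is not an address.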
I guess this is all very intuitive for most reversers who have learnt this on their own or have been reversing for a long time. It took me a long time to reach this level, though, and I hope it is helpful to any relative newbie reading this blog post and feeling lost. I know I was one a few months ago ;)
P.S. Make sure you back the .IDB file up ;)
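One way to make the backup a habit is a tiny script that drops a timestamped copy of the database into a backup folder. This is a minimal sketch, not a recommendation of any particular tool; the function name `backup_idb` and the file names in the demo are hypothetical, and it is probably safest to run it while the database is closed in IDA.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_idb(idb_path, backup_dir):
    """Copy the database into backup_dir with a timestamp in the file name."""
    src = Path(idb_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest


if __name__ == "__main__":
    # Demo on a throwaway file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        fake_idb = Path(tmp) / "sample.idb"
        fake_idb.write_bytes(b"not a real database")
        copy = backup_idb(fake_idb, Path(tmp) / "backups")
        print("backed up to", copy.name)
```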
Here are 2 sample screen-shots:
Sample graph of grouped functions
Renamed functions: a sample list