Showing posts with label reverse.

Friday, January 30, 2015

A malware sample I analyzed

Recently I analyzed a malware sample. I don't know exactly what it was, or whether I analyzed it completely, but I stepped through it and wrote a very detailed report that I'd like to share now.

It is entirely possible that I have missed things, but anyone reading through it, especially at the beginner-to-intermediate level, should get some useful information from it.

I'd love to hear feedback on how things could be done better. If anyone has analyzed this sample deeper and better than I have, do call me out, and if you can, get in touch with me so I can learn :)

I just started a new repository on GitHub to hold a lot of my random stuff that doesn't really have a specific home. Here's the link to the PDF report (no, it is not malicious :)).

https://github.com/arvinddoraiswamy/blahblah/blob/master/somevirus.pdf

I can't see a way to upload the sample to offensivecomputing, so here is a link to a VirusTotal analysis instead. Anyone interested should be able to find a sample using the hashes listed there.

https://www.virustotal.com/en/file/5564bed78d23ad0ad198a0dbbf4196f5fdcc1eb8529673941736db18c3257e0b/analysis/
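If you do get hold of a file that claims to be this sample, it's worth confirming its hash against the VirusTotal page before loading it into a debugger. Here is a minimal Python sketch that computes MD5, SHA-1 and SHA-256 in a single pass. The function names are my own and don't come from any tool.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest(), "sha256": sha256.hexdigest()}


def matches(path, expected_hex):
    """True if any of the file's digests equals the expected hash (case-insensitive)."""
    return expected_hex.strip().lower() in file_hashes(path).values()
```

For example, `matches("sample.bin", "5564bed78d23ad0ad198a0dbbf4196f5fdcc1eb8529673941736db18c3257e0b")` checks a local file (the name is hypothetical) against the SHA-256 in the VirusTotal URL above.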

Thursday, December 26, 2013

Patching and Code Caves - Reverse Engineering

The challenge I solved in the previous post is a good place to demonstrate a little bit of patching and also to use something called a code cave [Thanks Dns].

Patching a program means changing something in it so that it behaves a little differently. Usually this is a change in control flow. In the CSAW example, there are 3 places where we can patch the code. We can change the flow so that it always chooses 1 in the first switch-case (40109A) and 3 in the second switch-case (401120). Finally, we can NOP out the jump at 401171 so that it calls 401000 no matter what.
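To patch the bytes on disk yourself, you first have to translate those virtual addresses into file offsets. Olly does this for you when you save a patched copy. Below is a rough Python sketch of that translation and of NOPing out a jump. It assumes you have already read the section's RVA and raw data pointer from the PE section table. The numbers in the test are common defaults, not values taken from the CSAW binary, and the 2-byte jump is hypothetical.

```python
NOP = 0x90  # one-byte x86 NOP opcode


def va_to_file_offset(va, image_base, section_rva, section_raw_ptr):
    """Translate a virtual address (as Olly shows it) into an offset in the file.

    Assumes `va` lies inside the section whose RVA and PointerToRawData are given.
    """
    return va - image_base - section_rva + section_raw_ptr


def nop_out(data, offset, length):
    """Overwrite `length` bytes at `offset` with NOPs and return the original bytes."""
    original = bytes(data[offset:offset + length])
    data[offset:offset + length] = bytes([NOP]) * length
    return original
```

Keep the returned original bytes around, so you can undo the patch if the program stops working.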

Here are 3 screen-shots showing the 3 patches.





If you remember, towards the end of the previous post we had to look for the flag inside Olly, because the pop-up wouldn't get populated with it. The solution I suggested there was to manually increment the address in ESI so that the pop-up got populated. There's another, cooler solution called a code cave, which lets us increment ESI automatically and have the message box display the flag.

Here is a screenshot of the patch I used to force the code to jump to a different location, increment ESI, jump back and then call the message box.


This causes the flag to be displayed inside the Message Box itself.
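Under the hood, a code cave is just unused space, usually a run of zero padding at the end of a section, where you place new instructions. You then add two jumps: one from the original code into the cave, and one back. Here is a small Python sketch that finds such padding and encodes the 5-byte E9 near jump. It assumes 32-bit x86, because 0x46 means INC ESI only in 32-bit mode. It also leaves out something a real cave usually needs: re-executing whatever original instructions the inserted JMP overwrote.

```python
import struct


def find_code_cave(data, min_size, start=0, fill=0x00):
    """Return the offset of the first run of at least `min_size` `fill` bytes, or -1."""
    run_start, run_len = start, 0
    for i in range(start, len(data)):
        if data[i] == fill:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_size:
                return run_start
        else:
            run_len = 0
    return -1


def jmp_rel32(src_va, dst_va):
    """Encode a near JMP (E9 rel32) placed at src_va that lands on dst_va."""
    rel = dst_va - (src_va + 5)  # relative to the end of the 5-byte instruction
    return b"\xE9" + struct.pack("<i", rel)


def build_cave(cave_va, return_va):
    """Cave body: INC ESI (0x46 in 32-bit code), then JMP back to return_va."""
    inc_esi = b"\x46"
    return inc_esi + jmp_rel32(cave_va + len(inc_esi), return_va)
```

The jump into the cave is simply `jmp_rel32(patch_site_va, cave_va)` written over the original instructions at the patch site.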


Thursday, November 21, 2013

Are you sure you're clean?

I do a ton of penetration testing, and I've been doing it for quite a while now, nearly a decade. I advise a ton of peers, juniors, clients and non-technical users, pretty much anyone really, about how to stay safe. I know all of this. And yet, recently, I inexcusably slipped up.

One of my virtual machines, which I'd cloned to test a thick client at a client site, had a few viruses on it. One of them was C:\Windows\update.exe, which did not sound good in the least. I found most of them in the Temporary Internet Files folders of the Local Service and Network Service accounts, and there was 1 more EXE file somewhere else.

The malware didn't behave the way it was "supposed" to according to the description on the website, so maybe it was all benign and I was fine. But that still doesn't explain how the files got onto my disk in the first place, or what they were doing on a VM that also handled customer data. The worst part was that they had probably been lying there for a while without me noticing. Most probably they were some remnant of my own research, but I can't be 100% sure. In short, it was downright dumb on my part. No excuses.

So that made me take a hard look at my setup, and I have since gone on a drive to clean it all up. I dumped all my VMs (still in progress) and recreated everything from scratch from the ISOs. I updated them, will harden them a bit and will take snapshots of the clean state.

I also deleted all my malware analysis and reversing images, and will recreate them from scratch and snapshot those too. In the end, here's the list of VMs I will have:

1) WinXP 32 bit
2) Win 7 64 bit
3) Ubuntu 32 bit
4) Ubuntu 64 bit
5) Client WinXP machine + Word + Visio
6) Linux dev environment
7) Windows 64 bit reversing
8) Linux 64 bit reversing

All of them will be updated, hardened (services turned off) and snapshotted. Ideally I'd move everything, all my malware included, onto a separate machine, but I don't have another machine yet. Once I get one, I'll move all my reversing work to it.

Overall, though, you're not immune from doing stupid things. You may know the rules, but that doesn't mean you're perfect. Hopefully this post reminds all of you who dabble in multiple technologies all the time of what can, and does, go wrong at times.

Thursday, August 15, 2013

Reverse Engineering DLLs

A DLL is usually loaded by an EXE, and it typically exports a number of functions that the EXE can call directly. If you want to debug an EXE in OllyDbg, all you need to do is load it in Olly and set a breakpoint on its entry point. If you want to debug a specific DLL, though, it's not that straightforward.

There are 2 ways of doing this:

a) Open the DLL in Olly. If there is also an EXE called LoadDLL.exe in the Olly directory on your hard disk, LoadDLL.exe will automatically pick up the DLL you want to analyze, load it and stop at the DLL's entry point. This, though, seems to work only in Olly 1.10.

b) The other way is to tell Olly to break each time a new DLL is loaded. In Olly 2.01, go to Options - Debugging - Events, tick the box that says 'Pause on New DLL' and OK your way out.

The next time you load an EXE that in turn loads DLLs at runtime, Olly will break each time a new DLL is loaded. You can then keep hitting F9 (Run program) until you reach the DLL you want to debug.

Now you can debug the DLL as you would debug an EXE :)
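Olly stops at the DLL's entry point. That address is stored in the DLL's PE header, next to a flag that marks the file as a DLL. Here is a minimal Python sketch that reads both. It assumes a well-formed PE file and does no validation beyond the two signatures.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics bit set for DLLs


def pe_entry_info(data):
    """Return (is_dll, entry_rva, image_base) from a PE file's headers."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ header")
    pe = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew: offset of "PE\0\0"
    if data[pe:pe + 4] != b"PE\x00\x00":
        raise ValueError("no PE signature")
    characteristics = struct.unpack_from("<H", data, pe + 22)[0]  # COFF header field
    opt = pe + 24  # the optional header follows the 20-byte COFF header
    magic = struct.unpack_from("<H", data, opt)[0]
    entry_rva = struct.unpack_from("<I", data, opt + 16)[0]  # AddressOfEntryPoint
    if magic == 0x10B:    # PE32: 4-byte ImageBase after BaseOfData
        image_base = struct.unpack_from("<I", data, opt + 28)[0]
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase, no BaseOfData
        image_base = struct.unpack_from("<Q", data, opt + 24)[0]
    else:
        raise ValueError("unknown optional header magic")
    return bool(characteristics & IMAGE_FILE_DLL), entry_rva, image_base
```

The address Olly breaks at is normally image_base + entry_rva. The exception is when the loader relocated the DLL because its preferred base was already taken.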

Friday, May 11, 2012

Reverse Engineering - Android APK

There's nothing really complex about this post. In the past I've always maintained that reversing is kind of tough, and that's true if it's an EXE, a DLL and so on. But in the case of an Android application, it really is very easy.

While all this information is already out there, here is a very short post summarizing how to get from an APK to source code. I used a vulnerable app available here, but you can use anything really.

If all you want is the source, do the following:

1. Use dex2jar (downloaded from here) and run ./dex2jar.sh against the APK. This creates a JAR file in the same directory as the APK. The JAR file contains all the Java class files, i.e. the Java bytecode you get once you compile your Java code.

2. Convert the Java bytecode into actual Java code. You need a Java decompiler for this. You can download one called JD-GUI from here - http://jd.benow.ca/#jd-gui-download. Load the JAR into it and use the 'Save all sources' feature to save all the source (Java) files to your disk. Now you can review the code like you would review any Java code.

Some other interesting things about APKs though:

1. An APK can be extracted to a folder. It's just a ZIP file, so any archiving program will do. I use the built-in Ubuntu GUI archiving tool, but you can also use unzip, 7z, WinRAR or anything you want.

2. Look at the file AndroidManifest.xml. It'll open in a text editor, but it's stored in a binary XML format and is therefore largely unreadable. Use apktool (you can get it from here) to decode this XML file as well as every other XML file in the APK. Run apktool against the APK and it will give you all the XML files fully decoded, so you can read them.

3. All the application code is in classes.dex. This is Android (Dalvik) bytecode, which can be translated back to Java bytecode. To look at it directly, we first 'dedex' it using 'dedexer', which can be found here. Run it as follows: java -jar ddx1.22.jar -d classes.dex.
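Because an APK is just a ZIP, points 1 to 3 can be checked quickly with Python's zipfile module. This sketch makes two assumptions. First, a binary manifest starts with the standard binary-XML chunk header (bytes 03 00 08 00). Second, classes.dex starts with the 'dex\n' magic followed by a 3-digit version. It only inspects the files and decodes nothing, unlike apktool.

```python
import zipfile

AXML_MAGIC = b"\x03\x00\x08\x00"  # binary XML chunk header: type 0x0003, header size 8
DEX_MAGIC = b"dex\n"              # followed by a version such as "035" and a NUL byte


def inspect_apk(path):
    """List an APK's entries and report whether the manifest is binary and the dex version."""
    with zipfile.ZipFile(path) as apk:
        names = apk.namelist()
        manifest = apk.read("AndroidManifest.xml") if "AndroidManifest.xml" in names else b""
        dex = apk.read("classes.dex") if "classes.dex" in names else b""
    return {
        "entries": names,
        "manifest_is_binary": manifest[:4] == AXML_MAGIC,
        "dex_version": dex[4:7].decode("ascii") if dex[:4] == DEX_MAGIC else None,
    }
```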

Nothing in this post is original. I just wanted a place to refer to instead of Googling a million times for the syntax.

Sunday, September 25, 2011

Reverse Engineering - Know your tools...

I've talked quite a bit in general about which tools to use and when. While all of that is correct in principle, I recently came up with a process for myself to use the right tools at the right time, after a lot of painful 'research' [mostly already out there somewhere]. Here is a short summary:

1) Do your dynamic analysis and document what you found.

2) To learn more, you now have to do static analysis. Don't start off with IDA Pro; it overwhelms you very quickly, especially if you are new.

3) Put the EXE through Olly or Immunity and start identifying what each function does, step by step. Don't worry initially about understanding everything about the malware. If you can confirm, via static analysis, what you found in dynamic analysis, and say "I know what these 5 functions do", that's good enough for a start.

4) Once you know what these 5 functions do, open the EXE in IDA Pro and rename the 'known' functions from sub_4012345 to something meaningful, like sub_malware_connect_irc. Repeat for each function you know, then go back to Olly.

5) Now take each known function, say sub_malware_connect_irc, and identify all of its system function calls: connect(), send(), GetCommandLine() and so on.

6) Look each of these up on MSDN and understand the arguments passed to it. Then see where these arguments are stored in your disassembly. Are they on the stack or in a variable?

7) If it's in a variable, go to IDA and give that variable a meaningful name. For example, rename something like dword_401234 to malware_irc_hostname. Every single place where that variable is accessed will then use the new name. So dword_401234 no longer appears anywhere; it is referred to as malware_irc_hostname instead. Repeat this process for all known functions and all known variables.

8) Once you have a few functions and all their corresponding variables renamed, use the Group nodes feature in IDA to collapse the blocks you have already analyzed (right-click on any block and select Group nodes). Give each group a name that you will recognize instantly, without having to look at the disassembly again. Repeat this for all the blocks you have analyzed. This reduces the amount of disassembly you have to wade through in the parts of the program you have not yet analyzed.

9) To summarize: for all known functions, you have renamed the functions, renamed the variables, and grouped and named the blocks of code you have already analyzed. This should give you a nice 'pseudo-code-ish' flowchart in IDA for quite a few functions :)

10) Now use the 'Functions' subview in IDA and sort by the first column to see how many functions are still pending, that is, how many names still start with sub_.

11) Go to each remaining function and see where it is called from. You can do this in Olly 1.10 by highlighting the first line of the function (usually PUSH EBP) and looking at the middle pane, which lists all the places it is called from. Visit each of those calls and see where they in turn were called from, and so on, until you get to the root of the call chain. Then see if you can understand 'when' it was triggered and 'what' behavior triggered it, and do the renaming and grouping as before. If you can't understand it, at least give the function some name, like dummy_notunderstood_1, dummy_notunderstood_2 and so on. That is still better than sub_401237.

12) At the end of all this you should have a .IDB file that is fully named (as much as possible) and fully grouped. Now, and only now, should you start drilling down into HOW exactly each function works: the exact algorithm behind each function and so on. Repeat this for as many interesting functions as you want.

13) Once this is also done, if you WANT, try rewriting the program in a high-level language, or at least in pseudo code, so you can quickly refer back to it when you need to.
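Steps 10 and 11 are mostly bookkeeping over the function list. Here is a small Python sketch of that bookkeeping. It works on a plain list of function names, for example copied out of IDA's Functions subview. The helper names are hypothetical. Inside IDA, recent versions would apply the rename itself with IDAPython's idc.set_name(ea, new_name), which is not shown here.

```python
import re

# IDA's default name for an unanalyzed function: "sub_" followed only by hex digits,
# so a renamed function such as sub_malware_connect_irc is not counted as pending.
AUTO_NAME = re.compile(r"^sub_[0-9A-Fa-f]+$")


def pending_functions(names):
    """Step 10: the functions still carrying IDA's default sub_ name."""
    return [n for n in names if AUTO_NAME.match(n)]


def placeholder_renames(names, prefix="dummy_notunderstood_"):
    """Step 11: map each remaining sub_ name to a numbered placeholder name."""
    return {old: f"{prefix}{i}" for i, old in enumerate(pending_functions(names), start=1)}


def progress(names):
    """Fraction of functions that already have a meaningful name."""
    if not names:
        return 1.0
    return 1 - len(pending_functions(names)) / len(names)
```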

I guess this is all very intuitive for most reversers who have learned it on their own or have been reversing for a long time. It took me a long time to reach this level, though, and I hope this helps any relative newbie who is reading this post and feeling lost. I know I was one a few months ago ;).

P.S. Make sure you back up the .IDB file ;)

Here are 2 sample screen-shots:
Sample Graph of Grouped Functions  


Renamed functions - A sample list

Thursday, September 22, 2011

Debugging threads - Olly

Recently I was debugging a piece of malware that launched numerous threads after it ran. Once a thread spawned, I could no longer F7 or F8 my way through the malware to understand things, because the thread was doing all the work. So somehow I needed to get into the thread.

The first thing I did was right-click and select a thread from the Threads submenu. That, though, just seemed to take me into system space, which was kind of useless. I wanted to see what the thread did in user space.

Then I looked at the CreateThread API, which was what the malware was using. The 3rd argument to that function is the start address for the thread. I did a Ctrl+G, went to that address in Olly, put a breakpoint there and restarted the program. I went on as normal till CreateThread and then hit F9 to run to the next breakpoint. The main thread still "hung", but I did break in user mode inside the thread and could debug it... yay :)
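Why the 3rd argument? CreateThread is stdcall, so its 6 arguments are pushed right to left. The first PUSH you see in the listing is the last parameter, and at the CALL itself the start address sits at [ESP+8]. Here is a tiny Python sketch that maps the PUSH operands, in the order they appear in Olly, to the documented parameter names. The operand values in the test are made up, except 0x40D1AA, which is borrowed from the aolsbm example.

```python
# CreateThread's parameters in declaration order, as documented on MSDN.
CREATETHREAD_PARAMS = [
    "lpThreadAttributes", "dwStackSize", "lpStartAddress",
    "lpParameter", "dwCreationFlags", "lpThreadId",
]


def createthread_args(pushed_in_listing_order):
    """Map the PUSH operands seen before CALL CreateThread to parameter names.

    stdcall pushes right to left, so the PUSH closest to the CALL is the first
    parameter and the earliest PUSH in the listing is the last one.
    """
    if len(pushed_in_listing_order) != len(CREATETHREAD_PARAMS):
        raise ValueError("CreateThread takes 6 arguments")
    return dict(zip(CREATETHREAD_PARAMS, reversed(pushed_in_listing_order)))
```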

If you want to break even before user mode and track the thread the moment it is launched, you can set Olly's debugging options to break each time a new thread is started or stopped. There are simple check boxes under the Options menu. Go search :)

The last bit is when the thread itself exits. Olly just says "Thread terminated", and again you cannot F7 or F8, because there is nothing left to step into. You need to get back to the main thread, where the CreateThread API was called. Makes sense, right? Main created a thread, I debugged the thread, and once I finish debugging the thread I come back to main.

To do so, pause the program (F12) after the thread terminates and hit Alt+F9 to return to user code. This will bring you right to the spot after CreateThread was first called.


Hope this helps someone newish to reversing :). Have fun!!

Saturday, July 30, 2011

12.3 - Example - Static Malware Analysis (Continued)

In the last part we saw that part of the code launches an embedded instance of IE. We also have Wireshark running, so we can capture any network traffic the malware sends. Last time we also broke at 40B511, after exiting from the thread that created IE. Let's move forward more quickly now and see what else happens.

Remember that the more functions you can identify, the easier it is to work out the overall purpose of the executable. A few CALLs just try to find out which directory a file is in, or do other operations that are relatively unimportant from a malware perspective. We then come to a call - CALL 0040B849. On going deeper into this function, you will find that it's a very interesting one: it logs all the operations of the malware into a file in C:\. That is another bit of information we can tally with our dynamic analysis. And since it's logging stuff, it also gives us some nice hints about what the malware is doing at a specific point in time. Open up the file soft_AOL.log in C:\ and see for yourself. Another key point is that this is a very commonly used function - look at the middle pane in Olly again just after you step into it. It lists a lot of addresses from which a call was made. For example:
Local calls from 00401729, 00401754, 004017D8, 004019B6, 004019D8, 00401B85, 00401C5B, 00401D13, 00401D6C, 00401F3E, 00401F99, 004026F6, 0040278B, 004027C6, 00402894, 0040293D, 00402ACE, 00402B1F, 00402B69, 00402B71, 00402B7B, 00402BA5, 00402C4A, ...
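A function like 0040B849 that is called from dozens of places is usually a utility, such as logging, string handling or memory management, rather than core logic, and it pays to identify it early. Here is a rough Python sketch that ranks functions by number of callers. It assumes you have collected (caller, target) address pairs yourself, for example by copying Olly's 'Local calls from' lines. The test uses only the first three addresses from the list above.

```python
import re
from collections import defaultdict


def parse_local_calls(text):
    """Pull the 8-digit hex addresses out of Olly's 'Local calls from ...' line."""
    return [int(h, 16) for h in re.findall(r"\b[0-9A-Fa-f]{8}\b", text)]


def callers_by_target(calls):
    """Group (caller_address, target_address) pairs by call target."""
    table = defaultdict(list)
    for caller, target in calls:
        table[target].append(caller)
    return {target: sorted(callers) for target, callers in table.items()}


def most_called(calls, top=5):
    """Return (target, caller_count) pairs, most-called first."""
    table = callers_by_target(calls)
    ranked = sorted(table.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(target, len(callers)) for target, callers in ranked[:top]]
```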

At 40B56B the malware checks whether the file c:\Windows\lgv exists. It's not too clear why it does this at this point; maybe it's confirmation that the malware is active and more stuff can be done. Keep looking at Wireshark, by the way. The moment you see some data there, you know you have to stop, go back and see where the traffic came from. So let's F8 on till we come to a call at 0040B679 - CALL 004029B8. Step over this as well and then quickly look at Wireshark: it's suddenly filled up with traffic. That means the previous CALL was very important. Put a breakpoint on that call and restart; when you get to the CALL, step into it. Starting at 4029B8, F8 on till 402A08, where Wireshark starts filling up again. Put another breakpoint, restart and step into the CALL at 402A0B this time. And so on: you keep stepping in till you find out exactly where the call is and why Wireshark is filling up. Once you step into the CALL at 402A0B you reach the address 0040B01A. Now look at the code below it. There are lots of comments there, and some very interesting ones. I won't talk about every single CALL, or I'll never finish ;). Before you move on, though, do step into the call at 40B04B - it's the routine that decides the list of domains the malware will talk to at some point. There's a small encryption algorithm in there; do try and see how the names of those domains are generated. Notice all that text, by the way? "scan domain attempt", "check inet"... all of that goes into the log file in C:\. Have a look.

Go on till you reach 40B0AC, where there is a JMP 40B113. This means control flow switches to that address. F8 so the JMP happens, and the very next instruction jumps back to 40B0AE. F8 on till you reach 40B0CB and look at the EAX register on the right side. You see a weird-looking site there? Yes you do :). Now that is a big piece of information; maybe the malware connects there for some purpose? Go on till 40B0E2 and look at EAX again. You see something called .sys.php? Looks like part of some URL. Things are getting interesting now, aren't they? Go on till 40B0F8. Look at Wireshark and notice that nothing has happened so far. F8 over the CALL 405086 statement and now look at Wireshark. Yesss!! Traffic seen, and it's an HTTP request to the site we saw in EAX, for the file .sys.php. Look at the response to the request in Wireshark, though: it's a 404 Not Found. The malware did not find what it was looking for. How does it respond? F8 on and you'll notice that you get back to the point where you jump to 40B0AE, and the same steps repeat with a new site. This continues until some site responds with a 200 OK, saying "Yes, I have that file". Put a breakpoint on 40B17B so the code doesn't connect and complete the remaining process before you can intervene.
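The loop between 40B0AE and 40B113 boils down to "try each site until one answers 200 OK". Here it is modelled in Python, purely as a reading aid. `fetch` is a stand-in for the HTTP routine behind CALL 405086, which was not reversed. The host names and the exact path are placeholders, not values recovered from the sample.

```python
def first_live_host(hosts, fetch, path="/.sys.php"):
    """Try each host in order; return (host, body) for the first HTTP 200, else None.

    `fetch(url)` is any callable returning (status_code, body). It stands in for
    the malware's own request routine. The default path is a placeholder.
    """
    for host in hosts:
        status, body = fetch(f"http://{host}{path}")
        if status == 200:  # anything else (e.g. the 404s seen in Wireshark) moves on
            return host, body
    return None
```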

The code eventually returns to 402A10 and continues from there. There's lots of interesting stuff, but just continue on till you reach 402BA0. "Crypted Code detected"... hmm. Something encrypted was returned when we requested .sys.php, and the malware is decrypting it for some reason. How does this decryption work? I don't know. F8 on till you reach 402BE5. Now look at the contents of the EBX register. See a message there with a URL that looks like Internet spam? Right-click on the EBX register and select "Follow in Dump". The whole URL is now clearly visible!!

I think the rest is guessable. This malware connects to a huge list of random sites, finds an active one, requests a URL from it and gets an encrypted message. It then decrypts the message and posts it on lifestream.aol.com, AOL's social media platform, using some user ID and password. If you continue stepping through the code, you will also find the credentials used each time. I tried it 3 times and found 3 different passwords, so I think it creates a new account, posts spam and logs out. But hey, don't take my word for it. Try it out for yourself :)

At one point the embedded IE window becomes fully visible. You can actually watch the malware enter credentials into it and get back a "Cannot Login" message.

I still have questions about some parts of this malware and could not fully understand every bit. For example, I could not find out how the SSL connection we talked about during dynamic analysis was established. But I'm sure it's all a matter of time. I thought all of this was a good learning exercise in itself, one that many newbies to this field can gain something from.

I hope you enjoyed this and now feel that reversing is doable by you as well. Until next time, goodbye :)

12.2 - Example - Static Malware Analysis (Continued)

Make sure you read the previous post before you start on this one. We started analyzing a piece of malware and looked at the basics of how to start static analysis. We'll pick up where we left off last time. If you look at the assembly code and scroll up and down, you will see a huge number of CALL statements, many JMP statements and many loops. Yes, as we discussed last time, you could step into each and every call, mark it and go on till you reach the end of the program. That's just fine. For example, I did this for the next few CALL statements, just to get a clearer picture and see if this strategy works.

00411D9E . E8 DD0A0000 CALL aolsbm_1.00412880 ; Nothing here - intermediate function
00411DA7 . FF15 28014200 CALL DWORD PTR DS:[<&KERNEL32.GetStartupInfoW>] ; \-------------- Windows appearance at startup
00411DBC . FF15 2C014200 CALL DWORD PTR DS:[<&KERNEL32.HeapSetInformation>] ; --------------------- Heap memory
00411E0B > \E8 44050000 CALL aolsbm_1.00412354 ; ------------- Heap memory create
00411E1C > \E8 90410000 CALL aolsbm_1.00415FB1 ; --------------- Process and thread info gather
00411E27 . E8 42FFFFFF CALL aolsbm_1.00411D6E ; Some common function (Later)
00411E2D > \E8 824D0000 CALL aolsbm_1.00416BB4 ; ---------------- Some ntdll function
00411E35 . E8 9F0A0000 CALL aolsbm_1.004128D9 ; -------------------- Get file handles
00411E45 . 59 POP ECX ; All process setting up till here - ignore

00411E46 > \FF15 30014200 CALL DWORD PTR DS:[<&KERNEL32.GetCommandLineA>] ; Get Program's Command Line arguments
00411E51 . E8 BF750000 CALL aolsbm_1.00419415 ; Environment strings
00411E5B . E8 FA740000 CALL aolsbm_1.0041935A ; ------------------------ Get module file name
00411E6C > \E8 73720000 CALL aolsbm_1.004190E4 ; -------------- Getting user directories, startupinfo, env variables etc
00411E90 > \E8 F0710000 CALL aolsbm_1.00419085 ; --------------------- Some path parsing of the executable path again
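Annotating a listing like this by hand gets tedious. Here is a small Python sketch that splits OllyDbg CALL lines, in the format shown above, into address, callee and comment. It also separates API calls from user-defined ones, since only the latter need stepping into. The format assumptions are mine, and lines that deviate from them will be parsed loosely or skipped.

```python
import re

USER_TARGET_RE = re.compile(r"\.([0-9A-Fa-f]{8})$")  # e.g. aolsbm_1.00412880
IMPORT_RE = re.compile(r"<&([\w.]+)>")               # e.g. <&KERNEL32.GetStartupInfoW>


def parse_call_line(line):
    """Split one OllyDbg CALL line into address, kind ('api'/'user'), callee and comment."""
    if " CALL " not in line:
        return None  # not a CALL (e.g. POP ECX)
    address = int(line[:8], 16)
    operand, _, comment = line.split(" CALL ", 1)[1].partition(";")
    operand = operand.strip()
    imported = IMPORT_RE.search(operand)
    if imported:
        kind, callee = "api", imported.group(1)  # look it up on MSDN instead of stepping in
    else:
        target = USER_TARGET_RE.search(operand)
        kind, callee = "user", (int(target.group(1), 16) if target else operand)
    return {"address": address, "kind": kind, "callee": callee, "comment": comment.strip()}
```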


Yeah, that'll do. I've stepped into each and every one of these (user-defined) called functions, studied them briefly and decided whether to delve deeper into them or not. The question each time is: does this bring me closer to understanding what the malware does? If the answer is NO, just comment it and move on. All OK? Yeah, OK... except it's going to take a huge amount of time to get to the end of the program. So it is probably a good idea to step back a bit and see if your dynamic analysis can help you move forward a little more quickly.

Now, you know that the malware definitely connects out to the Internet and does some stuff there. So why not start up Wireshark and see what goes out as you step over and step into various bits of code? Let's start Wireshark up, then. What else? Well, we know that if a connection to the Internet is made, it must use some Windows Sockets functions to do so, like 'connect', 'send', 'gethostbyname' and so on. So we want to stop the program whenever these functions are used, which translates to 'we want to set a breakpoint'. To do that, we need to find out where these functions are used in the program and break there. So we hit Ctrl+N and search for the 'connect' function. Now if you look at the Ctrl+N window in Olly 1.10 and you're a newbie like me, you'll get confused. You won't see any 'connect' function there, and you'll spend time moaning about everything ;). But if you search in Olly 2.0x, you'll see a line that has WS2_32.connect in the comments section. The function name itself is WS2_32.#4, because the program imports connect by its ordinal number (4) rather than by name. So come back to Olly 1.10 (we'll primarily use it, as it has more features), right-click on the line for WS2_32.#4 and click 'Find references to import'. Promptly, a box with the address 00403FD3 comes up. This means the instruction at that address calls the connect function. Right-click on this line and set a breakpoint (Toggle breakpoint). Now, whenever the code reaches the line at 00403FD3, it will stop, and you can analyze the function that called it and work your way backward from there. Easy? No, not really if it is your first time, but logical, yes. It'll get easier the more you do it. Let's go on.

So that's another rule learned. You may be sure, from your dynamic analysis, that the malware MUST use certain Windows functions for a specific purpose. If so, read up on MSDN about all those functions and set breakpoints on them. That'll narrow down the scope quite a bit. Let's quickly run through what we've done so far:

--- We understood how to navigate through code
--- We commented functions we didn't know anything about at the moment
--- We stepped over all those functions but soon realized that this way, though exhaustive, is extremely time-consuming
--- We revisited our dynamic analysis findings, identified functions that would definitely be used and set breakpoints on them
--- We started Wireshark so we could see what traffic is sent by the executable at every step
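The WS2_32.#4 confusion comes from imports by ordinal: the EXE asks for export number 4 instead of the name 'connect'. Here is a sketch of a Python lookup table for a few Winsock functions, using the commonly documented ws2_32.dll ordinals. Treat the numbers as an assumption and check them against the export table of the DLL on your analysis machine.

```python
import re

# Commonly documented ws2_32.dll export ordinals (Winsock 1.1 numbering).
# Assumption: verify against the export table of the ws2_32.dll you are debugging.
WS2_32_ORDINALS = {
    1: "accept", 2: "bind", 3: "closesocket", 4: "connect",
    9: "htons", 11: "inet_addr", 13: "listen", 16: "recv",
    19: "send", 23: "socket", 52: "gethostbyname", 115: "WSAStartup",
}

ORDINAL_RE = re.compile(r"^(ws2_32)\.#(\d+)$", re.IGNORECASE)


def resolve_ordinal(name):
    """Turn an Olly-style 'WS2_32.#4' into 'WS2_32.connect'; leave other names unchanged."""
    m = ORDINAL_RE.match(name)
    if not m:
        return name
    label = WS2_32_ORDINALS.get(int(m.group(2)))
    return f"{m.group(1)}.{label}" if label else name
```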

Not bad at all. Let's move on. Keep hitting F8 till you pass 00411E90. I'm saying this because I've analyzed the code up to there and am quite sure that none of those calls directly affect the malware in any way; look at the comments I've made. If you want, though, feel free to F7 into each of them until you are satisfied :). Well, now what? Let's try running the program directly by hitting F9. At some stage, though we don't know when, we must break at the 'connect' breakpoint we have set. So hit F9. We do break as predicted, but even before that, a new window opens up :). There's a good chance this relates in some way to the malware, so we want to find out how that window appeared.

A good way, and probably the most intuitive one when you're starting off, is to just keep hitting F8 till you see the window pop up. Yes, there probably are more intelligent ways to solve this problem, but it'll do for now. So let's do just that: hit Ctrl+F2 and restart the program. You know for sure that there is nothing of interest till 00411E90. So instead of hitting F8 till there, right-click - Go to Expression - 411E90 - OK to jump there. Once you're there, hit F4; this makes the program 'Run to selection'. You can also scroll down to that location if it's not too far. Once you reach 411E90, start hitting F8, since you don't know where the popup is going to come from. You don't have to wait too long :).....

Pause for a moment when you reach 411EAC and note this location down somewhere. Now hit F8 again. Boom!! There's your popup. What does this mean? It means that something INSIDE the function called at 411EAC caused the popup to appear. So the function CALL 00419085 is interesting and we need to know more about it. Set a breakpoint here by highlighting that line and pressing F2. Now Ctrl+F2 again and hit F9 this time; this effectively tells the program to run till it breaks. It does just that and halts at 411EAC. Since we want to know more about this CALL, we hit F7 and not F8. We are immediately taken to 00419085. Notice there isn't any popup yet; some place inside this function creates it, and we need to F8 till there to find out where. Repeat this process and you see the popup again at the address 40BBA5. Can you see something at 40BBA5 that makes the popup appear? No, it's another call. Put a breakpoint here, restart the program and reach 40BBA5. Now step into the call (F7) at this address (CALL 0040B48F). Once in this call, start hitting F8 again till you reach the address 0040B4F7. Pause a bit and look at the instruction here: 'CreateThread'. Another system function... let's look at what MSDN says.

MSDN:
CreateThread - The CreateThread function creates a new thread for a process.

So we're starting something here; most likely this thread causes the popup to appear. The third argument to this function (lpStartAddress) is the address of the code the thread must execute. That argument is set up at the address 0040B4E7 by the instruction PUSH 0040D1AA, so this thread runs whatever code is at 40D1AA. Let's see what is there: right-click - Go to Expression - 40D1AA. The function ranges from 40D1AA to 40D28C (the RETN instruction marks the end of the function), and it is this function that creates the popup. Let's put a breakpoint at 40D1AA and see if the thread jumps there. Hit F2 while at 40D1AA and then Ctrl+F2 again. Run till 0040B4F7 and F8 over the CreateThread call... immediately the code jumps to 40D1AA and stops. Yes!! Our understanding was correct. Let's F8 step by step now...

You pass over 2 system functions here - Ole32.CoInitialize and Kernel32.GetModuleHandleA. I won't explain these here; get into the habit of keeping Google permanently open for MSDN ;). However, there is another call here, CALL 404A22 at address 40D1E5. Let's F7 into that, and you see it's another function, which ranges from 404A22 to 404ABA. Just browse through it... anything interesting?? Aha, there is a call to the CreateWindowEx function with its 2nd argument (the window class name) as "IEEmbedded"... very interesting. Remember we found strings called IEEmbedded in dynamic analysis?? Read up a little about this and you will find that this function creates a window of a specific size :). After a few more calls, we're back in the previous function at address 40D1EA.

Go on reading. Now there's a ShowWindow call with 2 arguments. The first is the handle returned by CreateWindowEx and the second is the number 5. MSDN says that 5 (SW_SHOW) means activate the window and display it. Right, step over ShowWindow. Yes!! The window appears. More F8 reveals a loop consisting of the functions GetMessage, TranslateMessage and DispatchMessage. We don't want to stay in this loop now. We sort of know what it does: it handles messages for the window, and that's good enough. Let's go back to the previous function and put a breakpoint at 40B511. Any location after the CreateThread will do; we just want to get out of that thread now that we know what it does. Remove all the breakpoints except the one at 40B511 and hit F9. You should get a window popup, and your code should halt at 40B511. Got it?
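Constants like the 5 pushed for ShowWindow are easier to read with their named values next to them. Here is a small Python lookup of the nCmdShow values as defined in WinUser.h. Remember that, with stdcall, PUSH 5 appears in the listing before the PUSH of the window handle.

```python
# nCmdShow values accepted by ShowWindow, as defined in WinUser.h.
SHOW_WINDOW_COMMANDS = {
    0: "SW_HIDE", 1: "SW_SHOWNORMAL", 2: "SW_SHOWMINIMIZED", 3: "SW_SHOWMAXIMIZED",
    4: "SW_SHOWNOACTIVATE", 5: "SW_SHOW", 6: "SW_MINIMIZE", 7: "SW_SHOWMINNOACTIVE",
    8: "SW_SHOWNA", 9: "SW_RESTORE", 10: "SW_SHOWDEFAULT", 11: "SW_FORCEMINIMIZE",
}


def describe_show_window(n_cmd_show):
    """Name the constant behind a PUSH n that feeds ShowWindow's second argument."""
    return SHOW_WINDOW_COMMANDS.get(n_cmd_show, f"unknown ({n_cmd_show})")
```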

So to dig out all the information about a particular call, we might have to dig extremely deep into the code. You saw that just to get to the function that created a window, we had to go 4 or 5 calls deep. It's the same methodology we have to follow for every single call we're interested in. To sum up what we have learned so far:
--- Comment code a lot
--- Step over calls you don't have a use for
--- Think about the actual behavior of the program seen in dynamic analysis and break on specific functions
--- Look at the runtime behavior of the program and dig into CALL statements accordingly
--- Understand APIs better and set breakpoints accordingly

These are the basics of reverse engineering, really. Keep digging till you find what you want. In Part 3 we'll use these same concepts, move much faster and conclude our analysis of this piece of malware.

Friday, July 29, 2011

12.1 - Example - Static Malware Analysis

We looked at the behavior of a piece of malware last time and tried to obtain as much information as possible from it by simply running it and watching it interact with various systems. Often you may not have the liberty to do this and will have to deduce what the malware does from its assembly listing alone. So in this post we will look at the same exe (aolsbm.1.exe) and analyze it statically. Let's go.

In case you missed it, you can download the malware from http://www.offensivecomputing.net. You will need to register there (free) and then search for the hash 5a2be07ad750bed86be65954fb9d7d21.
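Before loading a downloaded sample, it's worth confirming it is the file you think it is. A minimal Python sketch, assuming the 32-hex-digit hash above is an MD5 (its length suggests so); `md5_of_file` and `EXPECTED_MD5` are names made up for this example:

```python
import hashlib

# Hash quoted in this post; assumed to be an MD5 because it is 32 hex digits long.
EXPECTED_MD5 = "5a2be07ad750bed86be65954fb9d7d21"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("aolsbm.1.exe") == EXPECTED_MD5` should give True for the right sample. Check the hash before running `upx -d` on the file later in this post, since unpacking rewrites the executable and changes its hash.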

We need a debugger to step through the code bit by bit and understand what is happening. For this we'll primarily use OllyDbg. However, to get a better view of function calls and loops it's a good idea to also open the same binary in IDA Pro (the free version is fine) at the same time; the display is much nicer there. Before starting, familiarize yourself with OllyDbg as much as you can. You won't be comfortable right away, and it might take a week of playing with it regularly before you understand what all the terminology means... but hey, that's just fine. Just try to understand everything before you go forward, and don't get frustrated if you get stuck in the middle of all that assembly code. Keep plugging at it and you'll eventually get it. Enough sermons then... let's go :)

Load aolsbm.1.exe in Olly using File - Open and do the same in IDA (use the default options). You immediately get a message saying Olly's analysis may not be accurate and asking whether you want to continue. This is because it is difficult to analyze a packed executable; remember we talked about this last time? So we have to see if we can unpack it using some software. Remember the disk and memory strings you collected from the running process when you ran it? Have a look at the first few lines of either file. Do you see something like UPX0, UPX1 there? This may (just may) mean that a program called UPX was used to pack the executable. Luckily for us, UPX also has an unpacking switch. So let's download UPX (free) and try to unpack the executable using the command upx.exe -d aolsbm.1.exe. Immediately you get a line showing the compression ratio and other information about the file. Close Olly and open the file again. No message, right? And this time Olly's analysis completes successfully. Remember though that we got lucky this time. Many malware writers (I've read) use their own custom packers, with the unpacking code embedded in the malware itself. So it's harder to find out how the malware was packed, and even harder to unpack it. Let's go on.
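Instead of eyeballing the strings dump, you can read the section names straight out of the PE header. The Python sketch below parses just enough of the PE format (the e_lfanew field at offset 0x3C, the COFF file header and the 40-byte section headers) to list section names; `pe_section_names` and `looks_upx_packed` are hypothetical helper names. As with the strings check, seeing UPX0/UPX1 only suggests UPX: section names are trivial to rename, so treat the result as a hint, not proof.

```python
import struct


def pe_section_names(data):
    """Return the section names from the raw bytes of a PE (.exe/.dll) file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte "PE\0\0" signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # The section table starts right after the optional header.
    table = pe_offset + 24 + optional_size
    names = []
    for i in range(num_sections):
        start = table + 40 * i  # each section header is 40 bytes; name is the first 8
        raw = data[start:start + 8]
        names.append(raw.rstrip(b"\0").decode("ascii", errors="replace"))
    return names


def looks_upx_packed(data):
    """Heuristic only: True if both UPX0 and UPX1 sections are present."""
    names = pe_section_names(data)
    return "UPX0" in names and "UPX1" in names
```

Usage would be along the lines of `looks_upx_packed(open("aolsbm.1.exe", "rb").read())`, run once before and once after `upx -d` to see the section names change.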

The entry point, the address at which the malware starts executing, is 00411F04. This is where execution will start every time it is loaded into Olly. Now... how do you proceed? There's a huge ton of code to look at, right? The ground rules for reversing are actually very simple:

a) Ignore what you do not want to analyze in depth = Step Over = F8
b) Dive into what you want to understand better = Step Into = F7

Effectively, the assembly listing in front of you in Olly is a big list of functions (user defined and system) calling each other in a defined sequence. To understand what the malware is doing, you need to understand in depth what some of those functions do. Yes, for a complete code reconstruction you would want to understand what each and every bit of code does, but trust me: that is extremely painful, not needed in the vast majority of cases, and would take an unbelievably large amount of time. So at this early stage I am not going to try to understand every bit; I'll try to understand just enough to tell me what the malware is doing. Moving on then...

We talked about 'Step Into' and 'Step Over' earlier. Whenever you see a 'CALL' in assembly, it means a function is being called for some purpose. If it is a system function exported by some system DLL, you do not need to Step Into it. The behavior of those functions is documented and does not change from one program to the next, so there is nothing to be gained by studying them in depth. You can just look up their documentation on MSDN and find out what parameters they take and what values they return. Let's take an example. The very first line is CALL aolsbm_1.40194ac; this is a user defined function, so you may want to Step Into it and find out what it does. For now, though, just press F8 till you reach the address 00411DA7, where you see another CALL; this time it is CALL Kernel32.GetStartupInfoW. This is clearly a system function (its module name is something other than aolsbm_1), so you do NOT need to Step Into it at all. The behavior of GetStartupInfoW is known and documented; there IS nothing to study here. So focus only on the user defined functions.
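The rule "check which module the CALL target lives in" can be written down as a tiny filter. This is a sketch over a simplified, hypothetical text format (`CALL module.target`, the way calls are written in this post), not Olly's exact display or any export format; `classify_call` is a made-up helper:

```python
def classify_call(line, exe_module="aolsbm_1"):
    """Classify a simplified 'CALL module.target' line.

    Returns "user" when the target lives in the executable itself (a candidate
    for Step Into), "system" when it belongs to another module such as Kernel32
    (look it up on MSDN and Step Over), or None if the line is not a CALL.
    """
    parts = line.split(None, 1)
    if len(parts) != 2 or parts[0].upper() != "CALL":
        return None
    module, _, _target = parts[1].partition(".")
    return "user" if module.lower() == exe_module.lower() else "system"
```

For instance, `classify_call("CALL aolsbm_1.00412880")` gives "user" and `classify_call("CALL Kernel32.GetStartupInfoW")` gives "system". Matching on the module prefix is the same shortcut the text uses; it says nothing about whether a user defined function is actually worth stepping into.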

Now even in the 'User Defined' functions group, you do NOT need to analyze every single function in depth. Relieved? ;). The trick, though, is knowing which ones to Step Into and which ones to just Step Over. For example, remember we kept hitting F8 till we got to the Kernel32 function. That meant we were not interested in any of the CALLs made before the Kernel32 function. So in this case we are saying we are not interested in 2 CALLs, namely:

00411F04 ----- CALL aolsbm_1.0040194ac
00411D9E ----- CALL aolsbm_1.00412880

This assumption may or may not be correct. Instead of Stepping Over these functions, let's step into the 2 calls. Hit Ctrl+F2 to get back to the start of the program (hit Yes if you get a warning). Hit F7 on the first line, which will take you to the address 0040194ac (the destination of the call). Now study this code line by line and see whether any system functions (like the Kernel32 function) are called in the body of THIS function. The body of this function ranges from 0040194ac to 004019546. In this body we can see 5 system functions: GetSystemTimeAsFileTime, GetCurrentProcessId, GetCurrentThreadId, GetTickCount and QueryPerformanceCounter. Go to MSDN and study what each of these 5 functions does. Once you're through, you'll understand that this function (0040194ac) is not doing anything important from a malware analysis perspective. So we can Step Over it.

Let's repeat this for the 2nd call (00412880). Hit Ctrl+F2 again and restart the program. This time we do not need to Step Into (F7) the first function (we already did that, right?), so we press F8 till we reach the CALL 00412880 statement and then Step Into that call (F7). The body of this function ranges from 00412880 to 004128c4. Here we don't have any system functions to give us hints about what it might do. So unless we're magicians or super gods of assembly programming, we really don't know. Simply add a comment there and skip it. Huh? Yes, that might sound strange, but frankly I don't think there is anything better you can do at such an early stage. Later in the program, when you see some function which looks more familiar, you can return and revisit this function if needed. For now there is nothing to do, so ignore it. One thing, though: you'll see that this function is called from numerous other places. You can find this out by clicking on the line with the address 00412880 and looking at the middle pane on the left half of your screen. It will say something like:
------
Local calls from 0040F41C, 0040F58F, 0040F731, 0040F813, 004100E6, 00410938, 00410D9D, 00410E77, 00411D9E, 00411F74, 00412246, 00412415, 00413F31, 004150C2, 004151E3, 00415325, 004154C1, 00415640, 00415D42, 00415E89, 00416133, 0041617F, 004168A3, ...
------
So many calls means it's some very common function; otherwise it wouldn't be called so many times, right? So we can just record that information and move on. I recommend you go to the end of the function using Ctrl+F9 (Execute till return) as soon as you realize there is nothing useful for you there at that particular moment. This takes you to the last statement of the function. Hit F8 again and you're back just after the original CALL. Move up, comment the CALL and move forward. Comments are very useful; it's very easy to forget what you were doing when you're in the middle of such relatively unreadable code :)
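Olly builds the "Local calls from" list for you, but the idea behind it is simple: invert the direct CALLs so each target maps to its callers, then count. A Python sketch, assuming a hypothetical list of `(address, mnemonic, operand)` tuples as input (real disassembler output would need parsing first); `build_call_xrefs` and `most_called` are made-up names, and calls through an import such as Kernel32.GetStartupInfoW are skipped because their operand is not a hex address:

```python
from collections import defaultdict


def build_call_xrefs(instructions):
    """Map each direct CALL target address to the sorted addresses that call it."""
    callers = defaultdict(list)
    for address, mnemonic, operand in instructions:
        if mnemonic.upper() != "CALL":
            continue
        try:
            target = int(operand, 16)
        except ValueError:
            continue  # not a plain hex address, e.g. an imported system function
        callers[target].append(address)
    return {target: sorted(srcs) for target, srcs in callers.items()}


def most_called(xrefs, top=5):
    """Return (target, caller_count) pairs, busiest functions first."""
    ranked = sorted(xrefs.items(), key=lambda item: len(item[1]), reverse=True)
    return [(target, len(srcs)) for target, srcs in ranked[:top]]
```

Fed just the CALLs at 0040F41C, 0040F58F and 00411D9E from the list above, it would report 00412880 with three callers; over a full listing, a target with dozens of callers is the kind of common helper worth commenting and skipping for now.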

All OK so far? Let's take a break, assimilate all that slowly, and come back for Part 2 of this little exercise in a while. I also recommend you use this opportunity to get familiar with Olly and its features; play around with it till you feel comfortable. In Part 2 we'll use these basics and a few other small tips that I have learnt so far and try to move forward a little more quickly. Bye for now.

Sunday, July 17, 2011

10 - OllyDbg hints

Before moving forward into actually reversing something, it's a good idea to quickly step through some basic debugging terminology, which you will hear all the time. We'll use OllyDbg as a reference; download both OllyDbg 1.10 and OllyDbg 2.0. Some features are available only in the older version. I'll mention the older version whenever needed; otherwise just assume I'm talking about the newer Olly.

1. You can either 'Open' a new executable or 'Attach' to a running executable.

2. Midway through a debugging session, you will want to return to the start many times to understand things better. Use Ctrl+F2 to restart the session.

3. After opening an executable in Olly you will want to run it. Use F9 to run it.

4. To analyze the executable you will almost certainly want to break execution midway so you can study it part by part. Move the cursor to highlight the line where you want to stop and hit F2 to set a breakpoint. Once you set a breakpoint and hit F9, your program will run till the breakpoint and wait for user input. If you hit F9 again it will continue running.

5. Programs have functions that are called from within the main program. These functions might be user defined functions or system (API) functions. At any time you might want to see how the program behaves at any point in any function. If you want to explore the behavior of a specific function, "Step Into" it using the F7 key. If you know what the function does and are interested only in what happens AFTER it is called, use the F8 key to "Step Over" it.

6. Once you've got to a certain point in the program and want to see where you came from, you can use the '-' key to move the cursor backward and the '+' key to move it forward. This won't actually "re-run" the program; it helps you understand where you jumped from, or the exact path through the code that your program has taken.

7. If the program you are debugging is a command line program that needs arguments to even start, you can use the File - Set new arguments menu option to supply them.

8. If you want to search for strings, you can right click in the main window and go to "Search for" - All referenced strings.

9. To go to a particular memory address, you can right click, select Go to -> Expression and type the exact memory address you want to visit.

10. You can right click on most values and choose "Follow in Dump" (bottom left) to understand their content.

11. You can hit Ctrl+N to list all the functions (names) used by the program and Alt+M to open the memory map and find specific sections in memory.

12. You can pause a running program by hitting F12.

There are plenty of other options as well, and you can explore them yourself or by reading Olly's documentation, which is available online. However, if you are fully clear about at least these options, you should be good to go. We'll explore other options as we reverse various types of executables along the way.

Next time we will look at a malicious executable and analyze it both dynamically and statically. Until then, have fun :)

9 - Static Malware Analysis

We briefly looked at static analysis in Parts 3, 4 & 5, and that's what you should read first if you want an introduction. However, that was with Linux as the base OS, and the programs we chose were just meant to give us a feel for how high level code (mainly control structures) looks in a debugger. In this series we're trying to study malware, which primarily affects Windows systems (for whatever reason ;)), and understand all its functionality, even those bits which aren't necessarily exposed when we initially run it. For example, some malware might perform keylogging in the background without having a button called "Start Keylogger after 5 minutes", so dynamic analysis might miss it. Proper static analysis won't; it'll take longer to identify stuff, of course, but you will eventually find it.

Now, we're almost always going to have only a single executable to analyze, and almost certainly NOT any source code. If we did, reverse engineering would be much simpler: you would only have to study the syntax of that particular language carefully and you would work out what the malware did much, much quicker. Since we have only the EXE and no source, we'll have to reconstruct the code in some way and then draw our conclusions.

That's where a disassembler comes in. There's plenty of material on disassemblers online; you can read up on them in detail if you're interested. For the purpose of this article, it's enough to say that a disassembler (DA) takes the EXE, opens it, and converts its machine code into assembly language so it's easier for you to read. After all, it's easier to understand an assembly instruction than a long run of hex bytes :). Have a look at an example: what would you say if you saw something like "EB 27" and were told it had some meaning? What would you say if you were then told there were 1,00,000 such lines and you needed to work out the meaning of each one and what the malware did at the end of it all? Not much fun if you ask me. What a disassembler does is convert "EB 27" into an assembly instruction: a short JMP that lands 0x27 (39) bytes past the end of this 2-byte instruction. It does this for all 1,00,000 lines, so you look at assembly instructions instead of plain hex gibberish. Clearer? But yes, it's still 1,00,000 lines. If you think that's no big deal, just wait till you start reversing :)
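To see that decoding step concretely, here is a small Python sketch that turns the two bytes EB 27 into a jump target. It handles only the short JMP form (opcode EB followed by a signed 8-bit displacement); `decode_short_jmp` is a hypothetical helper and the address 0x401000 is an arbitrary example. The displacement is counted from the end of the 2-byte instruction, so EB 27 at 0x401000 jumps to 0x401029.

```python
def decode_short_jmp(code, address):
    """Decode a 2-byte short JMP (opcode EB, signed 8-bit displacement).

    Returns the absolute target address. The displacement is relative to the
    address of the next instruction, i.e. address + 2.
    """
    if len(code) < 2 or code[0] != 0xEB:
        raise ValueError("not a short JMP")
    # Sign-extend the 8-bit displacement: 0x80..0xFF are negative (-128..-1).
    displacement = code[1] - 256 if code[1] >= 0x80 else code[1]
    return address + 2 + displacement


print(hex(decode_short_jmp(bytes([0xEB, 0x27]), 0x401000)))  # prints 0x401029
```

The sign extension matters: EB FE has a displacement of -2, so it jumps back onto itself, a classic infinite loop.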

So once an EXE is loaded into a disassembler and the assembly code is generated, it's time to start trying to understand it. In malware analysis, "understand" means "it's time to DEBUG it". Remember gdb in Linux? For Windows we're going to use a tool called OllyDbg, which both disassembles (DA) and debugs (DBG) the EXE. Using a debugger you can control exactly how much of the malware runs. You could, for example, say "Run only the first 10 lines of assembly code. I'll study what changed and then you can continue." We do this to understand exactly how the program works. Makes sense? Excellent :)

So, to sum up: if we want to improve at static reverse engineering, we need to understand assembly language (at least enough to read code), learn how disassemblers and debuggers work, and understand how to use our tools in the best possible way. As we move on, it will become necessary to understand more and more about Windows functions, and when and why they are called. MSDN will quickly become your friend :). Here is what I recommend you read, in the order I've listed:

a) Malware Forensics: Investigating and Analyzing Malicious Code
This is very basic and focuses primarily on Dynamic Analysis. So if you're clear about Dynamic Analysis and understand systems reasonably well at a high level you can skip this. If not, spend some time browsing through this.

b) Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code
This is an excellent book but is a little too advanced and won't help you immediately. Reading it in detail after around 6 months or so will help. However, it's a good idea to read Chapters 1, 6 and 7 of this book now. Again, the focus is on dynamic analysis at this point; spend a day or 2 on just these chapters.

c) Reversing: Secrets of Reverse Engineering
This book by Eldad Eilam is probably a must-read for anyone starting off in static reversing. Yes, there are plenty of tutorials, plenty of videos and plenty of more advanced books available, but for me this was the book which helped me through those initial days when I was struggling to learn. It gives you just the right amount that you need to know about reversing before you move forward. Again, there are places you will get stuck even in this book, but as long as you understand about 50% of it and learn how to use Olly better by the time you're through, you're good to go. Don't read beyond Chapter 6 when you're starting out; that's all you need. Chapter 6 was, for me, the most helpful chapter; I could work my way through it very slowly and gain confidence.

d) The IDA Pro Book
At some point in your reversing journey you will start using IDA Pro. There's a book completely dedicated to it. IDA is reputedly the best disassembler on the market, and as you get better you will want to use it at some stage. That said, it's better to understand GDB, Olly and other slightly simpler tools first, get your basics clear and then learn IDA. Don't START your career with IDA.

e) Windows Internals - 5th edition
The book covers Windows internals in enormous depth. The better you understand Windows internals, the better you will get at analyzing malware. However, this is NOT a newbie book. I struggled for over 2 weeks reading its content, reading a course by F-Secure based on this book, and reading various presentations and PDFs which used it as a reference... but it was just far, far too complex for me. I was very frustrated at my lack of progress :(. That's not to say the book is bad, of course it is not, but for someone who is starting out I would not recommend it. Maybe a year or 2 later, when you're much clearer about the basics, it will be a good reference.

Well... that is it for an introduction to static reversing. I recommend you do a little reading (from any of these books or even better sources) before proceeding. It's not compulsory, but a few basics are always helpful :)

In the next article I will try and explain the key features of OllyDBG - just the ones that you need to know before you start out. There is documentation available of course, but a gentle introduction to "just what you need" will help you for sure.

Happy reading :)