Sunday, August 22, 2010

Reverse Engineering - 5

Okay, so let's look at a program with a couple of control structures like if, if-else, while and for, and see how they look in assembly. We won't go through the prolog in detail again, and we'll assume that you've understood the previous 2 posts.

The first bit of code we look at is called ifelse.c on that site. Compile it like this:

gcc -o ifelse -ggdb -O0 -fno-builtin-printf ifelse.c
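
For reference, here is a minimal sketch of what a program like ifelse.c might look like, worked backwards from the assembly below. The function name if_else_demo, the parameter a, the return value and the exact message strings are assumptions for illustration; in the real program the logic presumably sits directly in main, with a as a local variable.

```c
#include <stdio.h>

/* Hypothetical stand-in for the body of main() in ifelse.c.
 * Returns 1 when the "negative" branch runs and 0 otherwise, purely so
 * the result is easy to check. */
int if_else_demo(int a)
{
    int negative;

    if (a < 0) {                      /* cmpl $0x0,a  then  jns <else> */
        printf("a is negative\n");
        negative = 1;
    } else {
        printf("a is zero or positive\n");
        negative = 0;
    }

    printf("Leaving main\n");         /* runs on both paths */
    return negative;
}
```

Calling if_else_demo(-1) takes the first branch, while if_else_demo(0) and if_else_demo(5) both take the else branch.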

Let's step through the assembly. First the prolog and space allocation:
0x08048384 : lea 0x4(%esp),%ecx
0x08048388 : and $0xfffffff0,%esp
0x0804838b : pushl 0xfffffffc(%ecx)
0x0804838e : push %ebp
0x0804838f : mov %esp,%ebp
0x08048391 : push %ecx
0x08048392 : sub $0x14,%esp

Then the comparison of a against 0 (the condition of the if):
0x08048395 : cmpl $0x0,0xfffffff8(%ebp)

Then the bit which has the if-else logic. The instruction jns stands for "jump if not sign": it jumps when the sign flag is clear, which here means a is zero or positive. So if a is not negative you jump straight to 0x080483a9 (the else branch); otherwise you fall through the instructions from 0x0804839b to 0x080483a7 (the if branch), which end with a jmp over the else branch.
0x08048399 : jns 0x80483a9
0x0804839b : movl $0x80484a0,(%esp)
0x080483a2 : call 0x8048298
0x080483a7 : jmp 0x80483b5
0x080483a9 : movl $0x80484b4,(%esp)
0x080483b0 : call 0x8048298
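
The jumps in this listing can be mirrored in C with goto statements. This is only a sketch: the function name, the return value and the message strings are made up for illustration, and only the shape of the jumps is taken from the listing.

```c
#include <stdio.h>

/* The same decision as an if-else, laid out the way the jumps are.
 * Returns 1 if the fall-through ("negative") path ran, else 0. */
int if_else_as_jumps(int a)
{
    int negative;

    if (!(a < 0))            /* jns: skip ahead when a - 0 is not negative */
        goto else_branch;

    printf("a is negative\n");
    negative = 1;
    goto after_if_else;      /* jmp over the else branch */

else_branch:
    printf("a is zero or positive\n");
    negative = 0;

after_if_else:
    printf("Leaving main\n");
    return negative;
}
```

Note that zero takes the else branch, because jns only cares about the sign of the comparison result.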

That leaves just the final printf. The if-else ends just one instruction before it (the else branch's call to printf), which means...
0x080483b5 : movl $0x80484d4,(%esp)
0x080483bc : call 0x8048298
0x080483c1 : add $0x14,%esp
0x080483c4 : pop %ecx
0x080483c5 : pop %ebp
0x080483c6 : lea 0xfffffffc(%ecx),%esp
0x080483c9 : ret

...everything here comes after the if-else. The program prints the 'Leaving main' bit and exits. Simple, eh?

Now for an example of while. We use this code.

We compile this as usual with gcc -o while -O0 -fno-builtin-printf -ggdb while.c and then open it up with gdb.
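
For reference, a minimal sketch of what while.c might look like, based on the assembly that follows. The function name while_demo and the return value are assumptions; the original presumably has this loop directly in main.

```c
#include <stdio.h>

/* Hypothetical stand-in for the body of main() in while.c.
 * Returns the final value of i so the loop count can be checked. */
int while_demo(void)
{
    int i = 0;                /* movl $0x0,i */

    while (i < 10) {          /* compiled as: cmpl $0x9,i ; jle <body> */
        printf("%d\n", i);
        i++;                  /* addl $0x1,i */
    }
    return i;
}
```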

You see the usual prolog which I won't go into. You then see a line allocating space on the stack as follows.
0x08048392 : sub $0x24,%esp

At this point I'd just like to make one small point about these numbers (0x24 in this case). You'll see that there's just one local variable (int i) in the code. So why should 0x24 be allocated? Well, the extra space is for the printf call: its arguments, the "%d\n" format string and i, are stored at (%esp) and 0x4(%esp), so gcc reserves room for them in the same sub. The moment you comment the printf out and recompile, the number changes to 0x10, meaning space is allocated only for int i. Something like this:
0x08048362 : sub $0x10,%esp

Although I'm still not sure why it's 0x10; logically it should be smaller for a single int. But maybe that's just how gcc deals with stuff. More clarity in later blogs maybe, once I'm clearer ;). Moving on then..
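
One plausible explanation, offered as an assumption rather than something confirmed in this post, is that gcc keeps the stack 16-byte aligned at each call. The sketch below just does the arithmetic for the two prologs shown so far: three 4-byte pushes after the and $0xfffffff0,%esp (the copied return address, %ebp and %ecx), followed by the sub. The function name and parameters are made up for illustration.

```c
#include <stdio.h>

/* Returns 1 if the bytes pushed after the `and $0xfffffff0,%esp`
 * plus the `sub` amount add up to a multiple of 16, else 0. */
int frame_keeps_alignment(unsigned pushed_bytes, unsigned sub_bytes)
{
    unsigned total = pushed_bytes + sub_bytes;

    printf("pushed %u + sub 0x%x = %u bytes\n", pushed_bytes, sub_bytes, total);
    return total % 16 == 0;
}
```

Both frame_keeps_alignment(12, 0x14) and frame_keeps_alignment(12, 0x24) come out as multiples of 16 (32 and 48 bytes), which fits the guess. The prolog of the 0x10 build isn't shown, so the same check can't be made for it here.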

The next line is for the i=0 bit.
0x08048395 : movl $0x0,0xfffffff8(%ebp)

Now the part for the while starts. Notice the jmp straight away? You thought there should be a cmpl instruction, didn't you? Me too. Let's see what is happening. Here are all the relevant assembly lines:
0x0804839c : jmp 0x80483b5
0x0804839e : mov 0xfffffff8(%ebp),%eax
0x080483a1 : mov %eax,0x4(%esp)
0x080483a5 : movl $0x80484a0,(%esp)
0x080483ac : call 0x8048298
0x080483b1 : addl $0x1,0xfffffff8(%ebp)
0x080483b5 : cmpl $0x9,0xfffffff8(%ebp)
0x080483b9 : jle 0x804839e
0x080483bb : add $0x24,%esp

There's a jump to 0x80483b5, where it checks if i<=9, which is the same as asking whether i is less than 10. The jle stands for Jump if Less than or Equal. So if i is indeed less than 10, execution jumps back up to 0x0804839e and enters the body of the while loop. There, i is loaded into %eax (0x0804839e) and the arguments for printf are stored on the stack: i at 0x4(%esp) and the address of the format string at (%esp). (Recompile the code without the arguments if you want to see exactly why 0x080483a1 and 0x080483a5 are there.) printf is called at 0x080483ac and i is then incremented by 1 (that's the addl instruction). Once this is done, we need to compare the new value of i again; is it still <10? Guess what the next instruction is? You guessed it: the very cmpl we jumped to at the start, when we thought there should be a cmpl right away. The rest of the code is the usual stuff.
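
The layout described above (jump to the test first, keep the test at the bottom of the loop) can be written out in C with gotos. This is a sketch for illustration only; the function name and the return value are assumptions, and the comments map each line to the listing.

```c
#include <stdio.h>

/* A while loop laid out the way gcc emitted it: jump to the test first,
 * keep the test at the bottom. Returns the final value of i. */
int while_as_jumps(void)
{
    int i = 0;               /* movl $0x0,i */

    goto test;               /* jmp 0x80483b5 */

body:
    printf("%d\n", i);       /* arguments stored at (%esp) and 0x4(%esp) */
    i += 1;                  /* addl $0x1,i */

test:
    if (i <= 9)              /* cmpl $0x9,i ; jle 0x804839e */
        goto body;

    return i;
}
```

With this layout each pass through the loop needs only one conditional jump at the bottom, rather than a test at the top plus an unconditional jump back.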
Now let's look at a for loop. We use this code.
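
For reference, a minimal sketch of what for.c might look like, assuming it does the same job as the while example. The function name for_demo and the return value are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the body of main() in for.c.
 * Returns the final value of i so the loop count can be checked. */
int for_demo(void)
{
    int i;

    for (i = 0; i < 10; i++) {   /* init, test and increment in one line */
        printf("%d\n", i);
    }
    return i;
}
```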

Compile as usual using gcc -o for -O0 -fno-builtin-printf -ggdb for.c and open it up in gdb. If you remember the code for the while, you'll notice that the assembly for this program is essentially identical to that of the while loop! That's because the for and the while loops are just two different ways of doing the same thing. I'm not explaining anything here because all that I said about the while loop is true here as well.

Let's look at a do-while now. We use the following code for a do-while.
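
For reference, a minimal sketch of what the do-while program might look like, assuming the same counting loop as before. The function name do_while_demo and the return value are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the body of main() in the do-while example.
 * Returns the final value of i so the loop count can be checked. */
int do_while_demo(void)
{
    int i = 0;

    do {
        printf("%d\n", i);
        i++;
    } while (i < 10);         /* the test comes after the body */

    return i;
}
```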

Now let's have a little bit of fun. Try and think of what a do-while does. How is it different from a while? It guarantees at least one pass of the loop, because the comparison happens after the first run through. So, unlike the while loop example where we jumped to a cmpl $0x9 immediately after initializing i to zero, here we will call printf and increment i at least once before comparing. Makes sense? Let's look at an assembly snippet and confirm our thoughts. Here it is:
0x08048395 : movl $0x0,0xfffffff8(%ebp)
0x0804839c : mov 0xfffffff8(%ebp),%eax
0x0804839f : mov %eax,0x4(%esp)
0x080483a3 : movl $0x80484a0,(%esp)
0x080483aa : call 0x8048298 <printf@plt>
0x080483af : addl $0x1,0xfffffff8(%ebp)
0x080483b3 : cmpl $0x9,0xfffffff8(%ebp)
0x080483b7 : jle 0x804839c

Bullseye! There is a printf at 0x080483aa and a cmpl at 0x080483b3, with no jmp before the loop body. Give yourself a pat on the back if you guessed that right :)
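
The "at least one pass" difference only shows up when the condition is false to begin with. Here is a small sketch that counts passes from an arbitrary starting value; the function names and the start parameter are made up for illustration and are not part of the original programs.

```c
#include <stdio.h>

/* Counts how many times the body of a while loop runs from `start`. */
int while_passes(int start)
{
    int i = start, passes = 0;

    while (i < 10) {
        printf("while: %d\n", i);
        i++;
        passes++;
    }
    return passes;
}

/* Same loop as a do-while: the body runs before the first test. */
int do_while_passes(int start)
{
    int i = start, passes = 0;

    do {
        printf("do-while: %d\n", i);
        i++;
        passes++;
    } while (i < 10);
    return passes;
}
```

Starting from 0 both run the body ten times, but starting from 10 the while loop runs it zero times and the do-while runs it once.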

We'll wrap up this basic control structure series right here. Feeling more comfortable now? Good. If not, then just step back and re-read each part. I am sure you will understand eventually.

Before moving on though, a quick note on the gcc arguments. The -O0 stands for "don't use any gcc optimizations". That didn't impact these programs as such, but I'll use it going forward, just so gcc does not cause funny problems. The -fno-builtin-printf tells gcc not to treat printf as one of its built-in functions. Without it, gcc can replace a simple printf call (a plain string ending in a newline, with no format arguments) with a call to puts, which caused me a lot of pain. A nice blog over here speaks about similar problems.
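
To make the printf/puts point concrete, here is a small sketch. The two helper functions and their names are made up for illustration, and whether gcc actually rewrites the first call depends on the gcc version and flags, so it is worth confirming in your own disassembly.

```c
#include <stdio.h>

/* No format conversions and a trailing newline: without
 * -fno-builtin-printf, gcc may emit a call to puts for this line. */
int leave_with_printf(void)
{
    printf("Leaving main\n");
    return 0;
}

/* What the rewritten call amounts to (puts adds the newline itself). */
int leave_with_puts(void)
{
    puts("Leaving main");
    return 0;
}
```

Both functions print the same line, which is why the substitution is safe for the compiler but confusing when you expect to see printf in the disassembly.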

The immediate next thought obviously is: what other gcc optimizations are there? How many of them are relevant? Well, here's a list. I really don't know which ones are relevant at this point. I'm learning as much as you, as we go along. As and when I find something relevant, I'll introduce it.

I thought about looking through more sample programs on all the topics over at our parent site. However, I then thought that we now know the basics and will be able to step through newer and more complex data structures as and when they come along. We just won't take up each and every data structure right now. You can have a look at Chapter 7; it talks about GUI debuggers and their advantages. I've used the free version of IDA a little, but since I hadn't a clue about the basics, I couldn't understand how to use it well. We'll use the GUI debuggers when we have a genuine need for them, i.e. when we look at more complex programs.

The rest of the content online sadly is kinda incomplete, so I won't be referring to it any more and will start taking up small vulnerable C programs to understand things better.

Until next time...So long!
