Sunday, April 18, 2010

Reverse Engineering - 4

We looked at the assembly version of a very simple program in the last post. Hopefully you understood most of it. Over the next 2 posts we'll take up more examples to reinforce these basics, because they'll be used all the time. We now pick up the example called functions.c from Chapter 6 but strip it down so that just 1 function is used. We'll try to understand how a single function looks on the stack and then look at multiple functions. I'm using the following gcc version, so if you want to follow this step by step, try to get the exact same one:

gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)

The reason I mention this is that the Chapter 6 page carries several versions of the assembly, because different gcc versions with different switches generate slightly different code. All of that matters, no doubt, but not right now while we're taking small steps towards understanding the basics. Let's go on. Here's the edited function code that I'm using:
---------------------------------------------------------
#include <stdio.h>

void function3args(int a, int b, int c)
{
    printf("%d %d %d\n", a, b, c);
}

int main(int argc, char **argv)
{
    int a;      /* unused local */
    int *ptr;   /* unused local */
    function3args(1, 2, 3);
}

---------------------------------------------------------
Like last time, let's compile it with gdb support and open up the disassembly in gdb. Oh, and you have that pen and paper with those columns too, right? ;)
[arvind@dilby ~]$ gcc -ofunc1 -ggdb functions.c
[arvind@dilby ~]$ gdb -q func1
Using host libthread_db library "/lib/libthread_db.so.1".
0x080483ed : lea 0x4(%esp),%ecx
0x080483f1 : and $0xfffffff0,%esp
0x080483f4 : pushl 0xfffffffc(%ecx)
0x080483f7 : push %ebp
0x080483f8 : mov %esp,%ebp
0x080483fa : push %ecx

Here's what happens to esp over the first 6 instructions:

lea 0x4(%esp),%ecx -- No change in esp
and $0xfffffff0,%esp -- The bitwise AND rounds esp down to bfc49ec0
Then there are 3 push instructions, each of which lowers esp by 4, so 12 in total. After the first 6 instructions esp is therefore bfc49eb4 (bfc49ec0 - 12). Just before the last push, esp is copied into ebp. While main's own instructions run, ebp keeps this value until it is popped at the end of main. You can check esp and ebp after each instruction with x/xw $esp and x/xw $ebp (the address at the start of each output line is the register's value; p/x $esp prints just the register). To advance one instruction, type nexti.

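Then there is a sub $0x24,%esp, which reserves space on the stack for main's local variables and for the arguments of the call that follows. Why 0x24? Look at main(): it declares two 4-byte locals (a and ptr) and passes three 4-byte arguments. gcc then appears to pad the total so that esp lands on a 16-byte boundary before the call: bfc49eb4 - 0x24 = bfc49e90.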
0x080483fb : sub $0x24,%esp

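The 3 arguments are then placed on the stack. Note that they are written with movl relative to esp rather than pushed, and that they end up in reverse order: the last argument (3) sits at the highest address and the first (1) at the top of the stack.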
0x080483fe : movl $0x3,0x8(%esp)
0x08048406 : movl $0x2,0x4(%esp)
0x0804840e : movl $0x1,(%esp)

Note down the values for esp and ebp carefully just before executing this instruction.
0x08048415 : call 0x80483c4

Now get the disassembly for function3args and let's see what happens there:
0x080483c4 : push %ebp
0x080483c5 : mov %esp,%ebp

Notice that the value of ebp, which had stayed constant while main was running, is pushed on to the stack, and that the current stack pointer then becomes the new ebp. If this function calls another one, ebp is pushed on to the stack again, and so on. As each function returns, its caller's ebp is popped back off, until you are back at main's ebp; when main itself returns, the program exits.

0x080483c7 : sub $0x18,%esp
Space is reserved on the stack for function3args: 0x18 (24) bytes, enough for the four 4-byte arguments it is about to hand to printf, plus padding.

0x080483ca : mov 0x10(%ebp),%eax
0x080483cd : mov %eax,0xc(%esp)
0x080483d1 : mov 0xc(%ebp),%eax
0x080483d4 : mov %eax,0x8(%esp)
0x080483d8 : mov 0x8(%ebp),%eax
0x080483db : mov %eax,0x4(%esp)
Copy the three arguments of the function to the stack for printf. They are read relative to ebp: 0x8(%ebp) is a, 0xc(%ebp) is b and 0x10(%ebp) is c.

0x080483df : movl $0x8048500,(%esp)
0x080483e6 : call 0x80482dc
Put the address of the format string at the top of the stack and call printf with the arguments.

0x080483eb : leave
leave copies ebp into esp and then pops the saved ebp. If you look at ebp just after this instruction, you'll see it has changed back to the value it had in main, which means this function's stack frame is gone.

0x080483ec : ret
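Return from function3args: ret pops the return address (0x0804841a, the instruction after the call) and jumps back into main.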

0x0804841a : add $0x24,%esp
0x0804841d : pop %ecx
0x0804841e : pop %ebp
0x0804841f : lea 0xfffffffc(%ecx),%esp
0x08048422 : ret
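Exit from main: the space reserved by the sub is released, ecx and ebp are popped, and the lea restores esp to the value it had before the alignment, so ret finds main's return address.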

Hope that clarified things a little. In the next post we won't go into so much detail; we'll make a couple of assumptions based on the previous 2 posts and learn a little more. Have fun :)

Sunday, April 4, 2010

Reverse Engineering - 3

Once you've got a fair idea of how to perform dynamic analysis, it's a good idea to start understanding how the exe/binary in question was actually built in the first place. That is what is popularly called static analysis. The reason we did dynamic analysis first is that it gives you a great high-level view of what the binary actually does and what its "real purpose" is. Plus it's simpler to do: many a time you'll get what you wanted by simply running an exe in a contained environment and studying its behavior. Other times you won't, and you need to understand more about the binary. In such cases static analysis, or reversing the binary, will definitely help.

Reversing = reading the code in which a binary was written. In some languages you can get the code back very easily. Java is one example: there are tools you can just feed the binary to and get back the code. In others it isn't so easy; the code for an EXE written in C, for example, can't be got back that easily. Hence it's good to learn the art of using disassemblers and debuggers and to attempt to understand the assembly version of the binary.

NOTE: Don't look too much at Chapters 3,4 and 5. They're slightly out of flow and we'll refer back to them when needed. For now stick with me :D

Jump over to Chapter 6 now and read only the first 2 sections there; then we'll look at examples and learn. A concept, IMO, is best learnt through examples instead of sitting and reading a 1000-page manual. The manual is best referred to when you feel the need to brush up on your concepts, which is especially true of Rev Engg, where so many concepts are merged into one. Let's look at a little HelloWorld in assembly. Note that I'm deliberately going really slowly so we understand everything that takes place when a binary is run.

#include <stdio.h>
int main(){
    printf("hello");
}

We compile the code and run it and get the desired output:
gcc -o hello -ggdb hello.c
./hello

hello

Now let's see what happened underneath.
[arvind@dilby ~]$ gdb -q hello
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main:
0x080483c4 <main+0>: lea 0x4(%esp),%ecx
0x080483c8 <main+4>: and $0xfffffff0,%esp
0x080483cb <main+7>: pushl 0xfffffffc(%ecx)
0x080483ce <main+10>: push %ebp
0x080483cf <main+11>: mov %esp,%ebp
0x080483d1 <main+13>: push %ecx
0x080483d2 <main+14>: sub $0x4,%esp
0x080483d5 <main+17>: movl $0x80484c0,(%esp)
0x080483dc <main+24>: call 0x80482dc
0x080483e1 <main+29>: add $0x4,%esp
0x080483e4 <main+32>: pop %ecx
0x080483e5 <main+33>: pop %ebp
0x080483e6 <main+34>: lea 0xfffffffc(%ecx),%esp
0x080483e9 <main+37>: ret
End of assembler dump.

We compiled the code into a binary named "hello" and asked gcc to embed debugging information for gdb with the -g option, so we can view the source inside gdb when we want to. We then start gdb with the -q (quiet) option and get the assembly version of Hello World. Clear enough? Now let's understand the assembly.

The number 0x080483c4 is the address in memory where main starts. The second column, which gdb prints as <main+N>, is the offset in bytes from that starting address. The third column is the actual instruction itself.

Now if you read the first 2 sections of Chapter 6 carefully, you'd know about something called the function prolog. Effectively, before the body of any function starts to run, some bookkeeping MUST happen on the stack. The prolog saves the current value of ebp so it can be restored later. The compiler has already worked out how much space the function's local variables need (main() in this case), and the prolog reserves that amount on the stack by decreasing esp (the stack pointer). The value of esp will change all the time, while ebp stays constant for the rest of the function once it has been set.

Quoting now a few golden lines from Chapter 6:
Reading Assembly - Keep track of the stack and registers --- The secret to understanding assembly code is to always work with a sheet of paper and a pencil. When you first sit down, draw out a table for all 6 registers A, B, C, D, SI, and DI. Keep track of the high and low portions as well. Each new line of this table should represent a modification of a register, so the last value in each register column is the current value of that register. Next, draw out a long column for the stack, and leave space on the sides to place the BP and SP registers as they move down. Be sure to write all values into the stack as they are placed there, including ret and the stored BP.

If you're just starting off with rev engg like me, I'm quite sure this is still confusing to you. No problem, it'll get cleared up as we go along.

Now back to the program. We want to see what happens at every single point, so we put a breakpoint in gdb on the first line itself, as follows.
(gdb) br 1
Breakpoint 2 at 0x80483c4: file hello.c, line 1.
(gdb)

Now we run the program and it breaks as expected immediately:
(gdb) r
Starting program: /home/arvind/hello
Breakpoint 2, main () at hello.c:2
2 int main(){
(gdb)

Now the first instruction in the assembly code is lea 0x4(%esp),%ecx. This means: compute the address esp+4 and load it into the ecx register (the address itself, not the value stored there). We immediately go to our pen and paper, look at both esp+4 and ecx, and write them down.
(gdb) x/xw $esp+4
0xbfd3ffd0: 0x00000001
(gdb) x/xw $ecx
0xa2bffcc4: Cannot access memory at address 0xa2bffcc4

In gdb we refer to a register by putting a $ in front of its name, and x/xw shows the word stored at the given address. Now if you notice, the word at esp+4 is 1, but ecx doesn't point there yet as we'd expect. That's because the breakpoint is at the very start of the program and the first instruction hasn't executed yet. So let's execute it and check ecx again.

(gdb) nexti
0x080483c8 2 int main(){
(gdb) x/xw $ecx
0xbfd3ffd0: 0x00000001
(gdb)

Bingo! ecx now holds the address 0xbfd3ffd0, and the word stored there is 1. The gdb command nexti executes 1 instruction and then reports where the next one is, here 0x080483c8. Clear enough? Let's go on to the next instruction now. Oh wait, did you write down the values for esp and ecx on paper? ;) Let's keep doing that for a while till we're really clear on what we're doing. Again:
(gdb) x/xw $esp
0xbfd3ffcc: 0x004d6390
(gdb) x/xw $ecx
0xbfd3ffd0: 0x00000001

The current value of esp is bfd3ffcc and that of ecx is bfd3ffd0 (because the address esp+4 was loaded into it). You can cross-check with a hex calculator side by side: esp+4 = bfd3ffcc+4 = bfd3ffd0. So all is well. Moving on then: we run nexti to execute the current instruction, which is

and $0xfffffff0,%esp, which works out to 0xfffffff0 AND 0xbfd3ffcc = 0xbfd3ffc0. Note the $ just before 0xfffffff0? That says it's a literal value and not an address. The bitwise AND clears the low 4 bits, so the result bfd3ffc0 is a multiple of 16, and it is stored back in esp. Moving on..

(gdb) nexti
0x080483cb 2 int main(){
(gdb) x/xw $esp
0xbfd3ffc0: 0x004bcca0

The next instruction is pushl 0xfffffffc(%ecx). The displacement 0xfffffffc is -4, so this pushes the word stored at ecx-4 = bfd3ffcc, which is main's return address (the 0x004d6390 we saw at esp earlier), on to the newly aligned stack. It doesn't change ecx. pushl pushes 4 bytes, so you should now see esp go down by 4....

(gdb) x/xw $esp
0xbfd3ffbc: 0x004d6390
(gdb)

Yep, it's bfd3ffbc, 4 less than the previous bfd3ffc0, and the word at the top of the stack is the copied 0x004d6390. Similarly, the next instruction, push %ebp, saves the caller's ebp in another 4 bytes, taking esp to 0xbfd3ffb8. Do you have 4 values for esp on your paper now??

(gdb) nexti
0x080483cf 2 int main(){
(gdb) x/xw $esp
0xbfd3ffb8: 0xbfd40028

The next is a mov instruction, mov %esp,%ebp, which copies the address in esp into ebp (the base pointer); so after this executes, esp and ebp must hold the same value. Let's see..
(gdb) nexti
0x080483d1 2 int main(){
(gdb) x/xw $esp
0xbfd3ffb8: 0xbfd40028
(gdb) x/xw $ebp
0xbfd3ffb8: 0xbfd40028
(gdb)

Yep, no problem. All normal. The next is another push (push %ecx), which brings esp down by 4 again. Note that ebp doesn't change.
(gdb) nexti
0x080483d2 2 int main(){
(gdb) x/xw $esp
0xbfd3ffb4: 0xbfd3ffd0
(gdb) x/xw $ebp
0xbfd3ffb8: 0xbfd40028

Then there's a sub instruction, sub $0x4,%esp, which decreases esp by a further 4, to bfd3ffb0.
(gdb) nexti
Breakpoint 1, main () at hello.c:3
3 printf("hello");
(gdb) x/xw $esp
0xbfd3ffb0: 0x004af940

Hmm, notice that the next source line is the printf?? Let's now look at our code inside gdb..
(gdb) list
1 #include <stdio.h>
2 int main(){
3 printf("hello");
4 }
Yes...our code is only now STARTING TO EXECUTE. Notice all those little things that happen in the background before even your first line of code executes? Very exciting to know, for me anyway :) . Well, let's go on. You're still writing things down, right? Moving on then: the next instruction is another move..

movl $0x80484c0,(%esp)
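The $ signifies that the literal value 0x80484c0 is stored, NOT the value found at the address 0x80484c0. The destination (%esp) is the memory that esp points to, i.e. the top of the stack; esp itself doesn't change. It's important that we understand this. A look at the stack confirms it..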
(gdb) x/xw 0x80484c0
0x80484c0 <__dso_handle+4>: 0x6c6c6568
(gdb) x/xw $esp
0xbfd3ffb0: 0x080484c0

Now, if you look at your disassembly, you'll see that the next instruction is a call to the actual printf function. Whenever you see this, remember that you don't need to step through all the instructions inside printf itself (nexti steps over the whole call); just stick to your own program. This'll get clearer when we look at some little code with functions in it.

printf has executed successfully and returned to our code. The next instruction is an add, followed by two pops, which means esp is incremented by 4 three times. Keep hitting nexti till the next instruction shows up as 0x080483e6 and then let's check the value of esp. It should be bfd3ffb0 + 12 (decimal) = bfd3ffb0 + C (hex) = bfd3ffbc...

(gdb) nexti
0x080483e4 4 }
(gdb) nexti
0x080483e5 4 }
(gdb) nexti
0x080483e6 in main () at hello.c:4
4 }
(gdb) x/xw $esp
0xbfd3ffbc: 0x004d6390
(gdb)

Great. Notice we're going up and reaching the earlier addresses again? That's a sign that the function is winding down. The next is lea 0xfffffffc(%ecx),%esp, which loads the address ecx-4 = bfd3ffd0 - 4 = bfd3ffcc into esp. That is exactly where esp was when main started, before the stack was aligned.

(gdb) nexti
0x080483e9 4 }
(gdb) x/xw $esp
0xbfd3ffcc: 0x004d6390

We then close off with a ret, and the program exits after printing the hello, which is what it was supposed to do. Note that ebp never changed between the mov %esp,%ebp in the prolog and the pop %ebp at the end; esp did all the moving, and all the addresses were referred to relative to esp. There will still be a few questions in your minds, I guess. Hopefully the future posts, where we take a look at more programs from Chapter 6, will clear those up. Hope you enjoyed and understood this very basic introduction to assembly :)