Wednesday, January 27, 2010

Reverse Engineering - 2

In the last post, we just felt our way around a little. The main things we covered were:

--Dynamic Runtime Program Analysis
--What reverse engineering is
--Compiling a program and the steps involved

Effectively, when you compile a program, you convert the code you wrote into a form the machine can run, to do something you couldn't do manually or which would have taken far too much time. When reversing, you only have the final form, the binary/executable, and need to find out exactly what it does. Assuming that you have already done the dynamic analysis that Lenny Zeltser discussed, the next step is to find out as much as you can about the program, the environment it runs in and the other components that make it run. For the first few blog posts I will refer only to Linux, as I'm far more comfortable with it than with Windows.

When a Linux binary is run, it becomes a process which consumes resources on the host. Each process is assigned a PID (Process ID). Details about the resources the process is using are exposed under the /proc directory on Linux. Let's look at the entry for one running process, say sshd (the SSH daemon). Here is what a ps aux listing gives for sshd:
root 1501 0.0 0.2 6064 1080 ? Ss Jan25 0:06 /usr/sbin/sshd

The number 1501 is the PID, and there will be a directory with that name in /proc. Inside /proc/1501 you will find details of all the resources that sshd is using.
cmdline: Contains the command that started the process, with all its parameters. If it's malware that's running, this is a good place to find all the options the malware was started with.
[root@dilby 1501]# more cmdline
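
The arguments in cmdline are separated by NUL bytes, so a pager tends to show them run together. Here is a minimal sketch that prints them one per line. show_cmdline is a hypothetical helper name of mine, and the default of the current shell's PID is only for illustration:

```shell
# show_cmdline is a hypothetical helper, not a standard tool.
# It prints each argument of a process on its own line by turning the
# NUL separators in /proc/<pid>/cmdline into newlines.
show_cmdline() {
    tr '\0' '\n' < "/proc/${1:-$$}/cmdline"
}

# Example: the command line of the current shell.
show_cmdline "$$"
```

For the sshd process above you would call show_cmdline 1501.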

environ: Shows the environment variables the process was started with; child processes normally inherit them.
[root@dilby 1501]# more environ
The environment variables aren't clearly separated in that output, because the entries are delimited by NUL bytes. Here the variables are SELINUX_INIT and CONSOLE, with the values YES and /dev/console. They can be listed cleanly as follows:
[root@dilby 1501]# cat /proc/1501/environ | tr '\0' '\n'

File descriptors for input, output and error for each process. If a process is redirecting its output somewhere, this tells you where. Here's a sample listing for process 1501. Descriptors 0 (input), 1 (output) and 2 (error) all point to /dev/null (the black hole), which is typical of a daemon. It has also opened a socket, as can be seen from descriptor 3 (socket:[some number]).
lrwx------. 1 root root 64 2010-01-29 15:12 0 -> /dev/null
lrwx------. 1 root root 64 2010-01-29 15:12 1 -> /dev/null
lrwx------. 1 root root 64 2010-01-29 15:12 2 -> /dev/null
lrwx------. 1 root root 64 2010-01-29 15:12 3 -> socket:[5715]
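
Picking the socket entries out of a long listing by eye gets tedious, so here is a rough sketch that does it by reading each symlink target. The names socket_inode and list_socket_fds are hypothetical helpers I made up for this post. You also need to be root (or own the process) to read another process's descriptors:

```shell
# socket_inode and list_socket_fds are hypothetical helpers, not standard tools.

# Prints the inode from a link target such as "socket:[5715]";
# prints nothing for any other kind of target.
socket_inode() {
    case "$1" in
        socket:\[*\])
            inode=${1#socket:\[}
            echo "${inode%\]}"
            ;;
    esac
}

# Prints "<fd> <inode>" for every socket descriptor of the given PID.
list_socket_fds() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        inode=$(socket_inode "$target")
        if [ -n "$inode" ]; then
            echo "${fd##*/} $inode"
        fi
    done
}

# Example: socket descriptors of the current shell (often none).
list_socket_fds "$$"
```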

If you want to confirm that 5715 is a socket that actually belongs to SSH, you can run netstat as follows:
[root@dilby ~]# netstat -ae | grep -v -i unix
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode
tcp 0 0 *:ssh *:* LISTEN root 5715

Ah, it's an inode. The number in socket:[5715] is the inode of the socket, and netstat shows that same inode against the listening SSH port. From now on, any time you want to find out whether a process is doing something over the network, look for socket fds in here.
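
The same inode also appears in the kernel's socket tables under /proc/net, so you can do the lookup without netstat. This is only a sketch under a few assumptions: find_tcp_socket is a hypothetical helper name, it relies on the usual /proc/net/tcp column layout (local address in field 2, state in field 4, inode in field 10), and it only covers TCP. Pass /proc/net/tcp6 as the second argument for IPv6 sockets.

```shell
# find_tcp_socket is a hypothetical helper, not a standard tool.
# It looks up a socket inode in a table laid out like /proc/net/tcp and
# prints "<local port> <state code>" for the matching entry.
find_tcp_socket() {
    awk -v ino="$1" '
        # Skip the header line, then match on the inode column.
        NR > 1 && $10 == ino {
            split($2, addr, ":")   # field 2 is hexaddress:hexport
            print addr[2], $4      # hex port and hex state code
        }
    ' "${2:-/proc/net/tcp}" |
    while read -r hexport state; do
        # Convert the hex port to decimal; the state code stays in hex.
        printf '%d %s\n' "0x$hexport" "$state"
    done
}
```

Running find_tcp_socket 5715 on the host above should report the SSH port; a state code of 0A means the socket is listening.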

: Deals with the memory in use by the process: the address ranges that the process and its dependencies can address. This will not make much sense just now; it will help when we get to actually looking at ASM.

: Provides information about the status of the process. Here's a sample:
Name: sshd
State: S (sleeping)
Tgid: 1501
Pid: 1501
PPid: 1
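
When you only care about one of these fields, say the parent PID, you can pull it out directly. A small sketch, assuming the "Name: value" layout shown in the sample above; status_field is a hypothetical helper name:

```shell
# status_field is a hypothetical helper, not a standard tool.
# It prints the value of one field (e.g. PPid) from /proc/<pid>/status,
# relying on the "Name:<whitespace>value" layout of that file.
status_field() {
    awk -F':[ \t]*' -v f="$2" '$1 == f { print $2 }' "/proc/$1/status"
}

# Example: the parent PID of the current shell.
status_field "$$" PPid
```

A parent PID of 1, as in the sshd sample, means the process is a direct child of init.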

Apart from this, there's plenty of other information you can get from the /proc directory. Discussing it at this point won't be too beneficial though, so I'll skip it.

What type of file is it? Is it a known file format? Does it have any dependencies?
Use file or ldd to find out. Here's an example:
[root@dilby 1501]# file ~arvind/a.out
/home/arvind/a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

If ldd gives you a long list, it is saying: this is a dynamically linked file and needs these libraries on your system to function properly. Here's an example:
[root@dilby 1501]# ldd ~arvind/a.out
=> (0x00110000)
=> /lib/ (0x004c0000)
/lib/ (0x004a1000)

If it were statically compiled (all libraries prepackaged into the binary), then you'd get very different messages:
[root@dilby arvind]# gcc -static a.c
[root@dilby arvind]# file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.9, not stripped
[root@dilby arvind]# ldd a.out
not a dynamic executable
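
One caution if the binary is untrusted: the ldd man page warns that some versions of ldd may end up executing the program to get its dependency list, and suggests objdump instead. Here is a sketch along those lines. needed_libs is a hypothetical helper name, and it assumes objdump from binutils is installed:

```shell
# needed_libs is a hypothetical helper, not a standard tool.
# It lists the shared libraries an ELF file asks for by reading the NEEDED
# entries of its dynamic section with objdump, without running the file.
# It prints nothing for static binaries, non-ELF files, or if objdump is missing.
needed_libs() {
    objdump -p "$1" 2>/dev/null | awk '$1 == "NEEDED" { print $2 }'
}

# Example: dependencies of the shell binary (empty if it is static).
needed_libs /bin/sh
```

Unlike ldd, this only shows the direct dependencies and does not resolve them to paths on disk.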

For an initial analysis of the file on disk, that'll do. Next, you probably want to check whether the program is communicating over the network. The socket inode we discussed above is one way. Another is to use lsof and netstat to look at active connections. Key options for netstat are:
-a - All sockets, listening and non-listening
-n - Show numeric addresses and ports instead of resolving names
-p - Program (PID and name) using that connection
-e - Extended output, including the user and the inode of the socket
-l - Only listening sockets
-r - Current routing table
-c - Repeat the netstat output continuously (every second). Useful when you want to watch for new connections.

Running tcpdump or Wireshark while the program runs is also helpful for viewing greater detail. Here is a great cheat sheet for the same. We'll start dipping our feet into assembly language next time.
