Thursday, May 26, 2016

Ack, ag... not grep

Grep is a great tool, and anyone using any *nix system uses it all the time. In fact, you could also use it with Cygwin on Windows. But recently at work, a colleague pointed out that ack and ag are great, much faster tools with better defaults than grep. So while I still go to grep once in a while, ack/ag is my search tool of choice now when doing code reviews. You can find ack here and ag here. ag is very similar to ack, just even faster. (Disclaimer: I've only tried ag once, on my last gig.)

While the man page is always a good option, here are some of my favorite switches to get you started.

--java or --ruby: Will limit the search to just files of that type. Meaning it'll ignore a lot of libraries, documentation and other large stuff that happens to be in the same directory

Recursion: It's on by default, which is almost always what you want in a code review.

--ignore-dir: It's common to find 567 hits of a function in a source tree and realize that 560 were in a directory 'abc' that you didn't want to search at all. You can use this switch to completely ignore that directory.

--ignore-file: Same as above but for a file. If a file that exists in 100 different directories keeps matching and you want to ignore it, use this.

-l: This one's similar to grep, I think. Use it if you don't care "what" was matched, only where it matched. It just lists which files matched a pattern.

-i: Same as grep. Case insensitive.

-v: Invert match. Same as grep.

The responses for ack were nicer too and more intuitive, I thought. Some of my favorite things:

- Lists the filename only once and then all the matches under that file. Grep prints the filename for every match.
- Breaks between files by default. Grep dumps them all together.
- The colors of the response were nicer (IMO)
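
To make those defaults concrete, here is a rough Python sketch of the same behaviour: recursive search, a directory ignore list, and output grouped per file. It is only an illustration of the idea, not how ack or ag are implemented, and the function names are made up for this example.

```python
import os
import re


def grouped_search(root, pattern, ignore_dirs=(), ignore_case=False):
    """Recursively search files under root; return {path: [(line_no, line), ...]}."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    results = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in ignore_dirs]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    hits = [(number, line.rstrip("\n"))
                            for number, line in enumerate(handle, 1)
                            if regex.search(line)]
            except OSError:
                continue  # unreadable file: skip it
            if hits:
                results[path] = hits
    return results


def print_grouped(results):
    """Print each filename once, its matches underneath, and a blank line between files."""
    for path, hits in results.items():
        print(path)
        for number, line in hits:
            print(f"{number}:{line}")
        print()
```

Calling print_grouped(grouped_search(".", "check_user", ignore_dirs=("abc",))) would then give the "filename once, matches below" layout described above.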

Also lastly, it appears to have the option to use a .ackrc file to set your options. So then you can just type ack and all the options you set in .ackrc will get picked up on their own.

I mean, you can probably do all this with grep as well, so it isn't meant to say.. 'hey no more grepping for me' .. just that it is a really nice tool that can help in some situations.

Tuesday, November 17, 2015

Respect for developers

At work, we tend to spend time that we’re not on projects however we want. Ideally, however meaning - technically however – not being on Facebook 8 hours a day. But jokes apart, usually someone picks up a skill that he/she isn’t familiar with and tries to learn the same when not on projects. Since most of my professional life in information security has been to break things, I’ve always been curious on how things are on the other side – to build things. What challenges does a developer face? Why is there still such a lot of truly awful insecure code still out there? This despite there being tons of resources to learn from? After all, as a hacker that’s how I’ve learnt all my life – don’t know how to hack a new technology? Go read. Learn. Master. Hack. Why should it be any different from developers?
And so, I decided to fix a ton of bugs in Whetstone, our internal appraisal system which our boss wrote while he was on paternity leave. He takes his vacations seriously, as you can see. Whetstone was written in Ruby on Rails – which is one of my favorite languages for web development – just because of how easy it is to get started. So I go, ooh nice – how hard can it be? And that’s where it all began going wrong…
So obviously, I can’t gut the whole thing and start re-writing the entire system. I have to fix bugs that currently exist. I launch GitHub and see 78 existing bugs. So now I have to prioritize which ones to fix first. I start the process. I pick a simple one to start off with and quickly fix it. I’m excited. I want my change to be visible immediately. Oh but wait, that’s not how it goes. Apparently, git has something called branches that I first need to learn about – so as to not mess with other people pushing code back as well. That puts an end to my coding for the present, and I spend 2 hours reading about how Git handles versions and branches and why it’s better than SVN and so on. Then I push my code and see it appear on GitHub. But then someone has to “pull” my changes or accept them. And clearly, while I have permission – it makes no sense to approve my own code – that defeats the purpose. And boss is on leave. No push. No warm fuzzy feeling of first ever fix. Process problems.
Anyway, I’m now feeling better and pick up a few moderately complex issues. It seems I know how to fix it. Code. Compile. Fail. WTF. Google. Stack Overflow. Fail. As it turns out, I’ve upgraded Rails to 4.3 or something and all the fixes on the Internet which have up-votes on Stack Overflow all fail. Somehow after a lot of trial and error I find a fix, but it has taken way way longer than I’d ever expect it to take. And note, this is a simple internal application. If I fixed bugs at that speed in production, as a developer – I’d probably be fired in a week. Skill problems.
Then, I get put on a billable project and forget all about this for a bit. Boss comes back, my 4 changes are accepted. Am I having fun, he asks? Yes yes of course I say, drunk with the success of my 4 massive quashed bugs. Okay, have fun – fix some more says boss. So 2 months later, I pick it up again. So let’s start now. Er, why did I write this code this way? F***, it was so long ago – I should have commented. Let’s look at some old code, maybe that explains it. Er no, no comments there either. Just some fancy one-liner SQL looking query. Ha, I’ll just comment the old code then and write new fresh code! That’ll fix it. For sure…… 2 hours later. Undo. Undo. Undo. Okay that broke everything :| and clearly I can’t code. I suck.
And now, after all the undoing, something else isn’t working too. Screw it. Let me revert to a clean state. Let me just delete that stupid whetstone_old directory that I created. And I’ll rebuild everything from Git again – from my old state. Delete. Rebuild. Ugh. Read Me isn’t good. How the hell did I build it last time? What does this error even mean? Note here, that I have NOT fixed 1 single bug yet… 1 hour later – AH so all you needed to do was change the config file entry? Okay. Anyway, let me first collect all my old notes from whetstone_old before I forget stuff like this. Don’t want to waste time.
Eh. Where’s whetstone_old?? Oh no!!! Don’t don’t tell me it was in the directory I deleted #-o. Yes. It WAS in the directory I deleted. Woo Hoo. All gone. I’m screwed. Backups are important. There’s a reason you backed up to whetstone_old. Why would you delete it?
Another day wasted then. Okay okay, now let’s fix bugs. Ah, here’s an easy bug – “Display last login time for a user”. 10 minutes. I got this. Just print a date out in the view. Adds date to view. But this is just printing the current date each time ffs :-o. We want it for each user. Oh. That means a new column. In the user database. I know a little MySQL though, from all my SQL injections over years gone by, so this should still be quick….
Error. Can’t connect to MySQL database. Eh? Different port? No. 15 minutes. Can’t figure out where the DB is. RTFM. Sqlite. NOT MySQL. Now if I want to debug anything, I need to learn Sqlite querying. Learn how to use a database. NO ffs I have still not fixed the bug. Okay, now I understand sqlite. But how do I add that column? Learn Rails migrations. Wow. Another half an hour. Add column. Finally fix bug. It’s lunch time. I should stick to hacking things – I clearly am not good at this stuff. But wait, now I got it.. I know how the DB works, so things should now work out...
Okay, let’s log in and see if our date prints right. Just to check lol. Obviously nothing can go wrong. Hmm, looks okay. Oh wait, we need to do some stuff with the Date. Why do none of the in-built functions work? Oh no, it’s not a varchar – it’s DateTime – only then will some functions work. Or I’ll have to write my own functions. Which would be stupid for such a trivial task. So I have to change the data type in the DB. Learn how to change Rails migrations. And while that looked simple, it didn’t work. The accepted solution was to “DROP TABLE” and “Recreate TABLE”. Ugh.
Now remember, I hadn’t created the table in the first *^%$#@! place. So I now have to find out the structure of every single column of that table and re-create it. Screw it. I don’t want a fancy date. I’ll just leave it as it is and go on… But now that failed migration attempt has screwed something else up – and all my normal code that I never never touched is not working. Fantastic. Things you never coded can break despite you never touching it.
Meaning, I now have to learn the entire old schema. Luckily logging was turned on by default, and the old old old log file was never truncated. Extract the old old query. Convert it into a valid Sqlite query. Drop old table. Create new table. Okay error gone. But now no date display. WTF!!! That’s cause recreating the table destroyed my Rails migration. So do that again. Finally done. In short, playing with the DB isn’t particularly fun, and any re-architecting brings up weird weird problems.
And after all this, all I’ve done is fix one lousy feature request that displays the last login time. Okay let’s move on to the next bug. It’s a security issue now, user IDs are generated sequentially and that’s a problem – a hacker could enumerate all valid numbers. So I need to randomize stuff. Meaning I have to change the data type of a column. Right now, the id column is an integer, I need to convert it to a GUID. Looks very similar to the previous DateTime problem haha. I got this one for sure. Migrations. Change. Blah. . Er. Why did something else break again? Why is nothing working now? Why can’t I even login? Panic. Turns out that id was the Primary Key for the DB. And messing around with the primary key is a bad bad idea. Even worse than messing around with a database. Don’t do it. Just just don’t do it. Deny that feature request. Learn how to change a primary key.
This is terrible. Is this what fixing other people’s code is like? And this is for a small internal application which is relatively well written, compared to some of the bloated code that I’ve seen over the years. Anyway, more hours wasted – I figure out what to do. But now, for some reason I can’t find out where the hell to put the ‘randomizing code logic’. A few more hours… oh wtf.. you know by now that I’m spending more time fixing stupid knowledge gaps than actual bugs.
As it turns out, my boss (correctly) decided to not write a line of code for authentication, and relied on a third party Ruby Gem instead. Well whoopee doo doo. #-o Why am I sarcastic, you say. It means that I now have to hack on some third party module code instead. Start learning third party module code now. That’s another fail, and I have to contact the module owners who patiently explain what to do. Eventually it all works.
Let’s look at Bug 3 now. All you need to do is to make a really simple UI change. This CAN’T be hard ffs. How hard can it be? But by now, I’m in a numb state – coz I’m fairly sure something will go wrong. Oh look. It’s CSS. And CoffeeScript. Do I know either? NO. Learn CSS. Learn Bootstrap. Learn CoffeeScript. The saddest part is I just need enough to fix a tiny bug, but without knowing the basics – I can’t do it. More time gone.
And this pattern repeats.. again and again and again. Eventually of course, as expected (else I won’t have a job) I improve and start fixing things quicker. But something or the other always always breaks – when you least expect it to. And when it’s not your code, it’s harder to fix it. And now I think of those massive banking applications, where the chief architect and the lead developer quit a while ago, and there are 5 new developers. Shudder.
And I still have tons of bugs left. I find my mind thinking of shortcuts and dirty ways of solving problems all the time. I find myself thinking of how to somehow get that number of bugs down. Somehow. Quickly. So I can get back to actually writing new cool stuff. It isn’t easy to maintain that mental discipline. Honestly isn’t. And this is for an internal app, which isn’t critical and with no deadlines except those that I set for myself. I have a new found additional respect for developers.
I always knew a developer’s job is trickier than that of a hacker – and I’d like to honestly say I’ve always been respectful to every dev I have met. But I’ve never sympathized with one. I’ve always felt that it is yet another job that one learns to do well over time. And that’s true, sure – but they do work under greater pressures than I do. There’s other skills in my job as a white-hat hacker, other traits that maybe developers don’t need to learn – but whatever a developer does need to do – isn’t easy. IF IF you want to do it well.
Lastly, I want to tell every white-hat hacker to put on the defensive hat once. Actually code. Code things that people will look at and break and tell you how awful your hacks are. And how you should test more… before releasing to production. See how it feels. :) If not anything else, it’ll make you respect the developer community more than you already do. And that’s how it should be.

Tuesday, February 10, 2015

Using the Call stack to debug programs

With large pieces of malware it is difficult, when under time pressure, to fully reverse everything. So, many times you just put a breakpoint on the imported functions, say send() for outbound TCP connections. Then you run the malware.

What happens now, is that the BP will be hit when the malware tries to send traffic. And that's fine. But where was it called from? As in, which function actually made the call to send()? This is important because it'll help you go back and then find out how the payload was constructed, how the C&C was decided... and many other things.

There are 2 ways to do this. One, you can identify all the references made to the send() call by the binary, and set breakpoints on all of them. Then when one gets hit you can inspect it further. This is the most obvious way to do it.

The other way is to first, as usual set a breakpoint on the function you want to trace. Here I just choose SetUnhandledExceptionFilter() as that's the first function I can see :). The address immediately after that is 0100251D - make a note of that.

Then, instead of searching through the 123 references to the call and trying to guess which one is useful, look at where the function is returning to when the breakpoint is hit. This can be found on the Olly call stack. Many times you have to step through a lot of system function code, but eventually it will return to user code. I tend to use Ctrl+F9 while in system code, and keep watching the call stack until the return address changes to one inside the program's own code.

Then I visit that address, set a breakpoint and hit F9 again. This time I will break inside the exact place where the call was made from. Note the address? :)

Then we go to that address and set a breakpoint and run...and we'll break in the exact place where the call was made.
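
The "watch the call stack until it returns to user code" step can be sketched in a few lines of Python. This is just a toy model of the idea, not anything Olly exposes: the call stack values, module base and module size below are made up for illustration, and only 0100251D comes from the example above.

```python
def first_user_return(return_addresses, module_base, module_size):
    """Return the first return address that falls inside the main module, or None."""
    module_end = module_base + module_size
    for address in return_addresses:
        if module_base <= address < module_end:
            return address
    return None


# Hypothetical call stack, innermost frame first: two return addresses inside
# system DLLs, then the return address back into the program itself.
call_stack = [0x7C80A0D7, 0x77C1F2A3, 0x0100251D, 0x01006420]

# Assumed load range of the main module (illustrative values only).
caller = first_user_return(call_stack, module_base=0x01000000, module_size=0x14000)
print(hex(caller))
```

The address it picks is the one you would set the next breakpoint on.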

Debugging child processes - Olly 2.01h

A lot of malware creates new processes, injects the actual malicious code into the memory of that process and then runs it from there. You could start debugging malware.exe... but then find out that it has called CreateProcess() and created a new child process boo.exe. Then it called VirtualAllocEx() (the cross-process variant of VirtualAlloc()) and allocated memory inside boo.exe.

The point being... malware.exe has nothing and you now need to debug boo.exe. There's a few different ways I have tried, all to varying degrees of success.

- Get PID of the Child process from Process Explorer. Then go to Olly and Attach to process.
- Set Olly as JIT debugger. Windows Task Manager. Right click - Debug
- Set Olly as JIT debugger. Set the first byte at the entry point of the process to CC (a breakpoint) and run the process.
- Try and attach Immunity to the process while debugging the parent in Olly.

But none of that, for multiple reasons worked at all... or worked very sporadically. That sucked.

Luckily OllyDbg 2.01h has this new (well for me ;)) option to debug child processes created by the program being debugged. Meaning, if I'm debugging malware.exe and it creates a new process... and I have this option checked, it'll automatically open a new window and load boo.exe into it :). Perfect.

You can find this option in Olly 2.01 (latest version) in Options - Events. Tick the box that says 'Debug child processes' and save yourself a lot of pain :).

View PE header - Olly 2.01

Many times we want to view the PE header to study certain fields. It's easy enough to do this with a million PE editors out there. You could even do it programmatically - I use the pefile module in Python to do it. There are almost certainly others. But the point is... that I love Olly :). How can we do it in Olly?

Well, in Olly 1.10 you could go to the PE header location in the Dump section, right click and select Special - PE header. And this would perfectly parse the PE header and display it in the Dump. Or just go to View - Memory Map, right click and select Dump.

But Olly 2.01 doesn't have this. The option seems to have changed. I had trouble finding it but finally managed. There is no Special -> PE header menu but a 'Decode as structure' menu. Here are the steps to view the PE header in Olly 2.01:
  • View - Memory Map. Locate PE header for the exe. Get the address from here.
  • Go to the Dump section of Olly. Ctrl+G - enter address.
  • Right click on the start of the PE header (MZ) - Decode as structure and in the drop down box select IMAGE_DOS_HEADER. Now you see just 1 structure decoded.
  • Scroll down a bit till you come to the PE header. Right click on the PE header (anywhere... but make sure it's inside the PE header, not the DOS header) - Decode as structure - IMAGE_FILE_HEADER.

And so on.. use all the different IMAGE_ structures to decode the entire header. It's a bit of a pain really, and I preferred the old way :( or maybe just use another tool - just not Olly 2.01 for this. Decode as structure is probably good for other structures though, as you might have guessed.
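
If you'd rather do the same two decoding steps programmatically without any third-party module, here is a minimal Python sketch using only struct. It is not the pefile API; parse_pe_headers is a name I made up, and it only reads the e_lfanew field of IMAGE_DOS_HEADER and the seven fields of IMAGE_FILE_HEADER.

```python
import struct


def parse_pe_headers(data):
    """Parse e_lfanew from IMAGE_DOS_HEADER and the fields of IMAGE_FILE_HEADER."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew sits at offset 0x3C of the DOS header and points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte PE signature.
    fields = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    names = ("Machine", "NumberOfSections", "TimeDateStamp",
             "PointerToSymbolTable", "NumberOfSymbols",
             "SizeOfOptionalHeader", "Characteristics")
    header = dict(zip(names, fields))
    header["e_lfanew"] = e_lfanew
    return header
```

Something like parse_pe_headers(open("sample.exe", "rb").read()) (the file name is just a placeholder) gives you the same values Olly shows after the two 'Decode as structure' steps.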

Extracting PE files from memory

Was recently trying to debug some malware that someone gave me. The malware was extracting itself into memory and working from there. It is possible to debug the malware from memory itself but it's a bit painful, since each time we have to run the original binary, let the memory get populated with the malicious code and then try and understand what the code is doing. This is a waste of time - and if we can avoid it, we should.

As it turns out, it's possible to dump contents from memory. This is especially useful if it's a PE file that's unpacked into memory. Coz.. you could dump it and reverse it separately without all that running of the original malware that unpacked it.

I always knew this was possible but for many reasons (a laziness to learn being foremost ;)) I never did it. This time, I was determined to figure out how to do it. Turns out it is fairly straightforward. All credit for me learning this goes to this blog -

That blog should be self explanatory really, but here were my steps.
  • Load process in Olly and debug it as usual.
  • Once the process has loaded in memory, locate it in the Dump section of Olly.
  • Right click inside the dump. Backup - Save data to file to an EXE file on disk.
  • Launch a PE Editor (I use LordPE). Locate the last section. Add the RawSize and RawOffset. Record the number.
  • Open the hex editor and go to this number (offset) inside. Delete all content after this number. Usually this is all zeros.
  • Now go to the start of the file. Delete all content inside the file up to the start of the header (4D 5A). Save the file.
  • Open the file in Olly...and there you go.. a normal EXE file.
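
The hex editor part of those steps (cut everything before MZ, cut everything after RawOffset + RawSize of the last section) can also be scripted. Below is a rough Python sketch under a few assumptions: carve_pe is a made-up name, it takes the first "MZ" it finds (which could be a false hit in a noisy dump), and it uses the largest RawOffset + RawSize across all sections rather than literally the last entry in the table.

```python
import struct


def carve_pe(dump):
    """Trim a raw memory dump down to the PE file it contains."""
    start = dump.find(b"MZ")
    if start < 0:
        raise ValueError("no MZ signature found")
    image = dump[start:]  # drop everything before the header (4D 5A)

    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("MZ found but no PE signature where e_lfanew points")

    (num_sections,) = struct.unpack_from("<H", image, e_lfanew + 6)
    (optional_size,) = struct.unpack_from("<H", image, e_lfanew + 20)
    # Section table starts after the signature (4), file header (20) and optional header.
    table = e_lfanew + 24 + optional_size

    end = 0
    for index in range(num_sections):
        # SizeOfRawData and PointerToRawData live at offsets 16 and 20 of each 40-byte entry.
        raw_size, raw_offset = struct.unpack_from("<II", image, table + 40 * index + 16)
        end = max(end, raw_offset + raw_size)
    return image[:end]  # drop everything after the last section's raw data
```

Write the returned bytes to a new file and open that in Olly, same as the last step above.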

I tried some plugins like OllyDumpEx but they did not work for me. They are probably fine - most likely I made a mistake while using them. I will try some nice plugins soon and update this post when I am done.

Hope this post helps someone easily unpack malware from memory manually. It's fairly easy and you do not need a plugin to do this :)

Wednesday, February 4, 2015

Volatility - Extracting malware signatures from memory

There's a million tutorials online on Volatility and how to use it. This post will teach you nothing new. It is just my own way of learning the tool. All I'm going to do here is to go through each and every plugin which is listed very well here and very well explained here and make my own notes. If you're starting off with Volatility don't read this post - go read the official documentation.

The purpose quite simply is to just help me remember the tons of plugins that Volatility has, so I can use it while performing malware analysis of all those dangerous pieces of malware.

I'll use the memory image of Shylock, which can be downloaded from here, to practise running Volatility's plug-ins against. The images that I got from the Volatility Git repository (git clone) didn't work for some reason.

- The -f  (image name) and --profile (what OS the image was extracted from) switches are used in almost every command.

- imageinfo suggests profiles while kdbgscan definitely identifies the best profile to use. Sometimes though kdbgscan also identifies multiple candidate profiles. In such cases look at the number of processes that it identified for each candidate and proceed accordingly.

- pslist gives you a list of processes in the memory. psscan does the same but includes hidden processes, which may be there coz of rootkits. pstree does the same but gives you a nice tree view like Process Explorer does on a live system.

- dlllist is a nice plugin that gives you the DLLs loaded by a process. It's best to identify the PID of the process you are after using the previous plugins and then use the -p switch with this plugin. dlldump is the next logical step: you find a DLL and try and dump it. Again, it makes sense to focus on a specific process.

- handles, while super-verbose, is a nice way to quickly see all the stuff a process is referring to. Use the -p and -t switches here to filter output. It's very similar to the Sysinternals Handle utility.

- cmdscan lists out all the commands an attacker typed and consoles goes one step further by listing the exact output that the attacker saw when she typed it.

- connections lists out all the connections that were active when the image was captured. connscan is cooler in that it also identifies connections that were terminated.

- sockets lists all the listening sockets as well. sockscan does the same thing but in a different way. netscan does the same thing but for newer platforms (Vista and later).

- hivelist searches for registry hives in memory. You can actually print out entire registry subtrees using printkey with the -K option which searches all the hives in hivelist's output above and returns the keys with its values if found. hivedump is super-verbose and recursively prints all the keys found.

- hashdump gets stored credentials from memory of all the local OS accounts. You could grab hashes from here and then crack those offline.

There's plenty more and I'll keep adding to this list as I play with them over the next few days. :)
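
Since almost every command repeats the same -f and --profile switches, a tiny Python helper to build the command line saves some typing. This is only a sketch under assumptions: the vol.py executable name, the image file name and the profile below are placeholders for whatever your setup uses, and the helper just builds the argument list without running anything.

```python
import shlex


def volatility_command(image, profile, plugin, pid=None, extra=()):
    """Build the argument list for one Volatility plugin run."""
    args = ["vol.py", "-f", image, f"--profile={profile}", plugin]
    if pid is not None:
        args += ["-p", str(pid)]  # focus on a single process
    args += list(extra)
    return args


# Placeholder image name, profile and PID, for illustration only.
cmd = volatility_command("shylock.vmem", "WinXPSP3x86", "dlllist", pid=1234)
print(shlex.join(cmd))
```

The returned list can be handed straight to subprocess.run() once the names match your environment.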

Tuesday, February 3, 2015

Writing ClamAV signatures

Obviously while learning about malware analysis it is not enough only to know how to reverse malware. I should know how to protect against them as well. So I wanted to learn how to write signatures really well - it could be useful. So I will learn how to do so using the following:

  • ClamAV
  • Yara
  • Suricata
  • Snort

In this post I'll start with ClamAV. Here are my rough notes. The aim is to be able to refer to them over time. They are not the most polished blog posts ever, nor do I intend to make them appear that way :)

Here goes.

Stop updater daemon: sudo /etc/init.d/clamav-freshclam stop
Update now: freshclam
Update but run as a daemon: freshclam -d
Stop/start clamd daemon: sudo /etc/init.d/clamav-daemon stop (or start). Don't try and start it from the CLI

Send clamd commands using socat:

  - socat - /var/run/clamav/clamd.ctl. Then type command. If started in TCP mode though, you can normally telnet to a port.

  - The connection to the socket times out though fairly quickly by default, so I'd have the command copied to clipboard :)

  - List of all commands available in the clamav documentation on page 17

Scan files: clamscan /tmp/virus_test
Scan files using clamdscan: clamdscan - < /tmp/virus_test

Use libclamav to scan files from inside other software. Can be used with C programs only.

Info about a database file: sigtool --info

Creating signatures:

- Make sure that you unpack the binary before doing this, else it's not very useful.

Hash based signatures:

sigtool --md5 test.exe > test.hdb
clamscan -d test.hdb test.exe
The moment a single byte changes, this signature will fail
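
To see what such an entry looks like and why it is so brittle, here is a small Python sketch that builds a line in the same hash:size:name shape that sigtool --md5 prints. hdb_entry is a made-up helper and the sample bytes are fake; it is only meant to show that one changed byte gives a completely different entry.

```python
import hashlib


def hdb_entry(data, name):
    """Build a hash-based signature line in the form md5:size:name."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"


original = b"MZ" + b"\x90" * 62      # fake sample contents, for illustration only
patched = b"MZ" + b"\x90" * 61 + b"\x91"  # same size, last byte changed

print(hdb_entry(original, "test.exe"))
print(hdb_entry(patched, "test.exe"))
```

Both lines have the same size field but different hashes, so a signature written for the first file will never fire on the second.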

Extract sections from PE file and create a signature for each section:

Use my hash_sections script to do this. Another option is to save all sections and use sigtool --mdb

Remember sections with a zero size will cause clamscan to break so don't add those

Also remember that this method is best used AFTER you have done all your analysis and want to detect a packer.. so here you don't necessarily have to unpack the binary before writing a signature

Similar files but minor difference in certain bytes

This means that it is the same malware but hash based signature or section-hash based signatures will not work. For example:
md5sum 03*
  ae831fcf5591dc0079ebfe4654f23f52 031.exe
  b20a1db0a01f7a6f14f503a6fcdd6c0f 03.exe

Here's a sig where the {4} stands for the only bytes that change. EP is the entry point, and anchoring the signature to it can drastically reduce false positives. Save these manually into a file with an ndb extension:
TestSig:1:EP+6:8DBEEB7FF7FF57{4}10909090909090
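
Finding the bytes that differ by hand gets boring quickly, so here is a rough Python sketch that compares the same region from two samples and emits the hex string with {n} for every run of differing bytes. The helper names are made up, the two byte strings are fabricated so that they reproduce the signature above, and I have not checked the output against every rule ClamAV has for wildcards, so treat the result as a starting point.

```python
def wildcard_signature(sample_a, sample_b):
    """Return a hex signature where runs of differing bytes become {n} wildcards."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be the same length")
    parts = []
    gap = 0  # length of the current run of differing bytes
    for byte_a, byte_b in zip(sample_a, sample_b):
        if byte_a == byte_b:
            if gap:
                parts.append("{%d}" % gap)
                gap = 0
            parts.append("%02X" % byte_a)
        else:
            gap += 1
    if gap:
        parts.append("{%d}" % gap)
    return "".join(parts)


def ndb_line(name, target_type, offset, hex_signature):
    """Join the four fields of an ndb entry: name:target:offset:hex."""
    return f"{name}:{target_type}:{offset}:{hex_signature}"


# Fabricated 18-byte regions that differ only in the middle 4 bytes.
region_a = bytes.fromhex("8DBEEB7FF7FF57" "AABBCCDD" "10909090909090")
region_b = bytes.fromhex("8DBEEB7FF7FF57" "11223344" "10909090909090")
print(ndb_line("TestSig", 1, "EP+6", wildcard_signature(region_a, region_b)))
```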

Same logic but with much more powerful signatures. These must be stored in ldb files:

The difference here is that everything is separated by ; characters. All the patterns are right at the end. The block just before the patterns is the one that decides how the patterns are actually combined, and the block before that decides which files the signature is applied to. The first block is just a name.. can be anything.
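
As a small illustration of that layout, here is a Python sketch that joins the blocks in order. ldb_line is a made-up helper, and the name, the Target:1 block, the 0&1 expression and the two hex patterns are example values I picked (the patterns are the two static halves of the ndb signature above), so check the ClamAV signature documentation before relying on the exact syntax.

```python
def ldb_line(name, target_block, logical_expression, subsignatures):
    """Join the blocks of a logical signature with ';' in the order described above."""
    return ";".join([name, target_block, logical_expression, *subsignatures])


# Example values only: match when both pattern 0 and pattern 1 are present.
line = ldb_line(
    "TestLogicalSig",
    "Target:1",
    "0&1",
    ["8DBEEB7FF7FF57", "10909090909090"],
)
print(line)
```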

Whitelist files

The whitelist file takes the same name as the database in which the detection signatures exist. So if all signatures are in daily.cld, the whitelisting file should be named daily.fp, sit in the same dir as daily.cld, and have this line (hash : size : random name):
5523530941c409b349ef40fa9415247e:51204:Whitelist signatures

This is despite a BAD signature being there in the daily.cld... it'll just IGNORE the bad one

Whitelist specific signatures

Again, the file takes the same name as the database in which the detection signatures exist. So if all signatures are in daily.cld, the whitelisting file should be named daily.ign, sit in the same dir as daily.cld, and have this line (goodDBname : line number : Actual signature name).

This is despite a BAD signature being there in the daily.cld... it'll just IGNORE the bad one

Some nice References:

The sample files, scripts and its signatures can be found in my Git repository -

Friday, January 30, 2015

A malware sample I analyzed

Recently I analyzed a malware sample. I don't know what it was or whether I completed it but I stepped through it and wrote a very detailed report about it that I'd like to share now.

It is completely possible that I have missed things in it, but honestly anyone reading through it, especially at the beginner-intermediate level, should get some useful information from it.

I'd love to hear more feedback on how things can be done better, and if anyone has indeed analyzed this deeper and better than me - do call me out.. and if you can get in touch with me somehow so I can learn :)

I started a new repository on Git just now - to add a lot of my random stuff that doesn't really have a specific home. Here's the link to the PDF report (no it is not malicious :)).

I cannot see how I can upload the sample to offensivecomputing so here is a link to a virus total analysis instead. I guess anyone interested should be able to find a sample using the hashes on this link.

Tuesday, January 27, 2015

Statically linked binaries - Library detector

A program can be linked dynamically or statically on Linux. For simplicity's sake - I considered only C binaries. When you dynamically link a program the libraries do not get included into the binary itself - the functions that they export are resolved and called at runtime. In a statically linked binary however, all the libraries that the binary needs ... to run... are part of the binary itself. And that, if you are reversing, is a pain. Coz you don't know which part of the code is the binary... and which part is library code. IDA detects a lot - but not all of it... not enough.. for sure. So I decided to try and do something..

This little project came into my mind primarily while playing the reversing challenges in CTFs. The files there used to be massive (4 digit numbers of functions) and very difficult to solve (for me anyway :)). I would never be able to identify which code was library code - in the case of statically linked binaries. Thus I could never complete those challenges OR it took me a lot of time. I still can't complete many but that's a separate story ;)

Anyway TL;DR I wrote a few simple IDAPython/Python scripts that basically compare the IDB of the binary to be reversed against a whole lot of library code. The better idea you have about the exact libraries that were used while building the binary - the more accurate this tool will be.

It is certainly a start to a fairly complex problem IMO and I hope that people more knowledgeable than me in this space can extend this and make it even more useful. At the very very least, I hope it will show people what NOT to do while attempting to solve this problem :)
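
To give a feel for the general idea (this is NOT the code from my scripts, just a simplified stand-in), here is a Python sketch that labels functions in a target binary by hashing their raw bytes and looking them up in a table built from known library functions. The function names and byte strings are invented, and exact byte matching like this breaks as soon as addresses or compiler options differ, which is exactly why knowing the right library versions matters.

```python
import hashlib


def fingerprint(code):
    """Hash the raw bytes of one function."""
    return hashlib.sha1(code).hexdigest()


def label_library_functions(target_functions, library_functions):
    """Map target function addresses to library names when the bytes match exactly."""
    known = {fingerprint(code): name for name, code in library_functions.items()}
    labels = {}
    for address, code in target_functions.items():
        name = known.get(fingerprint(code))
        if name is not None:
            labels[address] = name
    return labels


# Invented data: two library functions and three functions from a stripped binary.
library = {"strlen": b"\x55\x89\xe5\x31\xc0\xc3", "memcpy": b"\x55\x89\xe5\x57\x56\xc3"}
target = {0x8048100: b"\x55\x89\xe5\x31\xc0\xc3",
          0x8048200: b"\x55\x89\xe5\x90\x90\xc3",
          0x8048300: b"\x55\x89\xe5\x57\x56\xc3"}
print(label_library_functions(target, library))
```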

The code I wrote can be found here.

Hopefully over time - I can make this even better or maybe find a better solution to this problem.

Friday, January 23, 2015

Anti debug mechanisms - Windows

Been busy with some stuff so haven't got time to blog much at all. Anyway I was playing the Flare challenge and the last one was challenge 7 which was a 32 bit Windows PE file. I haven't yet managed to complete it due to some silly detail that I've overlooked :(. All the same though - there were a few really nice Antidebug and AntiVM mechanisms that I learnt about and thought of sharing. If I wait for the challenge to get over - I might end up never writing it :).

IsDebuggerPresent: This is one of the oldest tricks to detect a debugger. Usually the malware will call this function and check its return value. If it's nonzero (typically 1) it means it's being debugged.

PEB BeingDebugged flag: The PEB (Process Environment Block) is a block that contains a lot of information about the currently executing process. One of the first fields in this structure is the BeingDebugged flag. If the application is running inside a debugger, the value of this flag is 1.

SIDT: The IDT (Interrupt Descriptor Table) is a data structure that has the addresses of the handlers that are called when specific interrupts occur on a machine. The SIDT instruction stores the contents of the IDT register, which include the address of the current IDT, into a memory location. On a normal machine, the address of the IDT is lower than 0xd0xxxxxx. If the address is greater than that, the malware assumes it is running inside a VM.

VMXh: There is a privileged instruction called IN. Meaning... it will normally run only in kernel mode - a normal user can't write an assembly program and call it. VMware however intercepts it: when in eax,dx executes with the right values loaded, it fills up ebx... and if ebx has 'VMXh'... it's running inside VMware. So malware will do this check as well to detect if it's running inside a VM.

OutputDebugString: This API will try and print a string. That's it. But it'll be successful only if the program is being debugged. You'll see the string that was printed in the Log window of the debugger. Malware will probably check if the function succeeded and make a decision accordingly.

CC byte checking: The single byte CC (the int 3 instruction) stands for a software breakpoint. The moment you set a breakpoint while debugging a program, the byte at the address where you set it is replaced with CC. Malware might search an entire range of its own code for the presence of any 'CC' byte and exit if it finds one. Meaning... if there is a CC byte where there shouldn't be one, there is also a debugger. There's no reason for one to be there if it runs normally.

NtGlobalFlag: If a program is started inside a debugger, the NtGlobalFlag field at offset 0x68 of the (32 bit) PEB is set to the value 0x70. This value is set based on the values of some heap debugging flags. Malware will check for this value and accordingly make a decision.
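Two of the checks above are easy to sketch in Python. This is only an illustration on plain values, not how the challenge binary does it: the function names and the sample bytes are made up, and a real check reads its own process memory natively. The three constants are the standard Windows heap debugging flags, which add up to 0x70.

```python
INT3 = 0xCC  # opcode a debugger writes in place of the original byte

# Heap debugging flags that end up in NtGlobalFlag under a debugger
FLG_HEAP_ENABLE_TAIL_CHECK = 0x10
FLG_HEAP_ENABLE_FREE_CHECK = 0x20
FLG_HEAP_VALIDATE_PARAMETERS = 0x40
DEBUG_HEAP_FLAGS = (
    FLG_HEAP_ENABLE_TAIL_CHECK
    | FLG_HEAP_ENABLE_FREE_CHECK
    | FLG_HEAP_VALIDATE_PARAMETERS
)  # 0x70


def find_breakpoint_bytes(code: bytes) -> list:
    """Return the offset of every 0xCC byte in a buffer of code."""
    return [offset for offset, value in enumerate(code) if value == INT3]


def heap_flags_say_debugged(nt_global_flag: int) -> bool:
    """True when all three heap debugging flags are set."""
    return (nt_global_flag & DEBUG_HEAP_FLAGS) == DEBUG_HEAP_FLAGS


# Hypothetical prologue: push ebp; mov ebp, esp; sub esp, 8
clean = bytes.fromhex("5589e583ec08")
# The same bytes after a breakpoint was set on the first instruction
patched = bytes([INT3]) + clean[1:]

print(find_breakpoint_bytes(clean), find_breakpoint_bytes(patched))
print(heap_flags_say_debugged(0x70), heap_flags_say_debugged(0x00))
```

Keep in mind that a CC byte can also show up legitimately, as padding between functions or as part of an operand, so a naive scan like this can give false positives.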

Apart from these 7 checks, the FireEye challenge also used the LocalTime64 API to check if the time was between 5 and 6pm on a Friday and would go down the wrong path if it wasn't.
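A minimal Python sketch of a time gate like this one. The function name is hypothetical and the exact boundaries (17:00 inclusive to 18:00 exclusive) are my assumption, since the post only says "between 5 and 6pm on a Friday".

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday() counts Monday as 0


def in_expected_window(now: datetime) -> bool:
    """True only on a Friday between 17:00 and 17:59 local time."""
    return now.weekday() == FRIDAY and now.hour == 17


# 23 January 2015 was a Friday
print(in_expected_window(datetime(2015, 1, 23, 17, 30)))
print(in_expected_window(datetime(2015, 1, 23, 18, 0)))
```

When debugging something like this, it is usually easier to patch the comparison or change the VM clock than to wait for Friday evening.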

Your filename on disk needed to be backdoge.exe. You needed to be connected to the Internet so it could get a couple of IP addresses, which were also used. Lastly it also retrieved a specific string 'jackRAT' from a Twitter post and used that as well.

Basically, each of these checks caused the code to XOR with a different string. If you went down the wrong path, it'd still XOR, but with the wrong string and your end result - which is supposed to be another PE file would be incorrect.
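To make the "wrong path still XORs, just with the wrong string" idea concrete, here is a small Python sketch. The two key strings and the function names are invented for illustration; the real strings used by the challenge are not listed in this post.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decode_stage(blob: bytes, debugger_detected: bool) -> bytes:
    """Pick the key based on the check result, then decode."""
    key = b"wrong-key" if debugger_detected else b"right-key"  # hypothetical keys
    return xor_with_key(blob, key)


payload = b"MZ\x90\x00"  # first bytes of a typical PE file
blob = xor_with_key(payload, b"right-key")

print(decode_stage(blob, debugger_detected=False))  # original bytes back
print(decode_stage(blob, debugger_detected=True))   # garbage
```

With several checks chained like this, one wrong branch anywhere is enough to ruin the final output without any visible error, which is what makes it so hard to spot.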

So that's where I'm stuck :( - I think I have all the checks down - but I'm missing some fine detail .. somewhere and getting an invalid file. Oh well, maybe someday - I think I will look at one of the online solutions now. I've spent a lot of time on this without much success.

Here are some references though which I found very useful:

I hope you learnt a few things. None of this is really new..honestly... but hey, it's just a place where I keep writing my thoughts out as I learn things.

Sunday, December 21, 2014

FireEye - FlareOn Challenge 6 Argument 2

So if you read my previous post, you'd know I was stuck on Argument 2 last time. I finally managed to crack it with a little help. The answer had been staring at me all the time and somehow I'd overcomplicated things. Oh well. I learnt a lot.

So..the last time in the previous post, I was stuck at the function 401164. I went through it multiple times, sat and marked blocks out in IDA, used Hexrays on it to get C code (I don't do this until it's an absolute last resort) but all I could see was loads of operations on a big chunk of encrypted text.

Yeah. So I found this chunk of text at 729900..and this is exactly what it looked like.


The function iterated for the length of this string and performed some operation on it ... using a byte array at 4F4000. But I couldn't understand what it was doing....with all that math. Specially because the very next function, was just an exit function.

Now I'd marked it as an exit function a long time back...and forgotten about it...and not analyzed its code carefully at all. After all, it's exit() ffs... what's there to look into? I was wrong :(

I pinged a person on reddit for a pointer. He/she said I was close...and should think about encoding and the = sign. Sure, I think. Base64. Obviously. But why is that relevant here? Yeah the encrypted text had an = at the end...but so what? It's too big for an Email address. So what's the damn point of decoding it? That's what I'd thought...a long time back.

Anyway I copied the text and threw it into an online Base64 decoder...

and my eyes popped out when I saw what I did.

Look at the screenshot...the right pane..towards the bottom. You'll see some ASCII text called /bin/sh. It's trying to call a shell.... it's shell code. And I'd been staring at it for at least 2-3 days. #-o. Serves me right for assuming things. Sigh.
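The same decode-and-look step is a few lines of Python, which saves pasting things into an online decoder. The blob below is NOT the one from the challenge (that one is not reproduced in this post); it is a tiny stand-in built from a few typical x86-64 shellcode bytes so the example can run on its own.

```python
import base64


def decode_and_find(blob: str, marker: bytes) -> int:
    """Base64-decode blob and return the offset of marker, or -1."""
    raw = base64.b64decode(blob)
    return raw.find(marker)


# Stand-in bytes: xor rdx, rdx; movabs rbx, "/bin/sh\0"; push rbx
stand_in = b"\x48\x31\xd2\x48\xbb/bin/sh\x00\x53"
sample = base64.b64encode(stand_in).decode("ascii")

print(sample)                               # note the trailing '='
print(decode_and_find(sample, b"/bin/sh"))  # offset of the string
```

A trailing = is just padding, so it is a cheap hint that a blob is Base64 and worth decoding before doing anything fancier.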

Anyway, if there's shell code, that means the program is going to jump to it at some point. And all that's left is that exit() call at 44bb2b. When the hell is it jumping? And where's that code?

So I then decide to separately throw the shellcode into a disassembler and analyze it. Since it didn't have the ELF header and I was in no mood to recreate one (if I could :D) I threw the code into an awesome online disassembler at

...and the code there looked very very familiar. I'd seen it somewhere. Where? Then it hit me...it was IN the exit function.

The exit function had code that compared something with '1b' at the offset cfc4... and exited if it didn't match. And what was it comparing it with? The 2nd argument. And if you didn't enter that correctly..which I wasn't...it'd fail.

So at this point it was just about reversing the algorithm inside. Here I have a confession to make. While searching for hints and verifying the 1st argument, I'd accidentally seen part of the 2nd argument on one of the solutions, so I knew it started with lin. That sucked. But anyway... just to verify I entered 'l' and it passed the jump. So then I just needed to solve that entire algorithm...which was just different basic math at every step. Rotate right and left, xor, add, and sub..with binary and hex. I started doing it manually...but was just horribly bored as the pace was very slow guessing it.
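For illustration, this kind of per-character check can be brute-forced with a few helper functions. The three steps and target bytes below are invented stand-ins (the real ones from the shellcode are not listed in this post); they are only chosen so that the sketch recovers the known prefix "lin".

```python
import string


def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value: int, count: int) -> int:
    """Rotate an 8-bit value right by count bits."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


# (transform applied to the input byte, byte it is compared against)
# All three pairs are hypothetical, one per character of the argument.
CHECKS = [
    (lambda c: ror8(c, 2), 0x1B),
    (lambda c: rol8(c, 1) ^ 0x55, 0x87),
    (lambda c: (c + 0x13) & 0xFF, 0x81),
]


def solve(checks) -> str:
    """Try every printable character against each check in turn."""
    found = []
    for step, target in checks:
        matches = [ch for ch in string.printable if step(ord(ch)) == target]
        found.append(matches[0] if len(matches) == 1 else "?")
    return "".join(found)


print(solve(CHECKS))
```

Since every step works on a single byte, trying all printable characters per position is instant, and only the list of checks has to be typed in from the disassembly.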

So I decided I'd write code for it and solve it. It's basically mind numbing work predicting character by character...and the algorithm is different each time. An utter waste of time really... and this made no sense to me from FireEye's perspective. Oh well I guess that's how real malware is :shrug

Here's the code I wrote - it's just a very quickly written piece of code. Not great at all. But it works:

I wrote code till the "@" character...and then just guessed the rest. Here too...out of sheer tedium...I guessed 1 character...manually added it to my flag..and then proceeded to solve the algorithm again for the same character. And wondered wtf was wrong now...and why gdb kept throwing me out.

Only googling the exact answer showed me the error of my ways :|. Obviously, that's not how it works in real life...but really I was done...and there was nothing left to solve in the challenge, so I think it was ok.

The final flag was:-

Oh you bet. This was a painful painful challenge. What an utter load of junk there was inside. 7 or 8 functions out of some 2700+ that were useful. Sheesh :)

p.s..The bad thing was that someone had written the flag of challenge 7 underneath challenge 6... :( but luckily I have already forgotten it :D. I won't solve challenge 7 for a few days till I am sure I don't remember anything.

Friday, December 19, 2014

Fireeye - FlareOn Challenge 6 Argument 1

Challenge 6 was a 64 bit statically linked ELF binary. Now I haven't yet finished solving it (hey it's hard ;)) but have got through half of it. I needed a tiny hint though this time to set me on the right track. Anyway though...it's complex enough that I can write about Part 1. Later maybe I'll add Part 2 once I solve it someday. If not and I look at spoilers, I'll try and blog about the learnings :). Ok let's start.

Once I ran file and identified it was statically linked, I sighed inwardly. That's coz static binaries have the Linux libraries that they are linked to as part of the binary. This ensures that these binaries will run on any system - unlike when there is dynamic linking in place, and dependency on specific libraries. From a reverse engineering standpoint this is bad, because the disassembly in IDA is cluttered with library code and it's super hard to identify what code is user written.

Ran the file in a VM. Always make sure you run the file in a VM. All it did was come back with "no". That's it. So the next thing is to find out where the "no" was referenced. So after a bit of stepping in and stepping out, the call at 45dea0 looked suspicious - it pointed to another function at 45dce0. This eventually makes a call to sub_452079. A quick tip in case you start to debug 452079 but forget where it was called from is to either:

-- Hit ESC to go back and Ctrl+Enter to come forward again
-- Or right click on sub_452079 in the IDA View and click List cross references to

Anyway, on first sight 452079 looks like a gold mine with a ton of juicy strings. Every single one of them is utter garbage. So I just kept scrolling in IDA till I saw the last (or so I thought) strings go by. And set a breakpoint after that. Sometimes it'd not hit my breakpoint because its position was wrong. But slowly, after a few attempts, I managed to find out where the "No" was getting called from - 4535bb. And then the very next call at 4535c5 was causing the program to exit.

So one thing was then clear, I had to skip that jump for sure, so I used a little bit of gdbinit magic :) for now to patch the program in memory and see where it leads me. The .gdbinit file is something that gdb runs commands from before starting to run the program.

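br *0x4535b4
set $ps=$ps&~0x40

This basically breaks execution at 4535b4 and unsets the zero flag (bit 6 of the flags register, mask 0x40, which gdb exposes as $ps) without touching the other flags, so execution can jump over the exit code, and then continues execution.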

Well, that worked for a while but then I crashed after some time with a very fishy looking "Program has a segfault" error. Digging further revealed that all this was happening at 41F21C. So usually when something like that happens, the code that is making that happen is usually in the call just above that. So.. digging into the call at 41F211..revealed that there was a line at 474319 which made a syscall and had an argument of 0x65 [syscall(0x65)]. This means that there is some kind of system call being made...and it's very interesting many times ;). But we need to find out what 0x65 stands for. So let's search for a syscall table online. We find a nice one at:

What's 65? semop. Eh? Linux IPC? Wtf. What does that even mean? Ah it's hex. And the table is decimal :). 65 in hex is 101 in decimal and that's ptrace. Ah that makes sense. ptrace() is the call that debuggers on Linux use to attach to a process, and since only one tracer can be attached at a time, a program can call it on itself to detect if it is running inside a debugger, like ours. gdb remember? So we need to run outside gdb. But then how do we find out what's going on? We can't. So we do some more gdbinit magic. Heh.

br *0x41f21c
set $ps=$ps|0x40

Okay, that sets the zero flag, because this time ..that's what we want and jump to 41F232...not to the silly SEGV message.
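The masks in these gdb commands are plain bit operations on the flags register, where the zero flag is bit 6 (0x40). A quick Python sketch of the two directions; the sample register values are just typical-looking numbers I picked, not values captured from this binary.

```python
ZF = 1 << 6  # zero flag is bit 6 of EFLAGS, i.e. mask 0x40


def clear_zf(eflags: int) -> int:
    """Clear only the zero flag, leaving every other flag alone."""
    return eflags & ~ZF


def set_zf(eflags: int) -> int:
    """Set only the zero flag, leaving every other flag alone."""
    return eflags | ZF


print(hex(clear_zf(0x246)))  # ZF was set, now cleared
print(hex(set_zf(0x202)))    # ZF was clear, now set
```

Which direction you need depends on whether the conditional jump you want to steer is a jz or a jnz.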

Great...then we continue debugging...and the program just hangs. Like just hangs. Nothing I do makes a difference. In the past, when I've debugged and a program has hung, it's been because it's potentially waiting for some network connection and listening. Checking with netstat and lsof revealed nothing this time, so the next logical candidate was a long long sleep() call.

So .. hitting a Ctrl+C inside gdb gave me the address of where the "hanging" was happening.

Breakpoint 2, 0x000000000041f21c in ?? ()
Program received signal SIGINT, Interrupt.
0x0000000000473d50 in ?? ()

Hmm... 473d50. What's there? Ha. Another syscall...this time it's 0x23. Or 35 in decimal. Let's go back to our syscall table again. Ah there we go. nanosleep(). Man page - "nanosleep() suspends the execution of the calling thread until either at least the time specified in *req has elapsed"
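The hex versus decimal mix-up is easy to avoid by letting Python do the conversion. The dictionary below is only a three-entry excerpt of the x86-64 Linux syscall table, covering the numbers that came up here; the function name is made up.

```python
# Tiny excerpt of the x86-64 Linux syscall table (decimal numbers)
SYSCALLS_X86_64 = {
    35: "nanosleep",
    65: "semop",
    101: "ptrace",
}


def lookup(hex_number: str) -> str:
    """Translate a hex syscall number from the disassembly into a name."""
    return SYSCALLS_X86_64[int(hex_number, 16)]


print(lookup("0x65"))  # what the binary really calls
print(lookup("0x23"))
print(SYSCALLS_X86_64[65])  # what you get if you forget it was hex
```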

ok..that's the first argument to nanosleep() then which is some huge number. Now where is the first argument stored? It's called from 473C67...and the arguments are stored in the lines just above that. That's rdi and rsi then. That's how things happen on 64 bit Linux...the arguments are stored in registers and not explicitly pushed on to the stack like in 32 bit Windows. So... if we can edit rdi to a smaller value than what's currently there....which is

(gdb) x/d $rdi
0x7fffffffd530:    3597

...that's 3597 seconds. 1 hour. I'm not sure we want to wait for an hour before resuming debugging. So I did some more gdbinit magic and edited rdi at runtime to 0x10.

br *0x473c67
set $rdi=0x10

... which should have caused sleeping for 10 seconds. But it didn't and almost returned instantly. :D. Not sure why. Didn't dig too deep though as my purpose was to get past the sleep() call.
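A likely explanation for the instant return, going by the nanosleep() man page: the first argument is a pointer to a struct timespec, not the number of seconds itself. That is also why x/d $rdi prints a value stored at an address. Setting rdi to 0x10 makes it point at an invalid address, so the call most probably fails right away instead of sleeping. The Python sketch below models the 16 bytes that rdi points to on 64 bit Linux; the tv_nsec value of 0 and the function name are my assumptions.

```python
import struct

# struct timespec on x86-64 Linux: tv_sec and tv_nsec, both 64-bit signed
TIMESPEC = struct.Struct("<qq")

# What the memory at the address in rdi might hold (tv_nsec assumed to be 0)
memory_at_rdi = TIMESPEC.pack(3597, 0)


def patch_seconds(memory: bytes, seconds: int) -> bytes:
    """Return the same timespec bytes with only tv_sec replaced."""
    _, tv_nsec = TIMESPEC.unpack(memory)
    return TIMESPEC.pack(seconds, tv_nsec)


print(TIMESPEC.unpack(memory_at_rdi))
print(divmod(3597, 60))  # minutes and seconds
print(TIMESPEC.unpack(patch_seconds(memory_at_rdi, 10)))
```

So to really shorten the sleep, the value to change is the memory rdi points to, not rdi itself. In gdb that would be something like set {long}$rdi = 10.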

The code went on ..after this. But if you're still following along I'd advise you to straight away put a breakpoint on 44bb2b :D. Coz all the code .. well most of it :) after the nanosleep() and before 44bb2b is utter utter junk and you'll just waste hours stepping in and out.

There is a call here to 44b942. Scroll down here as well...until you see the last 2 calls inside this function. A little debugging reveals that the call at 44bb2b is an exit call and the program just exits here. Which usually, usually means that the call before that has some juicy stuff. This is the call at 401164. But while this function looks different and interesting....it's very unclear where the flag is hidden here. Coz all that happens inside here is a lot of integer operations and some fancy math :(.

Now at this point I got stuck and decided to ask for help on reddit. 2 people who had already solved it very kindly poked me in the right direction ..while talking about arguments. Ah... the program has arguments? How many? What? Time for strace. Thanks guys :)

strace ./C6 1234 abcd ... shows me that SEGFAULT message. Means it's 2 arguments. Nice. I'm still unclear though, about how to identify via static analysis how many arguments a program takes.

Anyway... so then logically, what are those arguments? So running the function with no arguments...if you remember gave us "No"...but running it with 2 arguments gives us "bad". Trying to figure out where bad came from in the first place was another big pain.

Eventually though as it turns out...bad is called from 2 places 43710a and 4371DE. The first one checks if the 1st argument has 10 digits

cmp rax, 0xa

and if that's true it goes on to 437120 ...else it goes to "bad". The second "bad" message is at 4371DE.... and the comparison is at 4734b4. Now this was the first place I found that IDA was wrong. :-o. This caused more chaos. Eventually with gdb's help I found out that the comparison was happening at 4734b4.

.text:00000000004734AE mov     eax, [rdi-0Ah]
.text:00000000004734B1 mov     ecx, [rsi-0Ah]
.text:00000000004734B4 cmp     eax, ecx

Now one of these (I forget which) contains a string "bngcg`debd" and the other...I have to enter the right input so it computes to this string. A little bit of playing around with 10 digit numbers and I found the right number. (I think I got lucky here directly with numbers, I could have easily tried strings and struggled a while longer :D)

The first argument is 4815162342.
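The post doesn't work out what transform the binary applies to the first argument. Comparing the two strings byte by byte, though, they are consistent with a single-byte XOR with 0x56. Treat that as an observation from these two values only, not as the confirmed algorithm inside the binary; the function name is made up.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


target = b"bngcg`debd"    # string the binary compares against
argument = b"4815162342"  # input that passed the check

# If one key explains every position, this set has exactly one element
keys = {t ^ a for t, a in zip(target, argument)}

print([hex(k) for k in keys])
print(xor_bytes(target, 0x56))
```

When both sides of a comparison are visible in the debugger, XORing them together like this is a cheap first test before trying inputs by hand.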

I haven't figured out what the 2nd argument is...and want to slog at this a while more before reading the many great solutions to this that are already online. But...this was complex enough for me to put it out :)

Hopefully I'll have part 2 soon. Until then ...adios :)

Sunday, December 7, 2014

OllyDbg - Running DLLs

So this isn't something new really. There's plenty of articles that talk about running DLLs. You usually either write a small EXE that uses LoadLibrary to load the DLL or use rundll32.exe with the arguments set to calling DllMain(). That'll work.

But that'll work only if all the functions are eventually called. I mean... if a DLL has 4 functions A, B, C and D. And the program flow is something like:


a() {
    b(); c(); d();
}
... it'll work and you'll end up being able to reverse the entire DLL.

But if you have a 5th function e() that isn't directly called... and is called only on some specific case... you won't directly ever end up there.

A quick tip on how to analyze this in Olly is to identify the function() to reverse using IDA or any other disassembler and go to that address. Now right click on that address and click "Set new origin here". This will allow you to run that function :)

Of course...this will work out of the box only if the function takes no arguments at all. If it does you will have to set up the registers EAX, EBX, ECX and anything else...with the correct arguments. This you can do by finding out where it was called from... or by studying how the arguments are processed inside the function by running it and seeing why it crashes.

For example: A function may need 2 arguments and takes these from EBX and ECX. So you might fill in EBX="A" and ECX="d" and try and run the function. But you might find out later that there was code which was dividing EBX and ECX... (EBX/ECX). This means that they both had to be numbers... integers maybe. So you fill up EBX=4 and ECX=2 and see what happens. It might crash again but for some different reason...and you then go back .. and so on :)

Nothing new but a quick little thing that I learnt last week or so...while working on that Fireeye challenge.

Sunday, November 30, 2014

Fireeye - Flare-on Challenge 5

This one was a DLL. Ooh another nice one. Most of my previous reversing success has either been PE or ELF so it's really cool to do all these cool challenges and improve.

Now a DLL is something that has a ton of functions that an EXE calls. You can't directly run a DLL... you need to make an EXE import it and then debug the EXE. At least that's how I've done it in the past :).

So I tried doing this with Olly 1.10 which comes with LoadDLL.exe. That failed and Olly got stuck. So I abandoned that idea and decided to use regsvr32 and rundll32 instead. What eventually worked was rundll32. So you load rundll32.exe in Olly and set its arguments to the DLL and the entry point (5get_it.dll, DllMain).

Also ensure that you're using Olly 2.01 and have set it to break each time a new DLL is loaded. I had a small blog post on this here.

So eventually...I broke in at DllMain...and with the help of IDA and some F8 in Olly it was clear that svchost.dll in the system32 directory was being overwritten. A registry key was also created in HKLM\....\Run... to ensure the DLL was run each time the machine rebooted. That much was relatively easy.

Now after all this...the code seemed to jump into one of the largest functions (at 10009EB0) I have ever seen. It seemed like a massive massive switch/case loop. Here's a pic..that shows how big it truly was:

There was an API called GetAsyncKeyState...that was called and then it went into the switch-case structure. Here's a screenshot showing the code inside a couple of these functions. Take a guess what it is?

See the 'v' and 'w' in the screenshot? That's basically what's pushed to the function at 10001000 which then appends the character 'v' or 'w' to the svchost.log file in System32. Each little function does similar things....just for a different character each time. In other words...this is a keylogger :)

Now I've been duped many times in the past following code down dead ends so I decided to write a little IDA script renaming all functions of this type...so I could ignore all of them and understand the rest of the program.

That code is here:-

That made my life much easier...coz most of the code got renamed and there was very little left to look at :). The bad part was... none of the other code seemed to have anything relevant to the flag at all. :(

Okay lets run it in Olly...maybe something will turn up. Nope. It just remains in an endless loop...and logs keystrokes to svchost.log.. every single character. Now what?

Okay... lets now start opening up each of those functions...nothing super interesting until we come to 'm' and toggle the ZeroFlag....there's something different there. It makes a call to 10001240... something that none of the other letters do. Running this function causes a small message box with some ASCII ART (FLARE) to pop up.

Well awesome. That's progress. But now what? The box doesn't have the flag in it, does it? But since the box pops up...it must mean something. What? Okay.. so when does the box pop up? When I hit 'm' and the variable at 100194fc is not <=0 . Hmm.

I know how to hit 'm'... but how do I force 100194fc = 1 ... or ..not <=0.. so the correct branch is taken? Right? If I can answer that...I have the flag. So we want to now search for references to 100194fc. Let's search in the IDB. Click Search - Sequence of Bytes and enter fc 94 01 10 (little endian). The only reference that's useful is an instruction...

.text:10009BF7 keylog_charbychar_10009B60 mov     dword_100194FC, 1

This means...that somewhere...some single letter is setting 100194FC to 1. This is inside the function starting at 10009B60. What letter does that map to? It maps to "o"... so stepping back...if we enter "o" it will set 100194fc to 1 ..and then if we press 'm' the ASCII art will pop up. Nice :).

So in other words we have to go through the values of every variable..see where each one gets set to 1 and find out the next letter.

The mov instruction content to search for is given below. Byte in square brackets changes for each letter.
c7 05 [fc] 94 01 10 01 00 00 00
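Building that byte pattern by hand for every variable gets old quickly, so here is a small Python sketch that generates it and searches a buffer for it. The helper names and the fake "image" bytes are made up for the example; in practice the buffer would be the raw bytes of the DLL's code section.

```python
import struct


def mov_dword_imm_pattern(address: int, value: int) -> bytes:
    """Encode `mov dword ptr [address], value` (opcode C7 05) for 32-bit x86."""
    return b"\xc7\x05" + struct.pack("<II", address, value)


def find_all(haystack: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in haystack."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


pattern = mov_dword_imm_pattern(0x100194FC, 1)
print(pattern.hex(" "))

# Fake code bytes: four nops, the mov, then a ret
image = b"\x90" * 4 + pattern + b"\xc3"
print(find_all(image, pattern))
```

Each hit lands inside one per-letter function, and that function's letter is the one that has to be typed just before.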

For "o" we need to find where 100194ec is set to 1. That is inside "c". So the last 3 letters are "com". We're very close :)

Keep going this way... and eventually you end up with the flag that is: