Tuesday, November 13, 2018

Content Security Policy - Quick Reference

This is a post to help me remember the various parts of CSP. The w3 specification for CSP is very readable - this is NOT a replacement for it - just something to help me remember the directives :)

Here's a nice link where you can generate your policy bit by bit.

Remember, by default content is allowed to run on the web - not blocked. If browsers made the defaults as 'block all', I'm willing to bet a lot of issues would go away.

Don't use:

- unsafe-inline: Allows inline JS (including javascript: URLs and inline event handlers) to run. This is where a ton of XSS happens
- unsafe-eval: Allows eval() and similar functions that turn strings into code, so any user input that reaches them gets executed
- data: Allows data: URIs, which carry content inline (as text/html or base64-encoded, for example) and are another way of delivering inline content


Fetch directives:

- child-src: Controls where <frame> and <iframe> can be loaded from
- connect-src: Controls where you can make direct connections to web-servers to (fetch(), WebSockets, XHR, EventSource)
- default-src: The fallback for every other fetch directive. If the site loads a script and you haven't set script-src, the browser checks the sources listed here instead. Starting with default-src 'none' and then whitelisting content directive by directive is a good idea
- font-src: Where can I load fonts from?
- frame-src: Where can I load Iframes from?
- img-src: Where can I load images from?
- manifest-src: Where can I load app manifests (metadata about a specific application) from?
- media-src: Where can I load audio, video and subtitles from?
- prefetch-src: Where can resources be prefetched from? This just means that some resources on the page will be 'processed' (DNS resolution for example) before they are actually requested
- object-src: Where do plugins (embed, object, applet) get loaded from
- script-src:
    * A list of white-listed sources for Javascript.
    * 'self' indicates that the browser should load scripts only from the site itself and nowhere else.
    * This controls inline scripts as well as XSLT stylesheets that can trigger script execution.
    * Adding 'nonce-<really_random_nonce>' or 'sha256-<hash>' can allow very specific inline scripts if there's no way to get rid of inline scripts altogether
    * 'strict-dynamic' accompanied by a nonce or hash for a script means that any scripts loaded by that script are automatically trusted, without needing a nonce or hash themselves
- style-src: A list of whitelisted sources for CSS
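- script-src-elem, script-src-attr, style-src-elem, style-src-attr: All similar to script-src and style-src, except that they are more granular. The -elem variants apply to <script> and <style>/<link> elements, while the -attr variants apply to inline event handlers and style attributes. Not yet in browsers though, but here's a Google Group post.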
- worker-src: Where can I load background Web Workers from?
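
To make the directives in this post a bit more concrete, here is a minimal sketch in Python of assembling a policy that starts from default-src 'none' and then allows specific sources, with a per-response nonce for script-src. The function names (make_nonce, build_csp) and the chosen sources are assumptions for illustration only, not a recommended policy for any particular site.

```python
import base64
import secrets


def make_nonce() -> str:
    """Return a fresh, unpredictable nonce; generate a new one per response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def build_csp(nonce: str) -> str:
    """Assemble a Content-Security-Policy header value (illustrative choices)."""
    directives = {
        "default-src": ["'none'"],                     # block everything by default
        "script-src": ["'self'", f"'nonce-{nonce}'"],  # own scripts + nonced inline
        "style-src": ["'self'"],
        "img-src": ["'self'"],
        "connect-src": ["'self'"],
        "object-src": ["'none'"],
        "base-uri": ["'self'"],
        "form-action": ["'self'"],
        "frame-ancestors": ["'none'"],
    }
    parts = [name + " " + " ".join(values) for name, values in directives.items()]
    return "; ".join(parts)


nonce = make_nonce()
header_value = build_csp(nonce)
print("Content-Security-Policy:", header_value)
```

Any inline script you want to keep then needs the same value in its nonce attribute, for example <script nonce="...">, and the nonce must change on every response.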

Document directives:

- base-uri: Restricts the URLs that can be used in the page's <base> element, which is what relative URLs are resolved against
- plugin-types: Restricts the types of plugins that can be loaded into the document
- sandbox: Applies restrictions to the page itself, similar to the sandbox attribute on an IFrame. You can selectively allow scripts, popups or forms for example

Navigation Directives:

- form-action: Submit forms only to specific whitelisted URLs. Useful when an attacker can actually inject their own form tags
- frame-ancestors: Defends against clickjacking attacks by limiting the websites that can actually frame the target site using frame, iframe, object, embed or applet tags
- navigation-to: Limit the websites that a page can navigate to

Reporting directives:

- report-to: Where the browser sends violation reports. Especially useful when CSP is started in report-only mode

Other important directives:

- upgrade-insecure-requests: Upgrade all requests made over HTTP to use HTTPS

- block-all-mixed-content: Ensure that all resources are requested over HTTPS, as long as the page is loaded over HTTPS
- require-sri-for: Subresource integrity for all scripts requested from a third-party-domain to detect tampering on the way

Other directives:

- referrer: Sends referrer only under certain conditions
- reflected-xss: Controls features in user-agent to prevent xss

Thursday, October 4, 2018

SSH certificate authentication


* You can configure client-side and server-side authentication using SSH certificates with the existing openssh daemon.
* You never need to worry about MITM attacks on the client when connecting to the server the first time
* Significant decrease in management overhead of SSH keys on the server

If you have a remote server to manage and it's running Linux (or even Windows for that matter but that's beside the point) - it's very likely that there is an SSH daemon running on it. You use an SSH client to connect to it and perform administrative tasks on it. While doing so, you can use passwords (by default) or public key authentication which is a bit more secure as it takes out the password-brute-force attacks. It does mean though that there is some management overhead on both the client and the server side.

On the client, you have to add every host that you connect to, to your known_hosts file. So over time, you have a massive list of known_hosts with no clue about the purpose of each host. Similarly on every server, there is a huge authorized_keys file which has the clients' public keys added to it. When you want to revoke a client key you have to go in and remove that client's key from this file on every server. When you want to not trust a server any more, you need to remove that entry from your known_hosts manually. This is something that can go wrong easily if you miss one server - so some automation is probably required here to make it more reliable.

Certificate-based authentication goes one step further, where a client trusts any SSH server whose key is signed by an 'SSH-root-CA' and a server can in turn trust a client key only if it is signed by a 'user-CA'. There is a really nice post by Facebook where they automate this process and make it even less error-prone. That post does a good job of walking you through step by step, but I did have trouble replicating it, so I'll do a quick summary of the exact steps here.

Server certificate authentication

1. Configure an SSH daemon on a server (Docker, EC2, VirtualBox doesn't matter - but ideally a separate host as it's the CA). Let's call it ca.
2. Generate an SSH keypair for the server CA.
3. Start a new server up. Let's call it host1. This too should run SSH. This is the server you want to login to and administer.
4. Generate an SSH keypair for host1 in /etc/ssh
5. Copy host1's public key onto the ca server. Sign host1's public key with ca's private key. This will create an SSH certificate.
6. Copy ca.pub and the certificate you just created from ca to /etc/ssh on host1.
7. Configure /etc/ssh/sshd_config to use the key you created in Step 4 as well as the certificate. This is done using the HostKey and HostCertificate directives.
8. Restart the SSH daemon or reboot your server to reload your SSH config so it uses the certificate
9. Configure the client machine (any machine apart from host1 and ca) to recognize the ca's public key using the @cert-authority marker in its known_hosts file. This is so you don't get a 'Should I connect? Yes/No' message the first time you connect to host1.
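
The steps above can be sketched roughly as follows. This is only an outline under assumptions: the file names (ca, host1.pub, /etc/ssh/ssh_host_ed25519_key), the ed25519 key type and the 52 week validity are my own illustrative choices, and the Python functions just build the ssh-keygen command lines and config lines as text rather than running anything.

```python
import shlex


def host_cert_commands(ca_key="ca", host="host1",
                       host_key="/etc/ssh/ssh_host_ed25519_key"):
    """Return the ssh-keygen invocations for steps 2, 4 and 5 as argv lists."""
    return [
        # Step 2, on ca: create the CA keypair (ca and ca.pub).
        ["ssh-keygen", "-t", "ed25519", "-f", ca_key, "-C", "server-ca"],
        # Step 4, on host1: create the host keypair with an empty passphrase.
        ["ssh-keygen", "-t", "ed25519", "-f", host_key, "-N", ""],
        # Step 5, on ca: sign the copied public key. -h makes it a host
        # certificate, -n lists the hostnames it is valid for.
        ["ssh-keygen", "-s", ca_key, "-I", host + "-host-cert", "-h",
         "-n", host, "-V", "+52w", host + ".pub"],
    ]


def sshd_config_lines(host_key, host_cert):
    """Step 7: lines to add to /etc/ssh/sshd_config on host1."""
    return ["HostKey " + host_key, "HostCertificate " + host_cert]


def known_hosts_line(host_pattern, ca_public_key):
    """Step 9: line for the client's known_hosts; trusts hosts signed by the CA."""
    return "@cert-authority " + host_pattern + " " + ca_public_key.strip()


for argv in host_cert_commands():
    print(shlex.join(argv))
print("\n".join(sshd_config_lines("/etc/ssh/ssh_host_ed25519_key",
                                  "/etc/ssh/host1-cert.pub")))
```

Signing host1.pub writes the certificate next to it as host1-cert.pub, which is the file that HostCertificate should point to once it is copied to host1.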

User certificate authentication

1. Generate an SSH keypair for the client. This is the userca.
2. Generate a second SSH keypair for the client. This is the key you use to connect to host1. Call it client.
3. Sign client with userca. This will generate a cert as well on the client.
4. Copy userca.pub to host1 and configure sshd_config using the TrustedUserCAKeys directive pointing to userca. This is so host1 recognizes that all user certs signed by this cert are to be accepted.
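
The user side looks much the same. Again a hedged sketch: the principal name (root), the one day validity and the /etc/ssh/userca.pub path are assumptions I picked for illustration, and the code only builds the commands as text. As far as I know, sshd will not accept a user certificate that has no principals when TrustedUserCAKeys is used, which is why -n is included.

```python
import shlex


def user_cert_commands(user_ca="userca", client_key="client", principal="root"):
    """Return the ssh-keygen invocations for steps 1 to 3 as argv lists."""
    return [
        # Step 1: create the user CA keypair (userca and userca.pub).
        ["ssh-keygen", "-t", "ed25519", "-f", user_ca, "-C", "user-ca"],
        # Step 2: create the keypair you actually log in with.
        ["ssh-keygen", "-t", "ed25519", "-f", client_key],
        # Step 3: sign client.pub with the user CA. No -h here, so this is a
        # user certificate; -n lists the login names it is valid for.
        ["ssh-keygen", "-s", user_ca, "-I", client_key + "-user-cert",
         "-n", principal, "-V", "+1d", client_key + ".pub"],
    ]


def trusted_user_ca_line(path="/etc/ssh/userca.pub"):
    """Step 4: line to add to sshd_config on host1."""
    return "TrustedUserCAKeys " + path


for argv in user_cert_commands():
    print(shlex.join(argv))
print(trusted_user_ca_line())
```

The signing step writes client-cert.pub next to client.pub, and the SSH client picks it up automatically when you connect with the client key.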

At this point, you should be able to log in to host1 from client and never get a prompt the first time you connect, because you've explicitly trusted the server CA. It's also very cool that there is no need to do any more key management on any server, as long as you trust the user CA used to sign the user keys.


Dockerizing an SSH service
Hardening SSH

Tuesday, September 18, 2018

DevOps in AWS

This post summarizes the AWS services that help you write code and reliably build, test and deploy it faster than you could manually. The overall concept of doing all this automatically is usually summarized as Continuous Integration and Continuous Deployment (CI/CD). Here is a simple post that nicely explains these concepts.

If you don't want to read any more the tl;dr is this:

* Write code using AWS Cloud 9 
* Debug code using AWS XRay
* Store code using AWS Code Commit
* Build and test code using AWS Code Build
* Deploy code using AWS Code Deploy
* Watch task progression at runtime from a single interface using AWS Code Pipeline
* Use an integrated dashboard for all your tools including issue tracking using AWS Code Star.

If you're not familiar with Git, I'd strongly recommend reading a little about it before proceeding and playing with all these shiny new AWS tools. A great source is this chapter from the ProGit book. Once that's done, come back here. It's fine to read through this post as well, even without Git knowledge - it's just easier with that background knowledge.

Cloud 9 IDE

Once you have an idea in mind and want to write software to actualize it, you need a place to write it. A simple text editor works just fine, but as your programs get more complex an IDE is really helpful. A couple you might be familiar with are Eclipse and IntelliJ. However, since this post is about AWS, I must mention the Cloud9 IDE. It is a browser based IDE that gives you the familiar environment. I haven't played with it too much, but it's good to know there is a web-based option now.


XRay

This looks like a code profiler to me. I did not use it, so I do not have much to say about it. But I'd think the way to use it would be to write your code and then use this to figure out which calls are really slow, and see if you can optimize your code further. All the rest I did try out and can confirm they are all very cool tools. So read on.

Code Commit

Once you finish writing all your code, you need a place to store it. This is where version control systems (VCS) come in. Git is what everyone uses these days. The AWS service for hosting Git repositories is CodeCommit. It's so similar to any other Git host that you do not need to learn any new commands. Once you've set your repository up, all the old Git commands work perfectly well. You can add files, commit them and push them to your Code Commit repository.

All you need to do is install Git on your machine, create a key pair and configure your IAM user to use this to authenticate to Code Commit. Clicking the "Connect" button inside the interface gives you instructions per platform if you get stuck.

The coolest thing here is that you can create triggers that'll run as soon as you push code to your repository. Maybe you want to build, test and deploy your code to your test environment as soon as every single commit is pushed. You can do that here by setting up a Lambda function that will be called as soon as a commit is pushed. Which nicely flows into Code Build.

Code Build

Once you have a workflow going where you can write code in an IDE and push commits to a CodeCommit repository, the next step is to make sure that your code builds properly. This is where CodeBuild comes in. All you do is point CodeBuild to the Code Commit repository where you stored your code and tell it where you want to dump any output artifacts of the program (usually S3).

It supports branches too, so you can tell it which branch to pull code from in Code Commit. You select your runtime environment, which you need to build code in (Java/Python/whatever), configure a bunch of other options and then build your project. The result is whatever you get after you hit Code - Build in whatever IDE you use.

The big advantage here is that you have to spend very little time configuring your software development environment. Also, like I touched upon a bit in the Code Commit section, you could have that Lambda function you wrote as a CodeCommit trigger automatically run Code Build against your code each time a commit is made.
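
A minimal sketch of such a trigger function in Python is below. Treat it as an assumption-heavy outline: the project name my-build-project is hypothetical, the event layout (Records, then codecommit, then references with a commit id) is how I understand CodeCommit trigger events to look, and the codebuild parameter exists only so the function can be exercised without AWS.

```python
def handler(event, context, codebuild=None, project_name="my-build-project"):
    """CodeCommit trigger: start one CodeBuild run per pushed reference."""
    if codebuild is None:
        # Imported lazily so the function can be tested without boto3 installed.
        import boto3
        codebuild = boto3.client("codebuild")

    started = []
    for record in event.get("Records", []):
        references = record.get("codecommit", {}).get("references", [])
        for ref in references:
            commit_id = ref.get("commit")
            if not commit_id:
                continue
            # Build exactly the commit that was pushed.
            result = codebuild.start_build(
                projectName=project_name,
                sourceVersion=commit_id,
            )
            started.append(result["build"]["id"])
    return {"started_builds": started}
```

The Lambda's execution role would also need permission to call codebuild:StartBuild on that project.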

Code Deploy

Once the code is compiled, tests are run and your entire project is built, the last step is usually to deploy it to a web server so your users can then access it. That's where Code Deploy comes in. You can configure it so it uses the build output (with a deployable project) and puts it onto every web server you want to have it on.

You have options of using a load balancer as well, if you want traffic to be evenly distributed. Once deployment is complete, the output should appear on all the servers in that group.

Again, remember you can further extend your Lambda function to build and deploy now as soon as a commit hits Code Commit. Pretty cool :)

Code Pipeline

Code Pipeline isn't something new but it certainly makes life much easier. It helps though if you understand the 3 services I talked briefly about earlier - since the screens in Code Pipeline deal with these 3 services and ask you for input. So I'd recommend understanding Code Commit, Code Build and Code Deploy really well before using Code Pipeline.

Pipeline basically is a wizard for the other 3 services. So it'll prompt you to tell it where your code is (Code Commit), what to build it with (Code Build) and what to deploy it with (Code Deploy). If you already have roles and resources set up successfully from when you played with the other 3 services - this should feel very intuitive when you do it. A couple of great tutorials are here and here. Also, a nice writeup on how someone automated the whole process is here.

The coolest thing about Pipeline is that you can see everything, stage by stage and where each stage is once you create it. For example: Once your code is pushed to Code Commit (as usual) and you have the Pipeline dashboard open, you can actually see each stage succeeding or failing, after which you can troubleshoot accordingly.


Code Star

Managers should love this one. I used it just a bit but it has this fantastic looking dashboard that gives you a unified view of every single service that you are using. So in short, it has links to C9, CC, CB, CD and CP. So if you didn't cheat and did everything step by step :) you should see all your commits, builds and pipelines by clicking on the buttons on the fancy dashboard that is CodeStar.

The additional feature here is integration with Jira and GitHub where you can see all your issues as well.

So in short CodeStar is a one stop shop if you've bought into the AWS development environment and want to be tied into it for years to come, while parting with your money bit by bit :)

Friday, September 14, 2018

Networking in AWS

A quick post this time on how you can use the AWS CLI or SDK to create an entire network, without using the GUI wizards (which are great, but sometimes irritatingly slow :)).

Relevant code to do everything in this post and a bit more is all uploaded here.

First up, almost certainly you want a VPC, because some services are public and some are private. A VPC will help you separate these. So you use the CreateVpc call to create one.

Make sure you enable DNS support and DNS hostnames on the VPC so that your hosts can resolve each other, and any private endpoints you create later, by name.

Then you create public and private subnets, so you can put your public and private hosts into each of those.

Your public subnet needs an Internet gateway to talk to the Internet, so you create one and attach it to the VPC.

Once you have your VPC, subnets and Internet gateway ready you need to set up routes between them. The wizard would do this automatically but we have to do it manually. So you first create a route table for each subnet and add routes to each route table. Note here that you usually don't need your private subnet hosts to talk to the Internet. If you do for some reason, you will need to create a NAT gateway in the public subnet and modify the route table of the private subnet to send traffic to it.
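
The sequence so far can be sketched with the boto3 EC2 client roughly like this. It is not the code linked above, just a minimal outline under assumptions: the CIDR ranges are made up, error handling and tagging are left out, and the client is passed in (for example boto3.client("ec2")) so the function itself has no hard dependency on AWS.

```python
def build_network(ec2, vpc_cidr="10.0.0.0/16",
                  public_cidr="10.0.1.0/24", private_cidr="10.0.2.0/24"):
    """Create a VPC with one public and one private subnet; return their ids."""
    vpc_id = ec2.create_vpc(CidrBlock=vpc_cidr)["Vpc"]["VpcId"]
    # Each attribute has to be changed in its own call.
    ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
    ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})

    public_id = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock=public_cidr)["Subnet"]["SubnetId"]
    private_id = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock=private_cidr)["Subnet"]["SubnetId"]

    # The Internet gateway is attached to the VPC, not to a subnet.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    # Public route table: default route goes out through the Internet gateway.
    public_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=public_rt,
                     DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=public_rt, SubnetId=public_id)

    # Private route table: only the implicit local route, no Internet access.
    private_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.associate_route_table(RouteTableId=private_rt, SubnetId=private_id)

    return {"vpc": vpc_id, "public_subnet": public_id,
            "private_subnet": private_id, "internet_gateway": igw_id}
```

What makes a subnet "public" here is simply that its route table has a default route to the Internet gateway.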

Now everything is sort of setup. So you then think of access control everywhere. For starters you create a security group allowing only inbound SSH and HTTPS access for an EC2 instance in the public subnet and only MySQL access for an RDS instance in the private subnet.

Create a key pair (I reused an old one as this was just a test) so you can use it for your new EC2 instance. Identify an AMI to run on your EC2 instance. I used the console for this but you can apparently use the CLI or the SDK to find this out if you want to.

Once that's done you launch an EC2 instance in the public subnet, with the SSH-HTTPS security group and your key pair. Make sure you assign it a public IP otherwise you won't be able to reach it. Login to the instance with your keypair and confirm access works.

Now you start thinking of things you want to keep in your private subnet. The 3 things I was working with were RDS so my EC2 could talk to it, Secrets Manager to store my RDS credentials and a Lambda function that is needed to rotate the credentials in SecretsManager. All of these should be in the private subnet.

A cool thing here is that you can create a private endpoint for SecretsManager so that all traffic to it is always over an AWS network and doesn't go to the Internet at all.

RDS only needs inbound access from EC2 and Lambda on port 3306. I'm not sure what SecretsManager needs but I gave it inbound 443 only (You should test this more). Lambda doesn't need any inbound access. Setup security groups similar to how you did it before.

Create a secret in Secrets Manager. Use a random name if you're testing, since you can't reuse old names for a while, even if you have deleted the secret. This secret should contain all the information you need to connect to the RDS database, and it is used when you actually create the database.

Create a DB subnet group, retrieve the secret you stored earlier from Secrets Manager, pick the security group that you created earlier (*3306 inbound access*) and then create the actual RDS instance itself.

Once the database is created, the only task remaining is to create the Lambda function that will rotate the credentials for you in Secrets Manager.

Saturday, September 8, 2018

Confused Deputy

The confused deputy problem is one of the best named issues. Not for any deep philosophical reason, but just because it is truly confusing :). To me anyway, but then, most things are confusing to me, until I spend way-above-normal amounts of time re-reading and re-writing it in my own words. The link above (AWS) is an excellent resource, which I learnt most of it from, so go there first - and if you find that confusing, come over here and I'll try and explain it in my own words. As always, there's nothing wonderfully new here - just my attempt to make sure I remember, have fun writing and hopefully help anyone else along the way.

Let us just keep it simple here. The 3 people in question are Alice, Bob and Eve. Alice has software called MyBackup hosted on the cloud that lets you back up your images that are stored in the service called MyImages. Each time you use Alice's software you have to pay her $100. Sure that's ridiculous, but stick with me. For some reason Bob thinks this is a great idea and pings Alice to use this service.

Alice creates an account and gives him a unique string called BobAliceBackup1987. She says that all Bob needs to do is to log in when he wants to back up, paste the string into a text box on the website and click "Create Backup". This will automatically (details are not important here) let Alice into Bob's account, copy all his images to her secret storage box that is very hard to hack and send Bob an email when it is all done. Don't think about how lame this system is at this point :).

Eve now hears that Bob is using this service and likes it a lot. She subscribes to the service too and gets her key EveAliceBackup1991. Everything is good and everyone is happy.

One day Bob and Eve have a fight and stop talking to each other. Eve feels that Bob is wrong and wants to teach him a lesson. Frustrated, she logs into MyBackup to look at her backups. (WTF who even does this??). While typing in her "secret string" she suddenly wonders if she can make Bob spend his Britney Spears concert money on Backups instead. Can she predict Bob's key? Will Alice find out? Only one way to find out...

She guesses Bob's key (what a shock :/) and sends that key to Alice. Alice hasn't spent much time developing any kind of authorization model, so all she sees is a string come in and thinks - well there's another $100 for me :). She just assumes (pay attention here) that whoever sends the string is the owner of the string and actually wants to back their images up. And she backs Bob's images up, 20 times in a row without thinking that something's wrong. Bob gets back at night (no there are no Instant Mobile alerts here for payment debits) and finds out he has backed his stupid car_bumper_dented images up 20 times. Alice is no help, she has proof he sent a string...and sure enough when Bob logs in and checks backup history he sees 20 requests too. Meanwhile Eve feels vindicated. Eventually she might get caught, eventually Bob might get his money back and eventually Alice will learn to write better software but that's beside the point. And yes, it's a made up example but one that hopefully helps you understand the point of the attack better.

In a nutshell, confused deputy occurs when a service with multiple users makes a decision based on user input that is predictable without asking for further authorization. In AWS world, the predictable input is a Role ARN that a service can assume in your account to do something in it. While it looks really big, it is not considered secret and if someone guesses it, they can make a service do things in your account - without your permission. Does that make sense? I hope so. But if not...
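
Here is the story as a toy Python model, which may help if code reads easier than prose. Everything in it (the BackupService class, the method names, the way callers are identified) is invented for illustration and is not how any real service is written. The only point is the difference between trusting the string and checking who sent it.

```python
class BackupService:
    """Toy model of Alice's MyBackup service."""

    def __init__(self):
        # Backup string -> account that owns it.
        self.owners = {
            "BobAliceBackup1987": "bob",
            "EveAliceBackup1991": "eve",
        }
        self.charges = []  # who got billed, in order

    def backup_naive(self, caller, backup_string):
        """Confused deputy: acts on the string and ignores who sent it."""
        owner = self.owners[backup_string]
        self.charges.append(owner)
        return owner

    def backup_checked(self, caller, backup_string):
        """Only act when the authenticated caller owns the string."""
        owner = self.owners.get(backup_string)
        if owner != caller:
            raise PermissionError("caller does not own this backup string")
        self.charges.append(owner)
        return owner


service = BackupService()
# Eve is logged in as herself but submits Bob's (guessable) string.
print(service.backup_naive("eve", "BobAliceBackup1987"))   # Bob gets billed
try:
    service.backup_checked("eve", "BobAliceBackup1987")
except PermissionError as err:
    print("rejected:", err)
```

In the AWS version of the problem, my understanding is that the usual extra check is an external ID: the role's trust policy only allows the service to assume the role when it presents an sts:ExternalId value tied to your account, so knowing the Role ARN alone is not enough.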

... go and read that excellent AWS blog again and see if it makes more sense.

Wednesday, August 22, 2018

Serverless Development

Just another post to solidify concepts in my mind. The Serverless word is often used these days in conjunction with development. All it really means is that you do not have to spend time configuring any servers. No Apache, Tomcat, MySQL. No configuration of any sort. You can just spend your time writing code (Lambda functions). Mostly anyway :)

The most common use of this philosophy is in conjunction with AWS. As in, you create a configuration file called serverless.yml, which the Serverless Framework turns into a CloudFormation template. This basically means you create a config file offline with references to all the AWS resources that you think you will need (you can always add more later), and the framework then uploads the resulting template to CloudFormation.

CF then looks through the entire file and creates all those resources, policies, users, records, functions, plugins and in short whatever you mentioned there. You can now launch a client and hit the deploy URL and can invoke all the methods you wrote in your Lambda function.

There are some clear instructions on how to deploy a Hello World as well as how one can write an entire Flask application with DynamoDB state locally and then push it all online to AWS with a simple sls deploy command.
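
For a feel of how little code the function side needs, here is a minimal sketch of a Python Lambda handler of the Hello World kind. The function name hello and the name query parameter are my own assumptions; the return value follows the statusCode/headers/body shape that API Gateway proxy integrations expect.

```python
import json


def hello(event, context):
    """Minimal Lambda handler returning an API Gateway proxy-style response."""
    # queryStringParameters can be missing or None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Hello, " + name + "!"}),
    }


# Local smoke test, no AWS needed.
print(hello({"queryStringParameters": {"name": "reader"}}, None))
```

In serverless.yml you would then point a function entry at this handler and attach an HTTP event to it.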

All you need to make sure is that you have serverless installed, your AWS credentials configured and access to the console (easy to verify things) and things will go very smoothly. Of course there are going to be costs to all this - so make sure you do all that research before getting seduced by this awesome technology :)

Sunday, August 19, 2018

Birthday Paradox

There's a million places the birthday paradox has been explained. I always forget it. So this time, I decided to write it down for my own reference, keeping just the salient points in mind.

To start, a year has 365 days (forget leap years for now). The chances your birthday is on say Jan 28th is 1/365. Hence the probability of it not being on Jan 28th is (1 - 1/365 = 364/365). Let's add your friend now. The chances of both of you not having a birthday on Jan 28th is (364/365)^2 (exponential). So for 253 people the chances of all of them not having a birthday on Jan 28th is (364/365)^253. Makes sense? If not, maybe read a bit of probability from some source you like and come back. There's zero shame in this btw, I needed to do it for what it's worth :).

Anyway, so now you think why did I pick 253 above? Well let's do a little math here. If there's 2 people in a room, how many pairs can we form where order doesn't matter? Just 1 pair right? What about 3 people (a,b,c)? How many pairs? 3 pairs (ab, bc, ac). With 4 people (a, b, c, d) it is 6 pairs (ab, ac, ad, bc, bd, cd). Right? So let's generalize this now so we can calculate it for a larger number, instead of 2, 3 or 4. That's where combinations come in - scroll down the link (just above) to get the formula - 23! / (2! * (23 - 2)!) [It's 2! because a pair has 2 people and you're forming a group of 2]. Doing the math on that it becomes:

(23 * 22 * 21!) / (2! * 21!) = (23 * 22) / 2 = 23 * 11 = 253. See that number before? :)

Tying stuff back in, it means that if I have 23 people (including me) in a room, there are 253 ways in which pairs can be formed. And remember, the chance of none of those pairs sharing a birthday is (364/365)^253. It's not (364/365)^23. It's the probability ^ no_of_possible_pairs. Again, if this is going over your head - step back and read a bit of probability theory and come back once you're comfortable.

So if the probability that NO pair matches is (364/365)^253 = 0.4995, the probability that there CAN be a match somewhere - meaning someone in the room shares a birthday - is 1 - 0.4995 = 0.5005, or 50.05%. Meaning, there is just about a 50% chance that someone in a room will share a birthday if there's at least 23 people in a room. Not share a birthday with you - just share a birthday with anyone in that room. Make sense?
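
If you'd rather let a computer do the arithmetic, the numbers above can be checked with a few lines of Python. The first calculation is the pairs-based estimate used in this post; the second is the exact probability, which I've added for comparison since the pairs are not fully independent of each other. It comes out slightly higher, but the conclusion is the same.

```python
from math import comb, prod

people = 23
pairs = comb(people, 2)                  # number of ways to pick 2 out of 23

# Estimate used in this post: treat every pair as an independent check.
approx_match = 1 - (364 / 365) ** pairs

# Exact: person 1 can have any birthday, person 2 must avoid 1 day,
# person 3 must avoid 2 days, and so on.
exact_no_match = prod((365 - i) / 365 for i in range(people))
exact_match = 1 - exact_no_match

print(pairs)                   # 253
print(round(approx_match, 4))  # 0.5005
print(round(exact_match, 4))   # 0.5073
```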

Now all that's fine but how does that matter in real life, keeping security in mind? I'm thinking of a couple of examples:

- If I use a 64-bit MAC tag, I'm thinking that there are 2^64 possibilities, which is correct. But that doesn't mean someone needs to try all of them before a match is found. Because of the birthday paradox, a collision is expected after roughly sqrt(2^64) = 2^32 attempts, which is far fewer.

- Digital signatures are another area. If I use an algorithm that is susceptible to collisions to create a signature, it means that an attacker can find a collision for my signature more easily and spoof it. Meaning they could change the message, fake a signature that looks the same as the original one, attach it to the message and no one will detect it.

The fix to all this is to ensure that you use hashing algorithms that give you a large number of possibilities even after the square root is taken. Meaning for a SHA-1 hash, which has a 160-bit output, one will start to see collisions after about 2^80 attempts. It looks like SHA-256 (about 2^128 attempts) is safe for now :)
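
You can watch the square-root effect happen on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits (an assumption purely to keep the run time short) and hashes a simple counter until two different inputs collide. With 2^24, roughly 16.7 million, possible outputs, a collision typically shows up after a few thousand attempts, in the neighbourhood of 2^12, not millions.

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-256(data) as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int):
    """Hash '0', '1', '2', ... until two inputs share a truncated hash."""
    seen = {}          # truncated hash -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, counter + 1
        seen[value] = message
        counter += 1


first, second, attempts = find_collision(24)
print(first, second, "collide after", attempts, "attempts")
```

Every extra bit of output only adds half a bit of collision resistance, which is why the output size has to be so much larger than feels necessary.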

Thursday, August 16, 2018

Unit Tests - Why?

A unit test is basically testing one unit of code. One unit literally means one snippet of code. One snippet could mean 5 lines, a small function or in some cases even an entire small program. Usually though, in large corporate environments your code base is pretty massive so a good starting place is to think of a minimum of 1 unit test per function.

So you go to the function's code and see that it has 100 lines of code. The first 10 are just variable initialization, then there's a few calls to 3rd party libraries to get data, normalize the data and then log it. For e.g. Requests to REST APIs, convert the returned data into Json, add a new key with a timestamp to it and then log success or failure.

Finally once these calls succeed OR fail, your code returns True or False. So when you write a unit test, you're thinking (non intuitively) but you're thinking - "Let's assume that all these calls succeed and come to MY code, where I make the True/False decision. I want to make sure that my code reacts properly in either case.".

Which basically means, if there was a way to make all those calls (requests, json, log, add timestamp key) NOT run at all, in the first place and just provide a FAKE normalized, Json blob with a timestamp in it TO my True/False code - I'd do it. Because think of it, you're NOT testing the functionality of any of those calls - you're just interested in your code. So let's see a fake example here:

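import logging
import time

import requests

logger = logging.getLogger(__name__)

def code():
    url = 'https://fakesite.com'

    response = requests.get(url)         # 3rd party call
    data = response.json()               # convert the response to JSON
    data['time'] = time.time()           # add a timestamp key
    logger.info("Request completed")

    # This decision is the only part that is really "my" code
    if len(data) > 2 and 'time' in data:
        return yay()
    return oops()

def oops():
    return 0

def yay():
    return 1

All you want to test is if oops() and yay() get called correctly. Nothing else. The end. So your yay() test (never mind how right now) looks like:

from unittest.mock import MagicMock, patch

# Assumes this test lives in the same file as code() and logger above
def test_code_returns_yay():
    fake_response = MagicMock()
    # The fake JSON blob that the fake request hands back
    fake_response.json.return_value = {'a': 1, 'b': 2}

    # Fake the request, fake the time (returns 1500), and the log
    # call doesn't matter so it is replaced with a do-nothing mock
    with patch('requests.get', return_value=fake_response), \
         patch('time.time', return_value=1500), \
         patch.object(logger, 'info'):
        # code() adds the 'time' key itself, giving 3 keys in total
        assert code() == 1
        # That's what yay() will return and pass your test.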

Remember, you could run your test without patching a single thing. As in, let all the 3rd party calls happen and test with real data, but that'll slow things down badly, specially if you have 1000s of tests to run.
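
If you want something you can paste into a file and run with only the standard library, here is a hedged sketch of the same idea that covers both the yay and the oops outcome. fetch_and_check is a hypothetical stand-in for code(), and urllib is swapped in for requests purely so nothing needs to be installed.

```python
import json
import time
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_and_check(url="https://fakesite.com"):
    """Fetch JSON, add a timestamp, return 1 (yay) or 0 (oops)."""
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read())
    data["time"] = time.time()
    return 1 if len(data) > 2 and "time" in data else 0


def fake_urlopen(payload):
    """Build a stand-in for urlopen that serves `payload` as JSON bytes."""
    response = MagicMock()
    response.read.return_value = json.dumps(payload).encode()
    response.__enter__.return_value = response   # support the `with` block
    return MagicMock(return_value=response)


def test_yay_path():
    # Two keys from the fake response plus 'time' makes three: yay.
    with patch("urllib.request.urlopen", fake_urlopen({"a": 1, "b": 2})):
        assert fetch_and_check() == 1


def test_oops_path():
    # Empty response plus 'time' is only one key: oops.
    with patch("urllib.request.urlopen", fake_urlopen({})):
        assert fetch_and_check() == 0


test_yay_path()
test_oops_path()
print("both tests passed without touching the network")
```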

And again, if you're still struggling (coz I did for a long time) it's a "unit" test - you isolate the bit you care about, assume everything around it works well, give it the input it needs to work well and then write your tests.

I hope that demystifies it a bit :).

Thursday, March 2, 2017

Debugging running web applications

I've always found it easier to test with code when testing *anything*. So when I see large web applications, my first instinct these days is to build it with code. Recently I was testing a web app that had its own built in web server. Here are the steps I performed to get it up and running.

1. Get a VM and configure the application inside the VM.

2. Identify how the server can be started in debug mode. Usually these servers have startup scripts and shutdown scripts. Look at those. Sometimes you get lucky (like me) and find a debug mode startup script. The debug script will contain a line like this, which adds to your command line arguments while starting the server (some long java command)

-Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n

3. Verify that port 8787 is actually listening on the server, along with the other normal ports on which your web server runs (say 80,443 for e.g).

4. If you're using Virtual Box, setup port forwarding so that you can reach all the ports on the guest (80, 443, 8787) from the host. Netcat into the guest ports from the host and verify that things work properly.

5. Now download IntelliJ on your host and make sure it launches.

6. Import your project into IntelliJ directly from the svn/cvs/whatever repository using the 'Intellij Idea -> checkout from version control' at the start. You might have to install the svn/cvs/git client on your host as well, if you do not already have it.

7. Now configure a remote debugging configuration in IntelliJ. Here is a great StackOverflow post to help you do this. Give it a name 'remote_debug_project' or whatever you want.

8. Click Run -> Debug 'remote_debug_project'. The lower pane should open up and say something like "Connected to the target VM, address: 'localhost:8787', transport: 'socket'"

9. Set a breakpoint in code that you know will definitely be triggered. Maybe choose the 'login page' verify password function.

10. Visit the web page. Login. The breakpoint should get triggered now. Enjoy a more powerful way of testing web apps now.
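
Steps 3 and 4 can also be checked with a few lines of Python instead of Netcat. This is just a sketch: the host 127.0.0.1 and the port list assume the VirtualBox port forwarding from step 4 maps the guest ports to the same numbers on the host.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host="127.0.0.1", ports=(80, 443, 8787)):
    """Map each port to whether something is listening on it."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, is_open in check_ports().items():
        print(port, "open" if is_open else "closed")
```

If 8787 shows as closed while 80 and 443 are open, the server was most likely not started with the debug script.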

If your clients give you source code and you can build it, I definitely encourage this way of testing stuff. The time you take to find stuff out with code is way way lower than with black box testing.

Wednesday, July 27, 2016

SSL/TLS for the layman (Attacks) - Part 2

As with the previous SSL post, this post is NOT a replacement for reading any of the official papers on these attacks. It is just my attempt to summarize all of these attacks into a form that is easily readable and understandable by anyone. For the gory details, search for the original paper for each of these attacks and read those. I'll try and link to as many as I can find here.

And before we start, it's a good idea to bookmark this post and the previous one. I will try and keep both these updated as and when newer recommendations and/or attacks come out. Of course I could miss things, and am reliant on anyone who chooses to trust this post - to tell me that I missed something. Then I can read, understand and add things here. Here goes then.

Sweet32 is an attack (very similar to BEAST) that targets a property of 3DES and probably any other cipher (I think) that uses 8 byte blocks while encrypting plain text. There are lots of pre-requisites though, for this attack to be successful. Here is a nice read, that I thought broke things up really well.

  • A victim needs to be passing something confidential and which consistently appears in every request (like a session cookie).
  • The attacker must somehow trick the victim to visit a website under the attacker's control, that will make a very large number of requests to the target site.
  • The attacker must be in a position on the network to be able to actually capture all that encrypted traffic.
  • There should eventually be a collision (Apparently this can happen in 2^32 attempts. Please read the link above for why). In other words, bits of plain-text traffic, despite the encryption algorithm doing everything right, must look the same when encrypted.
  • The bit of confidential data should always appear in the same place, when traffic is encrypted.

If all these conditions are met, an attacker could steal the confidential information (session cookie) and impersonate the victim.

The fix is to start disabling 3DES ciphers on your servers or at least ensure that it is not a preferred cipher. Use AES. AES has 16 byte blocks, which means it will take much much longer to have a collision. Also limit how long a connection stays open (and how much data goes over it), so an attacker never gets to capture enough traffic under one key to see a collision.
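
To put rough numbers on the 8 byte versus 16 byte block sizes: a collision becomes likely after about 2^(n/2) blocks for an n-bit block (the birthday bound). The sketch below just does that arithmetic; the function name is made up, and the result is an order-of-magnitude estimate rather than an exact attack cost.

```python
def birthday_bound_bytes(block_bits):
    """Approximate bytes encrypted under one key before a block collision is likely."""
    blocks = 2 ** (block_bits // 2)       # birthday bound: about 2^(n/2) blocks
    return blocks * (block_bits // 8)     # each block is block_bits / 8 bytes

GIB = 2 ** 30
print("3DES (64-bit blocks): ", birthday_bound_bytes(64) // GIB, "GiB")
print("AES  (128-bit blocks):", birthday_bound_bytes(128) // GIB, "GiB")
```

For 3DES this works out to 32 GiB of traffic on one connection, which is a lot but doable. For AES it is 2^38 GiB, which is why the attack stops being practical.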

Beast is a client-side SSL attack. A server is never affected. Here is a link to the original paper.

A victim needs to be logged in to a target site and must have been assigned something secret - say a session cookie. A user then, has to visit an attacker-controlled home page. The attacker, while sending those requests out, must be able to somehow add data to every single request, just before the cookie. If the attacker can't do this, this is not a valid attack.

This page the victim visited then (maybe via JavaScript) makes a large number of requests to the target site, into which the victim is logged in. The attacker then uses these requests and aims at decrypting the secret cookie. This is an example of a chosen plain-text attack. An attacker can send a million plain-texts, get their ciphertexts, analyze them and try and break things.

The attacks described in this paper allow an attacker to obtain the cookies even if HTTPS is in use and the secure flag is turned on. The attack works because of how CBC is used in SSL 3.0 and TLS 1.0 - every single cipher suite that uses CBC internally is affected.

RC4 was suggested as a defence as it was a good cipher (at that time) that didn't use CBC. But RC4 is bad for many other (worse) reasons - some of which are described in my previous post. So don't use RC4 and ignore anyone who tells you to.

TLS 1.1/1.2 on the other hand is better as it protects you against BEAST by default. So if you have all your clients supporting TLS 1.1/1.2 and none that depend on TLS 1.0/SSL3.0, disable RC4 and go this route instead.

Heartbleed is one of the most well known recent attacks just because of how high the impact is. An attacker could, with no user intervention of any sort, connect to a server and extract secrets from the server's memory. This XKCD cartoon, I thought personally was brilliant in explaining the true impact of the issue.

These could be passwords, private keys, PII.. pretty much anything. Once that information is stolen, what can be done with the stolen information, depends on if there are other mitigations in place to protect users. For example: You could steal a root password for some server, but if you have NO way to reach that server, while it's still bad... there is a mitigating control in place.

The only fix is to apply the relevant patches for OpenSSL running on your OS. Hopefully it's all seamless, and the libraries on your system will be patched during your 'automatic update' process, however that takes place.

Generally, the way compression works is that it finds repetitive patterns in data and replaces those with a much smaller index. Then when uncompressing, it does things the other way. Looks up the index, gets the original value and constructs the overall response. CRIME is an attack that abuses how SSL compresses these user requests.

The attacker needs to know that some part of a request is always going to be present. For example: She could predict that a request will always have one bit called Cookie: secret=. She wouldn't know what that value is, but the first part .. is always going to be there. So now, if say she can append a second Cookie: secret=, SSL compression would kick in.

SSL would notice that Cookie: secret= appears twice and replace the second occurrence with a super-small token; say 0. The overall length of the request will hence go down - to say 25.

Now the attacker will send multiple requests via the victim's browser. Let's say this happens because a victim opens another tab, visiting the attacker's site. Also, let's assume an attacker can see the victim's traffic (well, encrypted blobs anyway). Each request carries a different guess:
Cookie: secret=0
Cookie: secret=1
Cookie: secret=2

... and so on. The moment the first character after the '='  matches, SSL compression will, instead of finding a match for Cookie: secret= ... find a match for Cookie: secret=2. And this means, that the length of the request will again go down. But crucially, it'll go down by a little more than for every other character. Meaning if secret=0 or secret=1 .. length will be 25, but if secret=2 .. length will be 24. Meaning, the first byte of the cookie is 2 :). And so on.
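
The length side channel described above is easy to see with plain zlib. This is only a sketch of the idea and not the real CRIME exploit: the secret, the host name and the request layout are made up, and it compares an 8-character guess rather than a single byte, because a one-byte difference can get hidden when the compressed bits are rounded up to whole bytes (the real attack needs extra tricks to deal with that).

```python
import zlib

SECRET = "s3cr3tT0k3nVa1ue"   # hypothetical cookie value the attacker is after

def compressed_len(guess):
    """Compressed size of a request holding the attacker's guess and the real cookie."""
    request = (
        "GET /?q=Cookie: secret=" + guess + " HTTP/1.1\r\n"   # attacker-controlled part
        "Host: target.example\r\n"
        "Cookie: secret=" + SECRET + "\r\n\r\n"               # added by the victim's browser
    )
    return len(zlib.compress(request.encode()))

# The attacker only ever sees these lengths, never the plain text.
print("wrong guess:", compressed_len("QWXZJVKP"))
print("right guess:", compressed_len("s3cr3tT0"))
```

The right guess compresses better because the compressor can point back at a longer repeated string, and that shorter length is all the attacker needs.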

The best explanation for this attack was here. The fix apparently is to disable compression. Most vendors have probably rolled relevant patches out as well, so just apply those.

The core principle behind BREACH is very similar to that of CRIME: it exploits the fact that content is compressed. Only, it targets compression done by HTTP, not SSL/TLS as in CRIME. In other words, it isn't an attack targeted at SSL at all. It's just mentioned here because CRIME and BREACH tend to be mentioned side by side very often.

There are some pre-requisites for the attack. The server should use HTTP response compression (most do), and user input (and preferably a secret which is the target) should be reflected back in the HTTP response.

Similar to CRIME, an attacker is in the middle and can read encrypted blobs of user traffic, and can also get the user to visit a site under their control. Instead of CRIME though, where the attacker measured the length of the requests (which reduced, because of TLS compression) - the attacker measures the sizes of 'HTTP compressed responses' in this case.

And remember, the attacker's input has to get reflected back in the response, multiple times, for compression to even kick in. If it does, then he could send a large number of requests (very similar to CRIME) and observe the effects on the length of the returned (now compressed) response, thus guessing byte-by-byte.

There's no real 'one-size-fits-all fix' for BREACH. If CSRF tokens are the main target, since those are very commonly returned in responses - changing the token after every request, while invalidating the former ones.. will defend against this attack.

Do note that there are many other partial mitigations suggested as well as numerous complexities in the attack itself. The official site for the attack that was linked above is IMO the best resource for understanding this. I'd recommend reading their paper, instead of the presentation :).

Berserk was a Mozilla NSS library specific bug. Alice can bypass signature verification for TLS certificates issued for specific websites. So, if the signature isn't verified properly it means that Alice can construct a certificate... another certificate for say mail.google.com, that looks exactly like the original one... without having the private TLS key for mail.google.com at all.

And once she has the malicious certificate, she'll install that on a website somewhere. And the website (the most common use case anyway) will look just like mail.google.com. Phishing website. And then somehow get people to come and visit that website.

That's not as simple as it sounds, because if you tell everyone to go to mail.google.com, they'll end up going to the real site and not Alice's fake site. So Alice then has to perform other MITM attacks like ARP poisoning, DNS cache poisoning, BGP route hijacking (all I can think of now :)) and somehow tell users to go to Alice's site, when they type in mail.google.com in their browsers.

This is a more detailed read on this attack, just jump to the Berserk section below. The fix really is to just ensure you have the latest security patches in place for all the software that uses the NSS libraries.

Padding Oracle
This is one of the most famous crypto attacks, and a number of posts have been written about it. This and this are the best reads about it, that I found. Details are always important in any exploit, but since it's likely that you want to know more about this than other attacks, just because it's so famous, do read carefully.

In a nutshell though, an attacker sends a number of encrypted texts to the server, which decrypts all of them. That's fine, but the server also tells the attacker when the padding of the message is wrong. Meaning, the attacker can brute-force the entire message byte by byte, just based on that information leak.
There's a few key principles to internalize while you're reading.

1. We are NOT at any point, trying to predict the actual correct pad. We're using the property that it leaks if the pad is right or not, to calculate the actual plain text, little by little.

2. You assume that the pad is 0x01 to 0x10 per block, assuming that it's a 16 byte block. It's 0x01 when you're trying to predict the block's last character. It's 0x10 if you're trying to predict the 1st character. Note again, you never ever actually know what the pad was.

3. Once you find out the correct character (say X) in the previous block that does NOT give you a padding error, for say, the last byte of the last block, ask the question - 'What's the valid pad here?'. It's 0x01. Now go and look at the CBC bit of the Wiki article. Stare hard at the diagram and repeat this step in your head a number of times, till you get this step. Do NOT go on till you get this.

4. What you get as a result of Step 3 is NOT plain-text. It's close, but it is NOT plain-text. You have to XOR X with the pad from Step 3 (0x01) and then with the original byte at that position in the previous block to get the actual plain-text.

The fix is to ensure that you do not tell the user if the pad is incorrect at any point in time.
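
The steps above can be tried out end to end with a toy setup. This is a sketch under loud assumptions: the block cipher is a home-made 4-round Feistel built on SHA-256 (NOT AES, and not secure, it is only there so the example runs with the standard library), the padding is PKCS#7 style, and all function names are made up. The attack function only ever calls the oracle; it never touches the key.

```python
import hashlib

BLOCK = 16

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, i, half):
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def encrypt_block(key, block):            # toy Feistel cipher, stand-in for AES
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, _xor(l, _f(key, i, r))
    return l + r

def decrypt_block(key, block):
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = _xor(r, _f(key, i, l)), l
    return l + r

def cbc_encrypt(key, iv, plaintext):
    pad = BLOCK - len(plaintext) % BLOCK
    data = plaintext + bytes([pad]) * pad
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def padding_ok(key, iv, ciphertext):
    """The leaky server: decrypts and only says whether the padding was valid."""
    prev, plain = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        plain += _xor(decrypt_block(key, block), prev)
        prev = block
    pad = plain[-1]
    return 1 <= pad <= BLOCK and plain.endswith(bytes([pad]) * pad)

def recover_block(oracle, prev, block):
    """Recover the plain text of one block using only oracle(forged_prev, block)."""
    inter = bytearray(BLOCK)              # block-cipher output before the CBC XOR
    for pad in range(1, BLOCK + 1):
        pos = BLOCK - pad
        forged = bytearray(BLOCK)
        for j in range(pos + 1, BLOCK):   # force already-known bytes to decrypt to `pad`
            forged[j] = inter[j] ^ pad
        for guess in range(256):          # this is "X" from step 3
            forged[pos] = guess
            if not oracle(bytes(forged), block):
                continue
            if pad == 1 and pos > 0:      # rule out an accidental longer valid pad
                forged[pos - 1] ^= 0xFF
                still_ok = oracle(bytes(forged), block)
                forged[pos - 1] ^= 0xFF
                if not still_ok:
                    continue
            inter[pos] = guess ^ pad      # step 4, first XOR
            break
    return _xor(inter, prev)              # step 4, second XOR with the real previous block

KEY, IV = b"k" * 16, bytes(range(16))
ct = cbc_encrypt(KEY, IV, b"secret=hunter2")
print(recover_block(lambda iv, blk: padding_ok(KEY, iv, blk), IV, ct))
```

The printed value should be the original message followed by its two padding bytes, recovered in at most 16 x 256 oracle queries.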

This is far from a perfect description but its almost impossible for me to explain this any better in a smaller space than I have here. It is after all just an overview of stuff, so you can read some more. Like everything else.

Lucky 13
This is an attack that's very similar to the padding oracle I described above. The way the padding oracle is fixed is by not giving that feedback to the user: the server calculates the MAC anyway and returns the same error whether the padding or the MAC is wrong. So there's no way for an attacker to brute-force bytes and eventually guess plaintext.

So that got fixed. However, SSL/TLS still took different amounts of time while calculating the MAC, for differing padding lengths. And some smart people somehow (:-o) managed to figure that out, and decrypt entire blobs of plaintext again. Crypto's hard, isn't it? :)

Fixes of adding random delays apparently do not solve the problem. Another fix suggested was using RC4 ciphers, but that's a bad idea too as mentioned earlier. So, the only real fix for this is to ensure that people switch to TLS 1.2 and use the AEAD ciphers instead. This apparently is harder than it looks.

Most libraries that are vulnerable to this have fixed this now though, so make sure you have applied all the necessary updates.

Poodle (SSLv3)
The attack itself, conceptually is very very similar to the padding oracle attack. The only difference really, is that if clients and servers choose SSL 3.0 as their encryption protocol of choice, they are vulnerable to this attack.

In other words, not a single cipher suite of SSL 3.0 protects against this attack, and no one should hence be using SSL 3.0 at all.

I'm not going to describe the vulnerability again, since I went through it in a fair amount of detail earlier :). But if you'd like to read the original paper - I've linked it here.

Downgrade attack (TLS_FALLBACK_SCSV)
Now, after reading that brief note about SSL 3.0, one might say that SSL 3.0 isn't a preferred protocol by default at all. Which means that most clients will communicate using TLS 1.x. Which is probably true.

What's also true though, is that it's possible for an attacker to perform a downgrade attack against a specific client, and force them to use SSL 3.0, and then use the techniques mentioned in Poodle (or any other vulnerability really) and own them.

To mitigate this to a certain extent, there is something called TLS_FALLBACK_SCSV that will detect if an active MITM attacker is forcing users to use insecure protocols to communicate with servers. In short, when the client retries a failed handshake with an older protocol version, it adds an extra value TLS_FALLBACK_SCSV to its list of preferred ciphers. The server also needs to support this option.

Let's now assume that an attacker forced the client to retry with an older protocol version. The downgrade attempt will reach the server. The server will however notice that the client set the fallback option even though the server itself supports a newer version, and abort the connection. This is one of the best reads I found.
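
The server-side decision can be sketched in a few lines. This is a simplified model of the check as I understand it, not code from any TLS library; the enum and function names are made up.

```python
from enum import IntEnum

class ProtocolVersion(IntEnum):
    SSL_3_0 = 0x0300
    TLS_1_0 = 0x0301
    TLS_1_1 = 0x0302
    TLS_1_2 = 0x0303

def server_handshake(client_version, fallback_scsv, server_max):
    """Return the negotiated version, or abort if the fallback looks forced."""
    # The client says "I am retrying with something older than my best",
    # yet the server could have done better: somebody interfered.
    if fallback_scsv and client_version < server_max:
        raise ConnectionAbortedError("inappropriate_fallback")
    return min(client_version, server_max)
```

Note that a client which offers SSL 3.0 on its first attempt (so no fallback flag) still gets SSL 3.0 from a server that supports it, which is the limitation described in the next paragraph.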

Note though that this is just a patch. If the server supports SSL 3.0 and for some reason a client uses it to communicate with the server, without an attacker messing with the connection in between... TLS_FALLBACK_SCSV will NOT protect you. So disable SSL 3.0 on every single box you own. Go now :)

Poodle (TLS)
And guess what? The TLS Poodle is very similar to the padding oracle attack again, technically. In the previous attack, I mentioned that getting rid of SSLv3 would defend against Poodle. That turns out to be incorrect as you can see :)

The problem in this case, is that there are certain servers that for some reason use the vulnerable SSLv3 code. Meaning, I could use TLS 1.2 or whatever, but if that piece of code is borrowed from an old, vulnerable version of SSL, it's still going to break.

And there are further variations of this attack as well here. And scarily apparently it's not very well known and hence all vendors might not even have patched this issue.

Anyway, as with everything else there's nothing you can do but patch your code if your vendor has released a patch.


This one is seriously creepy. Here you could think you are safe because you don't support a protocol at all - but guess what - you are still vulnerable. It's crazy and I never heard of anything like this.

Drown targets SSLv2. Here's Ivan Ristic talking about it as well, it's a nice overview. A lot of servers don't support SSLv2. Browsers don't talk over SSLv2 by default. So the logical conclusion is that you're safe.

But let's say you've (for some weird weird reason) reused the same private key for another certificate - on another server - and that server supports SSLv2 as well as export-grade ciphers (really old 40 and 56 bit key ciphers). Both servers can be owned by targeting SSLv2 on the 'bad' server. Let it sink in - even if your current server supports *only* TLSv1.2, it's all useless if you have some box somewhere whose certificate uses the same private key.

As in, you can attack an SSLv2 connection between a client and the 'bad' server. This attack gives you the session key used to encrypt that particular session. Using the session key, you can recover the master key. And once you have the master key you can generate session keys and decrypt *any* sessions - even the one between the client and the good server. Read this OpenSSL advisory for a brief explanation (search for 2016-0703 on the page).

And of course here's a link (its long but its by Thomas Pornin whose answers are unbelievably good most times) on how SSL works and how to decrypt captured, encrypted traffic using the master key in Wireshark.

And yes had you configured your server to only accept cipher-suites which support Perfect Forward Secrecy, that traffic would still be safe from attacks such as Drown.

FREAK targets the fact that clients and servers for some reason still support export-grade cryptography. Which just means encrypting your communication with ciphers that are not very strong and can hence be decrypted. By default browsers don't use these ciphers at all, but remember - the negotiation is in plain text - an attacker can see what ciphers are being chosen.

Because of the bug, an MITM attacker can basically hijack the negotiation process - and tell a client (if vulnerable) to use export grade RSA (512 bit key which isn't strong despite it looking that way) to encrypt all traffic, until the symmetric key is decided. Remember, SSL uses public key (asymmetric) crypto to start off, decides a secret and then uses this secret to encrypt further traffic (symmetric).

Now since the public key stuff is all using export-grade RSA, an attacker can apparently 'break this' (factorization problem - not brute force) and retrieve the decryption key being used. Here's a brief overview article and a really nice article with a bit more detail on the same.

Once this key is retrieved, the attacker can decrypt traffic that passes on the symmetric key and decrypt everything for that session between the client and the server.

Now doing the 'breaking' (which really is solving RSA for manageable numbers) is not fast. So if you really had to do it per-SSL-session, it would still be a bug but less practical. But as per the wonderful Matthew Green, servers reuse a single export-grade RSA key when they start up - until they are shut down. So if an attacker manages to get one decryption key, they can potentially capture traffic and decrypt it successfully for a number of sessions.

The fix really is to make sure that no one ever uses export grade ciphers again to do anything. This one sort of reminds me of how the NSA want to build back-doors in the iPhone today for 'good' purposes. Very eerie similarities if you ask me.

Logjam is very similar to FREAK described above. It targets the exact same fact that export-grade ciphers are supported by clients and servers. Only it attacks Diffie-Hellman (DHE) - a public key based key exchange algorithm - instead of RSA. The EFF also has a good read.

DH is safe because it is hard to solve the discrete logarithm problem and get back the initial numbers (exponents) chosen at the start. If those numbers can be predicted/obtained somehow - DH is breakable. But the point is, it's very very hard (impossible as of now?) to do so.

The researchers who found LogJam though, did something cool. To understand, let's take an analogy. Think of a 25 character password. If you wanted to crack it using brute force it would be pretty hard and need lots of power and time. But if you built (offline) something called a rainbow table (which by the way is also very time consuming) which had every_single_possible_password for any 25 character long password - things change. The next time you get a 25 char password all you have to do, is look that table up.

Now back to Diffie Hellman, they basically precomputed the discrete logarithm (a hard problem) for 512 bit prime-numbers. Meaning, the next time they saw DH being used, with 512 bit primes, they'd know what exponents were being used, and hence find out the actual symmetric key being used inside. And as it turns out, a lot of servers use the exact same prime number. So if you solve it once, you've basically potentially broken all connections to those servers.

And the second fact, servers still support export grade DH ciphers. This fact and the rainbow_table_wizardry is what LogJam is about. An MITM attacker will downgrade a connection, force the connection to use 512 bit DH, and then use the rainbow table to find the secrets out.
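
Here is the "precompute once per prime, then break every session" idea at toy scale. Big assumptions: the prime is a tiny 17-bit one so a full lookup table fits in memory, whereas the real attack on 512 bit primes used a far more sophisticated precomputation than a plain table. The names are made up for illustration.

```python
import random

P = 65537   # toy prime shared by "lots of servers"; real export DH used 512-bit primes
G = 3       # generator

# One-off, expensive step for this prime: map every public value back to an exponent.
DLOG_TABLE = {pow(G, x, P): x for x in range(1, P)}

def dh_keypair(rng):
    """Pick a secret exponent and return (secret, public value)."""
    secret = rng.randrange(2, P - 1)
    return secret, pow(G, secret, P)

def attacker_shared_secret(public_a, public_b):
    """Cheap per-session step: look up one side's exponent, then finish DH like that side would."""
    a = DLOG_TABLE[public_a]
    return pow(public_b, a, P)

rng = random.Random()
a, A = dh_keypair(rng)          # server
b, B = dh_keypair(rng)          # client
print(attacker_shared_secret(A, B) == pow(B, a, P))
```

Building the table is the slow part and only depends on the prime, so every server that shares that prime falls to the same table.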

The mitigation is to think of all the possible services on your servers that use Diffie Hellman. Once identified, ensure they are all configured correctly - namely disable export grade DH cipher suites, and ensure that 2048 bit primes are used to compute the shared secret. On the client side, stay updated - use newer browsers and hope that browsers reject connections that use 'export-grade DH'.

Lucky Negative 20
And here's the newest one as far as I know (please tell me if I am wrong). This one is another padding oracle problem, where the server responds with verbose error messages on decrypting carefully chosen (by attacker) cipher text and finding something wrong in it. Based on that, the attacker decrypts bytes block-by-block. This is a problem when AES in CBC mode is used. The fix apparently is to (guess what) use AES-GCM with TLS 1.2.


By now, so much of crypto seemingly depends on AES-GCM, that I'm scared a new vulnerability will be discovered tomorrow in it, which will break the Internet again. And then what. It's pretty depressing if you ask me, that we've become so dependent on encryption as a solution to all problems.

And cryptography is hard, really hard - and very easy to get wrong even for people a million times smarter than me. And hence, I'm fairly nervous. And feel that it's just a matter of time before I update this post again.

SSL/TLS for the layman (Configuration) - Part 1

There is nothing new in this post. I just got fed up of tracking all the SSL/TLS bugs that seem to come out every month, nowadays. I don't even remember which bug does what, how hard the exploit is and what needs to be done to fix it. And every time I have to Google. Every. Single. Time. And I'm sick of it.

So, this page. Anything that is a problem with TLS configuration, that I'd report when I do a penetration test for a client, I'll try and briefly touch upon. If there's something new that comes up, I'll update this page. In a 2nd post I cover attacks. When an SSL/TLS attack comes around, I'm just going to update that page.

And no, I'm not claiming this is some pioneering, wonderful, one-stop page. It is just so I can go to a single place and remind myself on what some SSL/TLS misconfiguration is.

Weak SSL/TLS ciphers (Strength less than 128 bit)
The way SSL/TLS works is that the client and server negotiate what key size to use to bi-directionally encrypt communication. If both the client and the server have their first choice as a 40 bit key, it means that until negotiation happens again, they're going to encrypt traffic using a shared 40 bit key.

If someone gets hold of an encrypted traffic dump of this communication, they could go through all 2^40 keys, each time decrypting the traffic. When it makes sense, they'd stop and that's the key... for THAT session. Not for every session between that client and server.
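
To get a feel for why 40 bits is hopeless and 128 bits is not, here is the arithmetic. The rate of one billion keys per second is purely an assumption for illustration; real attackers could be faster or slower.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def worst_case_seconds(key_bits, keys_per_second):
    """Time to try every key of the given size at the given rate."""
    return 2 ** key_bits / keys_per_second

RATE = 10 ** 9   # assumed guesses per second, not a measured figure
for bits in (40, 56, 128):
    secs = worst_case_seconds(bits, RATE)
    print(bits, "bit key:", secs, "seconds, or", secs / SECONDS_PER_YEAR, "years")
```

At that assumed rate the 40 bit key space is done in under 20 minutes, 56 bits takes a couple of years, and 128 bits is on the order of 10^22 years.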

That is still bad, and no one should even support ciphers that use anything less than a 128 bit key, which today is considered safe. Although this Wiki snippet says that this could be bad too, under certain conditions. For now though, it's fine. Start thinking though, of supporting ciphers that use 256 bit keys.

The only reason RC4 is mentioned here separately is that it was a very popular choice and the preferred cipher for all client-server communication for a long time. If I remember right, even Google had it as their preferred cipher for a long time. I'm "guessing" because it was a really fast symmetric key cipher.

Since then though lots has changed and numerous vulnerabilities have been discovered in RC4. So in short, if this is still a preferred cipher, it's wrong. Please disable it on every host+service that supports SSL.

Also, thankfully browser vendors are also moving away from it, so that's good.

Really Really old protocols (SSLv1, SSLv2)
I don't think I've seen anyone use either of these protocols for a while now, even on test servers, but you never know. So yeah, just disable support for both these. It's ridiculous to have support for either of these.

SSLv1 was apparently never public, which explains why I always saw only SSLv2 enabled even on badly broken servers :)

SSLv2 had numerous vulnerabilities which have their own RFC, which is a fairly strong reason to completely avoid it. Jump straight to Section 2 of the RFC to see some of the vulnerabilities. These affect Confidentiality, Integrity and Availability for any client-server traffic. That's bad.. please never use it.

MD5/SHA1 signed things
A signature for any digital content (like certificates) means that you trust the person who signed the content. So no 2 messages should ever have the same signature. Because, if they did - it'd mean that the person reading the message wouldn't know which message is the real one. Since both appear to be signed by the same person. And could be valid...

Meaning any hash function, where it is possible (practically speaking) to find 2 messages that have the same hash at the end (hash-collisions) should be avoided. And there's a known attack against MD5 that does exactly this. And it's possible to do this with SHA1 as well.

Migrate to SHA256 or SHA512 for all your signing needs at the earliest.
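
To see what a collision looks like in practice, here is a sketch that finds two different messages with the same hash. It does NOT reproduce the real MD5 or SHA1 attacks (those rely on weaknesses in the algorithms). Instead it uses a deliberately weak toy hash, SHA-256 cut down to 3 bytes, where simply trying messages until two agree is enough. The function names are made up.

```python
import hashlib
from itertools import count

def toy_hash(message, nbytes=3):
    """A deliberately weak hash: SHA-256 truncated to a few bytes."""
    return hashlib.sha256(message).digest()[:nbytes]

def find_collision(nbytes=3):
    """Try messages until two different ones share the same toy hash."""
    seen = {}
    for i in count():
        msg = b"message-%d" % i
        tag = toy_hash(msg, nbytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg

first, second = find_collision()
print(first, second, toy_hash(first).hex())
```

With only 24 bits of output a collision typically turns up after a few thousand tries. Every extra bit of (sound) hash output makes this brute-force search more expensive, which is part of why the longer SHA-2 hashes are the recommendation.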

Insecure Renegotiation
Renegotiation in an SSL/TLS context effectively says, 'Can we please decide again...what protocols and ciphers we're going to use to communicate?'. If a client can do this and the server supports it, it means a man-in-the-middle can inject traffic (details here or originally here) and force a client to renegotiate.

Then by getting the timing right, they could somehow piggyback on a client's request, use the client's credentials and do things that the client can do.. without the client's knowledge. Can't ever see a thing but can still do it. Sort of like CSRF inside an SSL/TLS connection.

There's also another exploit discovered that lets attackers DOS the server. There's also a tool by THC to do so. It's a bug, sure, but fixing it doesn't fix the overall DOS problem as such. It defends against 1 technique. That's it. There are even arguments that this is by design.

The fix is to not let the client renegotiate at all. If a client does try, the server should just reject it. The process is different for different servers.

Lack of OCSP stapling
Clients need a way to decide whether a certificate is valid or not. If it isn't, the client shouldn't let a user visit a site. So, initially there were revocation-lists (CRLs). But those were updated only once every X days, so there was always a chance of a user being owned, in-between. Hence OCSP where the user verifies if the certificate is valid, before connecting to it.

The problem with this though is that the server answering those queries, especially if it's very busy, could get overwhelmed if clients keep hitting it to check certificate validity all the time. And it consumes client resources as well.

Hence OCSP stapling. A server will do the querying-for-cert-validity and return the stapled response, when the client queries it. So no runtime queries from the client directly to the guys who signed the certificate. There isn't a security issue here, unless you think of a DOS against the guys who signed the cert.

If you can though, especially if you have a low capacity internal CA server, it's good to enable OCSP stapling.

Lack of HSTS
HSTS, if configured right on the server, tells the browser to always send traffic over an HTTPS connection. It's relevant only if the server supports both HTTP (for some legacy reason) as well as HTTPS. So, if it's done right and I type http://site, the browser will remember the HSTS header from an earlier visit and force the traffic over HTTPS. Which is great.

Note though, that the absolute first request before the HSTS response reaches the browser is still over HTTP, and hence open to an attacker man-in-the-middling the connection and owning a user.

And that's not good either, which is why you should try and notify browser vendors in advance that you plan to implement HSTS. Or at least as soon as you implement it, to reduce the exposure to this attack.

If you only support HTTPS and port 80 isn't even open on your web-server, you can still configure HSTS, but it doesn't improve your security in any way.
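
For reference, the header itself is just one line, Strict-Transport-Security. Here is a sketch of adding it from a Python WSGI application. The middleware name and the one-year max-age are my own choices for illustration; most web servers and frameworks have their own switch for this, which is usually the better place to set it.

```python
def hsts_middleware(app, max_age=31536000, include_subdomains=True):
    """Wrap a WSGI app so that HTTPS responses carry an HSTS header."""
    value = "max-age=%d" % max_age
    if include_subdomains:
        value += "; includeSubDomains"

    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            # Browsers ignore HSTS received over plain HTTP, so only add it on HTTPS.
            if environ.get("wsgi.url_scheme") == "https":
                headers = list(headers) + [("Strict-Transport-Security", value)]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_hsts)

    return wrapped
```

Wrapping an existing app as hsts_middleware(app) makes every HTTPS response tell the browser to stick to HTTPS for the next year.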

No Forward Secrecy cipher support
Let's say an encrypted traffic dump is captured. Now that communication was encrypted using a specific session key, generated using the server's private key...that's how SSL works.

So if the server's private key somehow got stolen, the attacker could use it to decrypt all the traffic she captured earlier.

Let's now say that a client-server communication is encrypted using a cipher that supports forward secrecy. Even if the private key is stolen, the attacker can't use it to decrypt any of the encrypted traffic she had.

So ideally, ensure you support some strong ciphers that support forward secrecy and make one of them your preferred cipher of communication.
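
As a rough illustration of what "prefer forward secrecy ciphers" looks like in code, here is a sketch using Python's ssl module. The cipher string "ECDHE+AESGCM" is my own pick, exactly which suites it selects depends on the OpenSSL build underneath, and loading the certificate and key is left out.

```python
import ssl

def forward_secret_server_context():
    """Server-side TLS context restricted to ECDHE key exchange with AES-GCM."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # no SSLv3 / TLS 1.0 / TLS 1.1
    ctx.set_ciphers("ECDHE+AESGCM")                # ephemeral key exchange only
    return ctx

ctx = forward_secret_server_context()
for suite in ctx.get_ciphers():
    print(suite["name"], suite["protocol"])
```

The printed list should contain only ECDHE suites, plus any TLS 1.3 suites the library supports (TLS 1.3 key exchange is always ephemeral, so those are forward secret too).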

Thursday, May 26, 2016

Ack, ag... not grep

Grep is a great tool and anyone using any *nix system uses it all the time. In fact you could also use it with Cygwin on Windows. But recently at work, a colleague pointed out that ack and ag are great, much faster tools with better defaults than grep. So while I still go to grep once in a while, ack/ag is my search tool of choice now when doing code-reviews. You can find ack here and ag here. ag is very similar to ack, just even faster (Disclaimer: Only tried this once on my last gig)

While the man page is always a good option, here are some of my favorite switches to get you started.

--java or --ruby: Will limit the search to just files of that type. Meaning it'll ignore a lot of libraries, documentation and other large stuff that happens to be in the same directory

-G cpp$: Search only in files ending with cpp. This one's nice coz if you do --cpp it will search cpp and .h, but you don't want to search inside header files.

--ignore=*.h: Search everywhere except .h files

Recursion: Its on by default, which is almost always what you want in a code-review.

--ignore-dir: Its common to find 567 hits of a function in a source-code-tree and realize that 560 were in a directory 'abc' that you didn't want to search at all. You can use this switch to completely ignore that directory.

--ignore-file: Same as above but for a file. If you have a file in 100 different directories that's matching, but you want to ignore, use this.

-l: This one's similar to grep I think, if you don't want "what" was matched, but where it matched it. Just lists which files matched a pattern.

-i: Same as grep. Case insensitive

-v: Invert match. Same as grep.

The responses for ack were nicer too and more intuitive, I thought. Some of my favorite things:

- Lists the filename only once, with all of that file's matches under it. Grep prints the filename once per match
- Breaks between files by default. Grep dumps them all together.
- The colors of the response were nicer (IMO)
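The difference between the first two points is easy to see side by side. The sketch below formats the same made-up list of matches in both layouts; the function names and the sample data are mine, and the output only approximates what the real tools print.

```python
from itertools import groupby


def format_grep_style(matches):
    """One line per match, each prefixed with its filename."""
    return "\n".join(f"{path}:{number}:{text}" for path, number, text in matches)


def format_ack_style(matches):
    """Filename once as a heading, its matches beneath, blank line between files."""
    blocks = []
    # groupby only groups adjacent items, so sort by path (then line number) first.
    ordered = sorted(matches, key=lambda m: (m[0], m[1]))
    for path, group in groupby(ordered, key=lambda m: m[0]):
        lines = [path] + [f"{number}:{text}" for _, number, text in group]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical matches: (file, line number, matching line)
matches = [
    ("a.rb", 3, "def foo"),
    ("a.rb", 9, "foo()"),
    ("b.rb", 1, "foo"),
]
grep_output = format_grep_style(matches)
ack_output = format_ack_style(matches)
```

With three matches in two files, the grep-style output repeats a.rb twice, while the ack-style output shows a.rb and b.rb once each with a blank line between them.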

Also lastly, it appears to have the option to use a .ackrc file to set your options. So then you can just type ack and all the options you set in .ackrc will get picked up on their own.

I mean, you can probably do all this with grep as well, so it isn't meant to say.. 'hey no more grepping for me' .. just that it is a really nice tool that can help in some situations.

Tuesday, November 17, 2015

Respect for developers

At work, we tend to spend the time that we’re not on projects however we want. Technically that means however we want – ideally it means not being on Facebook 8 hours a day. But jokes apart, usually someone picks up a skill that he/she isn’t familiar with and tries to learn it when not on projects. Since most of my professional life in information security has been to break things, I’ve always been curious about how things are on the other side – to build things. What challenges does a developer face? Why is there still so much truly awful insecure code out there, despite there being tons of resources to learn from? After all, as a hacker that’s how I’ve learnt all my life – don’t know how to hack a new technology? Go read. Learn. Master. Hack. Why should it be any different for developers?
And so, I decided to fix a ton of bugs in Whetstone, our internal appraisal system which our boss wrote while he was on paternity leave. He takes his vacations seriously, as you can see. Whetstone was written in Ruby on Rails – which is one of my favorite frameworks for web development – just because of how easy it is to get started. So I go, ooh nice – how hard can it be? And that’s where it all began going wrong…
So obviously, I can’t gut the whole thing and start re-writing the entire system. I have to fix bugs that currently exist. I launch GitHub and see 78 existing bugs. So now I have to prioritize which ones to fix first. I start the process. I pick a simple one to start off with and quickly fix it. I’m excited. I want my change to be visible immediately. Oh but wait, that’s not how it goes. Apparently, Git has something called branches that I first need to learn about – so as to not mess with other people pushing code back as well. That puts an end to my coding for the present, and I spend 2 hours reading about how Git handles versions and branches and why it’s better than SVN and so on. Then I push my code and see it appear on GitHub. But then someone has to “pull” my changes or accept them. And clearly, while I have permission – it makes no sense to approve my own code – that defeats the purpose. And boss is on leave. No push. No warm fuzzy feeling of first ever fix. Process problems.
Anyway, I’m now feeling better and pick up a few moderately complex issues. It seems I know how to fix it. Code. Compile. Fail. WTF. Google. Stack Overflow. Fail. As it turns out, I’ve upgraded Rails to 4.3 or something and all the fixes on the Internet which have up-votes on Stack Overflow all fail. Somehow after a lot of trial and error I find a fix, but it has taken way way longer than I’d ever expect it to take. And note, this is a simple internal application. If I fixed bugs at that speed in production, as a developer – I’d probably be fired in a week. Skill problems.
Then, I get put on a billable project and forget all about this for a bit. Boss comes back, my 4 changes are accepted. Am I having fun, he asks? Yes yes of course I say, drunk with the success of my 4 massive quashed bugs. Okay, have fun – fix some more says boss. So 2 months later, I pick it up again. So let’s start now. Er, why did I write this code this way? F***, it was so long ago – I should have commented. Let’s look at some old code, maybe that explains it. Er no, no comments there either. Just some fancy one-liner SQL looking query. Ha, I’ll just comment the old code then and write new fresh code! That’ll fix it. For sure…… 2 hours later. Undo. Undo. Undo. Okay that broke everything :| and clearly I can’t code. I suck.
And now, after all the undoing, something else isn’t working too. Screw it. Let me revert to a clean state. Let me just delete that stupid whetstone_old directory that I created. And I’ll rebuild everything from Git again – from my old state. Delete. Rebuild. Ugh. Read Me isn’t good. How the hell did I build it last time? What does this error even mean? Note here, that I have NOT fixed 1 single bug yet… 1 hour later – AH so all you needed to do was change the config file entry? Okay. Anyway, let me first collect all my old notes from whetstone_old before I forget stuff like this. Don’t want to waste time.
Eh. Where’s whetstone_old?? Oh no!!! Don’t don’t tell me it was in the directory I deleted #-o. Yes. It WAS in the directory I deleted. Woo Hoo. All gone. I’m screwed. Backups are important. There’s a reason you backed up to whetstone_old. Why would you delete it?
Another day wasted then. Okay okay, now let’s fix bugs. Ah, here’s an easy bug – “Display last login time for a user”. 10 minutes. I got this. Just print a date out in the view. Adds date to view. But this is just printing the current date each time ffs :-o. We want it for each user. Oh. That means a new column. In the user database. I know a little MySQL though, from all my SQL injections over years gone by, so this should still be quick….
Error. Can’t connect to MySQL database. Eh? Different port? No. 15 minutes. Can’t figure out where the DB is. RTFM. SQLite. NOT MySQL. Now if I want to debug anything, I need to learn SQLite querying. Learn how to use a database. NO ffs I have still not fixed the bug. Okay, now I understand SQLite. But how do I add that column? Learn Rails migrations. Wow. Another half an hour. Add column. Finally fix bug. It’s lunch time. I should stick to hacking things – I clearly am not good at this stuff. But wait, now I got it.. I know how the DB works, so things should now work out...
Okay, let’s log in and see if our date prints right. Just to check lol. Obviously nothing can go wrong. Hmm, looks okay. Oh wait, we need to do some stuff with the Date. Why do none of the in-built functions work? Oh no, the column is a varchar, not a DateTime – and only with a DateTime will some functions work. Or I’ll have to write my own functions. Which would be stupid for such a trivial task. So I have to change the data type in the DB. Learn how to change Rails migrations. And while that looked simple, it didn’t work. The accepted solution was to “DROP TABLE” and “Recreate TABLE”. Ugh.
Now remember, I hadn’t created the table in the first *^%$#@! place. So I now have to find out the structure of every single column of that table and re-create it. Screw it. I don’t want a fancy date. I’ll just leave it as it is and go on… But now that failed migration attempt has screwed something else up – and all my normal code that I never never touched is not working. Fantastic. Things you never coded can break despite you never touching them.
Meaning, I now have to learn the entire old schema. Luckily logging was turned on by default, and the old old old log file was never truncated. Extract the old old query. Convert it into a valid SQLite query. Drop old table. Create new table. Okay error gone. But now no date display. WTF!!! That’s because recreating the table destroyed my Rails migration. So do that again. Finally done. In short, playing with the DB isn’t particularly fun, and any re-architecting brings up weird weird problems.
And after all this, all I’ve done is fix one lousy feature request that displays the last login time. Okay let’s move on to the next bug. It’s a security issue now, user IDs are generated sequentially and that’s a problem – a hacker could enumerate all valid numbers. So I need to randomize stuff. Meaning I have to change the data type of a column. Right now, the id column is an integer, I need to convert it to a GUID. Looks very similar to the previous DateTime problem haha. I got this one for sure. Migrations. Change. Blah. Er. Why did something else break again? Why is nothing working now? Why can’t I even login? Panic. Turns out that id was the Primary Key for the DB. And messing around with the primary key is a bad bad idea. Even worse than messing around with a database. Don’t do it. Just just don’t do it. Deny that feature request. Learn how to change a primary key.
This is terrible. Is this what fixing other people’s code is like? And this is for a small internal application which is relatively well written, compared to some of the bloated code that I’ve seen over the years. Anyway, more hours wasted – I figure out what to do. But now, for some reason I can’t find out where the hell to put the ‘randomizing code logic’. A few more hours… oh wtf.. you know by now that I’m spending more time fixing stupid knowledge gaps than actual bugs.
As it turns out, my boss (correctly) decided to not write a line of code for authentication, and relied on a third party Ruby Gem instead. Well whoopee doo doo. #-o Why am I sarcastic, you say. It means that I now have to hack on some third party module code instead. Start learning third party module code now. That’s another fail, and I have to contact the module owners who patiently explain what to do. Eventually it all works.
Let’s look at Bug 3 now. All you need to do is to make a really simple UI change. This CAN’T be hard ffs. How hard can it be? But by now, I’m in a numb state – coz I’m fairly sure something will go wrong. Oh look. It’s CSS. And CoffeeScript. Do I know either? NO. Learn CSS. Learn Bootstrap. Learn CoffeeScript. The saddest part is I just need enough to fix a tiny bug, but without knowing the basics – I can’t do it. More time gone.
And this pattern repeats.. again and again and again. Eventually of course, as expected (else I won’t have a job) I improve and start fixing things quicker. But something or the other always always breaks – when you least expect it to. And when it’s not your code, it’s harder to fix it. And now I think of those massive banking applications, where the chief architect and the lead developer quit a while ago, and there are 5 new developers. Shudder.
And I still have tons of bugs left. I find my mind thinking of shortcuts and dirty ways of solving problems all the time. I find myself thinking of how to somehow get that number of bugs down. Somehow. Quickly. So I can get back to actually writing new cool stuff. It isn’t easy to maintain that mental discipline. Honestly isn’t. And this is for an internal app, which isn’t critical and with no deadlines except those that I set for myself. I have a new found additional respect for developers.
I always knew a developer’s job is trickier than that of a hacker – and I’d like to honestly say I’ve always been respectful to every dev I have met. But I’ve never sympathized with one. I’ve always felt that it is yet another job that one learns to do well over time. And that’s true, sure – but they do work under greater pressures than I do. There’s other skills in my job as a white-hat hacker, other traits that maybe developers don’t need to learn – but whatever a developer does need to do – isn’t easy. IF IF you want to do it well.
Lastly, I want to tell every white-hat hacker to put on the defensive hat once. Actually code. Code things that people will look at and break and tell you how awful your hacks are. And how you should test more… before releasing to production. See how it feels. :) If not anything else, it’ll make you respect the developer community more than you already do. And that’s how it should be.

Tuesday, February 10, 2015

Using the Call stack to debug programs

With a large piece of malware it is difficult, when under time pressure, to fully reverse it. So, many times you just put a breakpoint on the imported functions, say send() for outbound TCP connections. Then you run the malware.

What happens now, is that the BP will be hit when the malware tries to send traffic. And that's fine. But where was it called from? As in, which function actually made the call to send()? This is important because it'll help you go back and then find out how the payload was constructed, how the C&C was decided... and many other things.

There are 2 ways to do this. One, you can identify all the references made to the send() call by the binary, and set breakpoints on all of them. Then when one gets hit you can inspect it further. This is the most obvious way to do it.

The other way is to first, as usual set a breakpoint on the function you want to trace. Here I just choose SetUnhandledExceptionFilter() as that's the first function I can see :). The address immediately after that is 0100251D - make a note of that.

Then, instead of searching through the 123 references to the call and trying to guess which one is useful, look at where the function is returning to when the breakpoint is hit. This can be found on the Olly call stack. Many times you have to step through a lot of system function code, but eventually, it will return to user code. I tend to use Ctrl+F9 (execute till return) while in system code, and keep watching the call stack for when the return address changes to one inside the program's own code.

Then I visit that address, set a breakpoint and hit F9 again. This time I will break inside the exact place where the call was made from. Note the address? :)

Then we go to that address and set a breakpoint and run...and we'll break in the exact place where the call was made.
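If the idea of "ask the call stack who called me" feels abstract, here is the same trick in miniature in Python. This is only an analogy, not something you would run against a binary in Olly: send() below is a stand-in for the function you set the breakpoint on, build_beacon() is a hypothetical caller I made up, and inspect.stack() plays the role of the debugger's call stack window.

```python
import inspect


def send(payload):
    """Stand-in for the function the breakpoint is set on; reports its caller."""
    # stack()[0] is this frame; stack()[1] is the frame that called us,
    # which is the same information a debugger's call stack view shows.
    caller = inspect.stack()[1]
    return caller.function, caller.lineno


def build_beacon():
    """Hypothetical caller that constructs the payload before sending it."""
    payload = b"hello"
    return send(payload)


def main():
    return build_beacon()


caller_name, caller_line = main()
```

Even though main() started everything, caller_name comes back as build_beacon, the function that actually made the call. That is the place you want to break in next, because it is where the payload was put together.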