Tuesday, November 13, 2018

Content Security Policy - Quick Reference

This is a post to help me remember the various parts of CSP. The w3 specification for CSP is very readable - this is NOT a replacement for them - just something to help me remember the directives :)

Here's a nice link where you can generate your policy bit by bit.

Remember, by default content is allowed to run on the web - not blocked. If browsers made the defaults as 'block all', I'm willing to bet a lot of issues would go away.

Don't use:

- unsafe-inline: Allows inline JS (includes javascript:) to be run, this is where a ton of XSS happens
- unsafe-eval: Runs eval() on any JavaScript user input that is passed to it
- data: The 'data' tags allow content to be encoded as text/html or base64 and are another way of delivering inline content


- child-src: Controls where <frame> and <iframe> can be loaded from
- connect-src: Controls where you can make direct connections to web-servers to (fetch(), WebSockets, XHR, EventSource)
- default-src: If the site uses JS and you haven't whitelisted any sites, it'll look at what's here and try loading a script from here. This is the default for every other fetch directive. Starting with 'default-src: None' is a good idea to start white-listing content
- font-src: Where can I load fonts from?
- frame-src: Where can I load Iframes from?
- img-src: Where can I load images from?
- manifest-src: Where can I load app manifests (metadata about a specific application) from?
- media-src: Where can I load audio, video and subtitles from?
- prefetch-src: Where can resources be prefetched from? This just means that some resources on the page will be 'processed' (DNS resolution for example) before they are actually requested
- object-src: Where do plugins (embed, object, applet) get loaded from
- script-src:
    * A list of white-listed sources for Javascript.
    * 'self' indicates that the browser should load scripts only from the site itself and nowhere else.
    * This controls inline scripts as well as XSLT stylesheets that can trigger script execution.
    * Adding 'nonce = really_random_nonce' or 'sha256-hash' can allow very specific inline scripts if there's no way to whitelist inline scripts
    * strict-dynamic accompanied by a nonce for a script, means that any scripts recursively called by that script are automatically trusted, without needing a nonce or hash themselves
- style-src: A list of whitelisted sources for CSS
- script-src-elem, script-src-attr, style-src-elem, style-src-attr all similar to script-src and style-src, except that they allow blacklisting specific tags instead. Not yet in browsers though, but here's a Google Group post.
- worker-src: Where can I load background Web Workers from?

Document directives:

- base-uri: Controls where relative URLs can be loaded from
- plugin-types: Restricts the types of plugins that can be loaded into the document
- sandbox: Controls what the IFrame that's embedded in your page can do. You can allow scripts, popups or forms for example

Navigation Directives:

- form-action: Submit forms only to specific whitelisted URLs. Useful when an attacker can actually inject their own form tags
- frame-ancestors: Defends against clickjacking attacks by limiting the websites that can actually frame the target site using frame, iframe, object, embed or applet tags
- navigation-to: Limit the websites that a page can navigate to

Reporting directives:

- report-to: If CSP is started in report-only mode, where do you send the report violations

Other important directives:

- upgrade-insecure-requests: Upgrade all requests made over HTTP to use HTTPS

- block-all-mixed-content: Ensure that all resources are requested over HTTPS, as long as the page is loaded over HTTPS
- require-sri-for: Subresource integrity for all scripts requested from a third-party-domain to detect tampering on the way

Other directives:

- referrer: Sends referrer only under certain conditions
- reflected-xss: Controls features in user-agent to prevent xss

Thursday, October 4, 2018

SSH certificate authentication


* You can configure client-side and server-side authentication using SSH certificates with the existing openssh daemon.
* You never need to worry about MITM attacks on the client when connecting to the server the first time
* Significant decrease in management overhead of SSH keys on the server

If you have a remote server to manage and it's running Linux (or even Windows for that matter but that's beside the point) - it's very likely that there is an SSH daemon running on it. You use an SSH client to connect to it and perform administrative tasks on it. While doing so, you can use passwords (by default) or public key authentication which is a bit more secure as it takes out the password-brute-force attacks. It does mean though that there is some management overhead on both the client and the server side.

On the client, you have to add the host that you are connecting to your known_hosts file. So over time, you have a massive list of known_hosts with no clue about the purpose of each host. Similarly on every server, there is a huge authorized_keys file which has the client's public keys added to it. When you want to revoke a client key you have to go in and remove that client's key from this file on every server. When you want to not trust a server any more, you need to remove that entry from your known_hosts manually. This is something that can go wrong easily if you miss one server - so there's probably some automation that is probably required here that can make it more reliable.

Certificate-based-authentication goes one step further, where a client trusts any SSH server signed using an 'SSH-root-CA' and a server can in turn trust a client key only if it is signed by a 'user-CA'. There is a really nice post by Facebook where they automate this process and make it even less error-prone. Those posts do a good job of walking you through step-by-step but I did have trouble replicating it, so I'll do a quick summary of the exact steps here.

Server certificate authentication

1. Configure an SSH daemon on a server (Docker, EC2, VirtualBox doesn't matter - but ideally a separate host as it's the CA). Let's call it ca.
2. Generate an SSH keypair for the server CA.
3. Start a new server up. Let's call it host1. This too should run SSH. This is the server you want to login to and administer.
4. Generate an SSH keypair for host1 in /etc/ssh
5. Copy host1's public key onto the ca server. Sign host1's public key with ca's private key. This will create an SSH certificate.
6. Copy ca.pub and the certificate you just created from ca to /etc/ssh on host1.
7. Configure /etc/ssh/sshd_config to use the key you created in Step 4 as well as the certificate. This is done using the HostKey and HostCertificate directives.
8. Restart the SSH daemon or reboot your server to reload your SSH config so it uses the certificate
9. Configure the client machine (any machine apart from host1 and ca) to recognize the ca's public key using the @cert-authority directive. This is so you don't get a 'Should I connect? Yes/No' message the first time you connect to host1.

User certificate authentication

1. Generate an SSH keypair for the client. This is the userca.
2. Generate a second SSH keypair for the client. This is the key you use to connect to host1. Call it client.
3. Sign client with userca. This will generate a cert as well on the client.
4. Copy userca.pub to host1 and configure sshd_config using the TrustedUserCAKeys directive pointing to userca. This is so host1 recognizes that all user certs signed by this cert are to be accepted.

At this point, you should be able to login to host1 from client and never get a popup the first time I connect because I've explicitly trusted the server CA. It's also very cool that there is no need to do any more key management on any server, as long as you trust the user CA used to sign the user keys.


Dockerizing an SSH service
Hardening SSH

Tuesday, September 18, 2018

AWS - Developer Tools

This post summarizes the AWS services that are used to help you write code and reliably build, test and deploy it faster that things would be manually. The overall concept of doing all this automatically is usually summarized as Continuous Integration Continuous Deployment. Here is a simple post that nicely explains these concepts.

If you don't want to read any more the tl;dr is this:

* Write code using AWS Cloud 9 
* Debug code using AWS XRay
* Store code using AWS Code Commit
* Build and test code using AWS Code Build
* Deploy code using AWS Code Deploy
* Watch task progression at runtime from a single interface using AWS Code 
* Use an integrated dashboard for all your tools including issue tracking using
   AWS Code Star.

If you're not familiar with Git, I'd strongly recommend reading a little about it before proceeding and playing with all these shiny new AWS tools. A great source is this chapter from the ProGit book. Once that's done, come back here. It's fine to read through this post as well, even without Git knowledge - it's just easier with that background knowledge.

Cloud 9 IDE

Once you have an idea in mind and want to write software to actualize it, you need a place to write it. A simple text editor works just fine, but as your programs get more complex an IDE is really helpful. A couple you might be familiar with are Eclipse and IntelliJ. However, since this post is about AWS, I must mention the Cloud9 IDE. It is a browser based IDE that gives you the familiar environment. I haven't played with it too much, but it's good to know there is a web-based option now.


This looks like a code-profiler to me. I did not use it so do not have much to say about it. But I'd think the way to use it, will be to write your code and use this to figure out which calls are really slow and see if you can optimize your code further. All the rest I did try out and can confirm they are all very cool tools. So read on.

Code Commit

Once you finish writing all your code, you need a place to store the code. This is where all the VCS come in. Git is what everyone use these days. The AWS equivalent of Git is CodeCommit. It's so similar that you do not need to learn any new commands. Once you've set your repository up, all the old Git commands work perfectly well. You can add files, commit them and push them to your Code Commit repository.

All you need to do is install Git on your machine, create a key pair and configure your IAM user to use this to authenticate to Code Commit. Clicking the "Connect" button inside the interface gives you instructions per platform if you get stuck.

The coolest thing here is that you can create triggers that'll run as soon as you push code to your repository. Maybe you want to build, test and deploy your code to your test environment as soon as every single commit is pushed. You can do that here by setting up a Lambda function that will be called as soon a commit is made. Which nicely flows into Code Build..

Code Build

Once you have a workflow going where you can write code in an IDE and push commits to a CodeCommit repository, the next step is to make sure that your code builds properly. This is where CodeBuild comes in. All you do is point CodeBuild to the Code Commit repository where you stored your code and tell it where you want to dump any output artifacts of the program (usually S3).

It supports branches too, so you can tell it which branch to pull code from in Code Commit. You select your runtime environment, which you need to build code in (Java/Python/whatever), configure a bunch of other options and then build your project. The result is whatever you get after you hit Code - Build in whatever IDE you use.

The big advantages here are that you do have to spend very little time configuring your software development environment. Also, like I touched upon a bit in the Code Commit section, you could have that Lambda function you wrote as a CodeCommit trigger automatically run Code Build against your code each time a commit is made.

Code Deploy

Once the code is compiled, tests are run and your entire project is built, the last step is usually to deploy it to a web server so your users can then access it. That's where Code Deploy comes in. You can configure it so it uses the build output (with a deployable project) and puts it onto every web server you want to have it on.

You have options of using a load balancer as well, if you want traffic to be evenly distributed. Once deployment is complete, the output should appear on all the servers in that group.

Again, remember you can further extend your Lambda function to build and deploy now as soon as a commit hits Code Commit. Pretty cool :)

Code Pipeline

Code Pipeline isn't something new but it certainly makes life much easier. It helps though if you understand the the 3 previous services I talked briefly about earlier - since the screens in Code Pipeline deal with these 3 services and ask you for input. So I'd recommend understanding those Code Commit, Code Build and Code Deploy really well before using Code Pipeline.

Pipeline basically is a wizard for the other 3 services. So it'll prompt you to tell it where your code is (Code Commit) , what to build it with (Code Build) and what to deploy it with (Coce Deploy). If you already have roles and resources set up successfully when you played with the other 3 services - this should feel very intuitive when you do it. A couple of great tutorials are here and here. Also, a nice writeup on how someone automated the whole process is here.

The coolest thing about Pipeline is that you can see everything, stage by stage and where each stage is once you create it. For example: Once your code is pushed to Code Commit (as usual) and you have the Pipeline dashboard open, you can actually see each stage succeeding or failing, after which you can troubleshoot accordingly.


Managers should love this one. I used it just a bit but it has this fantastic looking dashboard that gives you a unified view of every single service that you are using. So in short, it has links to C9, CC, CB, CD and CP. So if you didn't cheat and did everything step by step :) you should see all your commits, builds and pipelines by clicking on the buttons on the fancy dashboard that is CodeStar.

The additional feature here is integration with Jira and Github where you can see all your issues as well.

So in short CodeStar is a one stop shop if you've bought into the AWS development environment and want to be tied into it for years to come, while parting with your money bit by bit :)

Friday, September 14, 2018

Sample Architecture - AWS Example

A quick post this time on how you can use the AWS CLI or SDK to create an entire network, without using the GUI wizards (which are great, but sometimes irritatingly slow :)).

Relevant code to do everything in this post and a bit more is all uploaded here.

First up, almost certainly you want a VPC, because some services are public and some are private. A VPC will help you separate these. So you use the CreateVPC call to create one.

Make sure you enable private DNS so your external clients can reach your private hosts.

Then you create public and private subnets, so you can put your public and private hosts into each of those.

Your public subnet needs an Internet gateway to talk to the Internet, so you create one and attach a gateway to it.

Once you have your VPC, subnets and Internet gateway ready you need to setup routes between them. The wizard would do this automatically but we have to do it manually. So you first create a route table for both subnets and add routes to each route table. Note here, that you don't need your private subnet hosts to talk to the Internet. If you do, for some reason you will need to create a NAT gateway in the public subnet and modify your routing table in the private subnet to send traffic to it.

Now everything is sort of setup. So you then think of access control everywhere. For starters you create a security group allowing only inbound SSH and HTTPS access for an EC2 instance in the public subnet and only MySQL access for an RDS instance in the private subnet.

Create a key pair (I reused an old one is this was just a test) so you can use it for your new EC2. Identify an AMI to run on your EC2 instance. I used the console for this but you can apparently use the CLI or the SDK to find this out if you want to.

Once that's done you launch an EC2 instance in the public subnet, with the SSH-HTTPS security group and your key pair. Make sure you assign it a public IP otherwise you won't be able to reach it. Login to the instance with your keypair and confirm access works.

Now you start thinking of things you want to keep in your private subnet. The 3 things I was working with were RDS so my EC2 could talk to it, Secrets Manager to store my RDS credentials and a Lambda function that is needed to rotate the credentials in SecretsManager. All of these should be in the private subnet.

A cool thing here is that you can create a private endpoint for SecretsManager so that all traffic to it is always over an AWS network and doesn't go to the Internet at all.

RDS only needs inbound access from EC2 and Lambda on port 3306. I'm not sure what SecretsManager needs but I gave it inbound 443 only (You should test this more). Lambda doesn't need any inbound access. Setup security groups similar to how you did it before.

Create a secret in Secrets Manager. Use a random name if you're testing, you can't reuse old names for a while, even if you have deleted the secret. This secret should contain all the information you need to connect to the RDS database and used when you actually create the database.

Create a DB subnet group, retrieve the secrets you stored earlier from secrets manager and the security group that you created earlier (*3306 inbound access*) and then create the actual RDS itself.

Once the database is created, the only task remaining is to create the Lambda function that will rotate the credentials for you in Secrets Manager.

Saturday, September 8, 2018

Confused Deputy

The confused deputy problem is one of the best named issues. Not for any deep philosophical reason, but just because it is truly confusing :). To me anyway, but then, most things are confusing to me, until I spend way-above-normal amounts of time re-reading and re-writing it in my own words. The link above (AWS) is an excellent resource, which I learnt most of it from, so go there first - and if you find that confusing, come over here and I'll try and explain it in my own words. As always, there's nothing wonderfully new here - just my attempt to make sure I remember, have fun writing and hopefully help anyone else along the way.

Let us just keep it simple here. The 3 people in question are Alice, Bob and Eve. Alice has software called MyBackup hosted on the cloud that lets you back up your images that are stored in the service called MyImages. Each time you use Alice's software you have to pay her 100$. Sure that's ridiculous, but stick with me. For some reason Bob thinks this is a great idea and pings Alice to use this service.

Alice creates an account and gives him a unique string called BobAliceBackup1987. She says that all Bob needs to do is to login when he wants to backup, paste the string into a text box on the website and click "Create Backup". This will automatically (details are not important here) let Alice into Bob's account and copy them all to her secret storage box that is very hard to hack and send Bob an Email when it is all done. Don't think about how lame this system is at this point :).

Eve now hears that Bob is using this service and likes it a lot. She subscribes to the service too and gets her key EveAliceBackup1991. Everything is good and everyone is happy.

One day Bob and Eve have a fight and stop talking to each other. Eve feels that Bob is wrong and wants to teach him a lesson. Frustrated, she logs into MyBackup to look at her backups. (WTF who even does this??). While typing in her "secret string" she suddenly wonders if she can make Bob spend his Britney Spears concert money on Backups instead. Can she predict Bob's key? Will Alice find out? Only one way to find out...

She guesses Bob's key (what a shock :/) and sends that key to Alice. Alice hasn't spent much time developing any kind of authorization models, so all she sees is a string come in and think - well there's another 100$ for me :). She just assumes (pay attention here) that whoever sends the string is the owner of the string and actually wants to back their images up. And she backs Bob's images up, 20 times in a row without thinking that something's wrong. Bob gets back at night (no there are no Instant Mobile alerts here for payment debits) and finds out he has backed his stupid car_bumper_dented images up 20 times. Alice is no help, she has proof he sent a string...and sure enough when Bob logs in and checks backup history he sees 20 requests too. Meanwhile Eve feels vindicated. Eventually she might get caught, eventually Bob might get his money back and eventually Alice will learn to write better software but that's beside the point. And yes, it's a made up example but one that hopefully helps you understand the point of the attack better.

In a nutshell, confused deputy occurs when a service with multiple users makes a decision based on user input that is predictable without asking for further authorization. In AWS world, the predictable input is a Role ARN that a service can assume in your account to do something in it. While it looks really big, it is not considered secret and if someone guesses it, they can make a service do things in your account - without your permission. Does that make sense? I hope so. But if not...

... go and read that excellent AWS blog again and see if it makes more sense.

Wednesday, August 22, 2018

Serverless Development

Just another post to solidify concepts in my mind. The Serverless word is often used these days in conjunction with development. All it really means is that you do not have to spend time configuring any servers. No Apache, Tomcat, MySQL. No configuration of any sort. You can just spend your time writing code (Lambda functions). Mostly anyway :)

The most common use of this philosophy is in conjunction with AWS. As in, you create a configuration file called serverless.yml that follows CloudFormation syntax. This basically means you create a config file offline with references to all the AWS resources that you think you will need (you can always add later) and then upload that file to CloudFormation.

CF then looks through the entire file and creates all those resources, policies, users, records, functions, plugins and in short whatever you mentioned there. You can now launch a client and hit the deploy URL and can invoke all the methods you wrote in your Lambda function.

There are some clear instructions on how to deploy a Hello World as well as how one can write an entire Flask application with DynamoDb state locally and then push it all online to AWS with a simple sls deploy command.

All you need to make sure is that you have serverless installed, your AWS credentials configured and access to the console (easy to verify things) and things will go very smoothly. Of course there are going to be costs to all this - so make sure you do all that research before getting seduced by this awesome technology :)

Sunday, August 19, 2018

Birthday Paradox

There's a million places the birthday paradox has been explained. I always forget it. So this time, I decided to write it down for my own reference, keeping just the salient points in mind.

To start, a year has 365 days (forget leap years for now). The chances your birthday is on say Jan 28th is 1/365. Hence the probability of it not being on Jan 28th is (1 - 1/365 = 364/365). Let's add your friend now. The chances of both of you not having a birthday on Jan 28th is (364/365)^2 (exponential). So for 253 people the chances of all of them not having a birthday on Jan 28th is (364/365)^253. Makes sense? If not, maybe read a bit of probability from some source you like and come back. There's zero shame in this btw, I needed to do it for what it's worth :).

Anyway, so now you think why did I pick 253 above? Well let's do a little math here. If there's 2 people in a room, how many pairs can we form where order doesn't matter? Just 1 pair right? What about 3 people (a,b,c)? How many pairs? 3 pairs (ab, bc, ac). With 4 people (a, b, c, d) it is (ab, ac, ad, bc, bd, cd). Right? So let's generalize this now so we can calculate it for a larger number, instead of 2, 3 or 4. That's where combinations come in - scroll down the link (just above) to get the formula - (23!) / ( 2!) * (23 - 2)! [It's 2! because a pair has 2 people and you're forming a group of 2]. Doing the math on that it becomes:

23 * 22 * 21! / 2! * 21! = 23 * 22 /2 = 23 * 11 = 253. See that number before? :)

Tying stuff back in, it means that if I have 23 people (including me) in a room, there are 253 ways in which pairs can be formed. And remember, the chances of any of them NOT sharing a birthday are (364/365)^253. It's not (364/365)^23. It's the probability ^ no_of_possible_pairs. Again, if this is going over your head - step back and read a bit of probability theory and come back once you're comfortable.

So if the number of ways there CANNOT be pairs is (364/365)^253 = 0.4995 by the way, the number of ways there CAN CAN be a match somewhere - meaning someone in the room shares a birthday is 1 - 0.4995 = 50.05. Meaning, there is just about a 50% chance that someone in a room will share a birthday if there's at least 23 people in a room. Not share a birthday with you - just share a birthday with anyone in that room. Make sense?

Now all that's fine but how does that matter in real life, keeping security in mind? I'm thinking of a couple of examples:

- If I use a 64 bit key to create a MAC, I'm thinking that there's 2^64 possibilities which is correct. But that doesn't mean someone needs to try all of them before a match is found. Because of the birthday paradox, it means the real number is sqrt(2^64) which is a number in the order of 2^32 which is way lesser.

- Digital signatures are another area. If I use an algorithm that is susceptible to collisions to create a signature, it means that an attacker can find a collision for my signature more easily and spoof it. Meaning they could change the message, fake a signature that looks the same as the original one, attach it to the message and no one will detect it.

The fix to all this is to ensure that you use hashing algorithms who give you a larger number of possibilities even after the square root is taken. Meaning for a SHA1 hash, which has a 160 bit output - after 2^80 possibilities one will start to see collisions. It looks like SHA 256 is safe for now :)

Thursday, August 16, 2018

Unit Tests - Why?

A unit test is basically testing one unit of code. One unit literally means one snippet of code. One snippet could mean 5 lines, a small function or in some cases even an entire small program. Usually though, in large corporate environments your code base is pretty massive so a good starting place is to think of a minimum of 1 unit test per function.

So you go to the function's code and see that it has 100 lines of code. The first 10 are just variable initialization, then there's a few calls to 3rd party libraries to get data, normalize the data and then log it. For e.g. Requests to REST APIs, convert the returned data into Json, add a new key with a timestamp to it and then log success or failure.

Finally once these calls succeed OR fail, your code returns True or False. So when you write a unit test, you're thinking (non intuitively) but you're thinking - "Let's assume that all these calls succeed and come to MY code, where I make the True/False decision. I want to make sure that my code reacts properly in either case.".

Which basically means, if there was a way to make all those calls (requests, json, log, add timestamp key) NOT run at all, in the first place and just provide a FAKE normalized, Json blob with a timestamp in it TO my True/False code - I'd do it. Because think of it, you're NOT testing the functionality of any of those calls - you're just interested in your code. So let's see a fake example here:

def code:
   a = 1
   b = 2
   c = 'https://fakesite.com'

   response = requests.get(a,b,c)
   json = json.convert(response)
   json['time'] = time.time()
   logger.info("Request completed")

   if json.size > 2 and json has.key('time'):
      return yay()
      return oops()

def oops():
    return 0

def yay():
    return 1

All you want to test is if oops() and yay() get called correctly. Nothing else. The end. So your yay() test (never mind how right now) looks like:

def test:
   response = patch('requests.get', 'fakerequests.get')
               #fake returns {}
   json = patch('json.convert', 'fakejson.convert')
               #fake returns {'a':1,'b':2}
   time = patch('time.time', 'faketime.time')
               # fake returns 1500
   patch('logger.info', 'fakelogger.info')
               # Doesn't matter
   json['time'] = time
               # Add key

   check code() == 1
               #That's what yay() will return and pass your test.

Remember, you could run your test without patching a single thing. As in, let all the 3rd party calls happen and test with real data, but that'll slow things down badly, specially if you have 1000s of tests to run.

And again, if you're still struggling (coz I did for a long time) it's a "unit" test - you isolate the bit you care about, assume everything around it works well, give it the input it needs to work well and then write your tests.

I hope that demystifies it a bit :).