Wednesday, August 22, 2018

Serverless Development

Just another post to solidify concepts in my mind. The Serverless word is often used these days in conjunction with development. All it really means is that you do not have to spend time configuring any servers. No Apache, Tomcat, MySQL. No configuration of any sort. You can just spend your time writing code (Lambda functions). Mostly anyway :)

The most common use of this philosophy is in conjunction with AWS. As in, you create a configuration file called serverless.yml that follows CloudFormation syntax. This basically means you create a config file offline with references to all the AWS resources that you think you will need (you can always add later) and then upload that file to CloudFormation.

CF then looks through the entire file and creates all those resources, policies, users, records, functions, plugins and in short whatever you mentioned there. You can now launch a client and hit the deploy URL and can invoke all the methods you wrote in your Lambda function.

There are some clear instructions on how to deploy a Hello World as well as how one can write an entire Flask application with DynamoDb state locally and then push it all online to AWS with a simple sls deploy command.

All you need to make sure is that you have serverless installed, your AWS credentials configured and access to the console (easy to verify things) and things will go very smoothly. Of course there are going to be costs to all this - so make sure you do all that research before getting seduced by this awesome technology :)

Sunday, August 19, 2018

Birthday Paradox

There's a million places the birthday paradox has been explained. I always forget it. So this time, I decided to write it down for my own reference, keeping just the salient points in mind.

To start, a year has 365 days (forget leap years for now). The chances your birthday is on say Jan 28th is 1/365. Hence the probability of it not being on Jan 28th is (1 - 1/365 = 364/365). Let's add your friend now. The chances of both of you not having a birthday on Jan 28th is (364/365)^2 (exponential). So for 253 people the chances of all of them not having a birthday on Jan 28th is (364/365)^253. Makes sense? If not, maybe read a bit of probability from some source you like and come back. There's zero shame in this btw, I needed to do it for what it's worth :).

Anyway, so now you think why did I pick 253 above? Well let's do a little math here. If there's 2 people in a room, how many pairs can we form where order doesn't matter? Just 1 pair right? What about 3 people (a,b,c)? How many pairs? 3 pairs (ab, bc, ac). With 4 people (a, b, c, d) it is (ab, ac, ad, bc, bd, cd). Right? So let's generalize this now so we can calculate it for a larger number, instead of 2, 3 or 4. That's where combinations come in - scroll down the link (just above) to get the formula - (23!) / ( 2!) * (23 - 2)! [It's 2! because a pair has 2 people and you're forming a group of 2]. Doing the math on that it becomes:

23 * 22 * 21! / 2! * 21! = 23 * 22 /2 = 23 * 11 = 253. See that number before? :)

Tying stuff back in, it means that if I have 23 people (including me) in a room, there are 253 ways in which pairs can be formed. And remember, the chances of any of them NOT sharing a birthday are (364/365)^253. It's not (364/365)^23. It's the probability ^ no_of_possible_pairs. Again, if this is going over your head - step back and read a bit of probability theory and come back once you're comfortable.

So if the number of ways there CANNOT be pairs is (364/365)^253 = 0.4995 by the way, the number of ways there CAN CAN be a match somewhere - meaning someone in the room shares a birthday is 1 - 0.4995 = 50.05. Meaning, there is just about a 50% chance that someone in a room will share a birthday if there's at least 23 people in a room. Not share a birthday with you - just share a birthday with anyone in that room. Make sense?

Now all that's fine but how does that matter in real life, keeping security in mind? I'm thinking of a couple of examples:

- If I use a 64 bit key to create a MAC, I'm thinking that there's 2^64 possibilities which is correct. But that doesn't mean someone needs to try all of them before a match is found. Because of the birthday paradox, it means the real number is sqrt(2^64) which is a number in the order of 2^32 which is way lesser.

- Digital signatures are another area. If I use an algorithm that is susceptible to collisions to create a signature, it means that an attacker can find a collision for my signature more easily and spoof it. Meaning they could change the message, fake a signature that looks the same as the original one, attach it to the message and no one will detect it.

The fix to all this is to ensure that you use hashing algorithms who give you a larger number of possibilities even after the square root is taken. Meaning for a SHA1 hash, which has a 160 bit output - after 2^80 possibilities one will start to see collisions. It looks like SHA 256 is safe for now :)

Thursday, August 16, 2018

Unit Tests - Why?


A unit test is basically testing one unit of code. One unit literally means one snippet of code. One snippet could mean 5 lines, a small function or in some cases even an entire small program. Usually though, in large corporate environments your code base is pretty massive so a good starting place is to think of a minimum of 1 unit test per function.

So you go to the function's code and see that it has 100 lines of code. The first 10 are just variable initialization, then there's a few calls to 3rd party libraries to get data, normalize the data and then log it. For e.g. Requests to REST APIs, convert the returned data into Json, add a new key with a timestamp to it and then log success or failure.

Finally once these calls succeed OR fail, your code returns True or False. So when you write a unit test, you're thinking (non intuitively) but you're thinking - "Let's assume that all these calls succeed and come to MY code, where I make the True/False decision. I want to make sure that my code reacts properly in either case.".

Which basically means, if there was a way to make all those calls (requests, json, log, add timestamp key) NOT run at all, in the first place and just provide a FAKE normalized, Json blob with a timestamp in it TO my True/False code - I'd do it. Because think of it, you're NOT testing the functionality of any of those calls - you're just interested in your code. So let's see a fake example here:

def code:
   a = 1
   b = 2
   c = 'https://fakesite.com'

   response = requests.get(a,b,c)
   json = json.convert(response)
   json['time'] = time.time()
   logger.info("Request completed")

   if json.size > 2 and json has.key('time'):
      return yay()
   else:
      return oops()

def oops():
    return 0

def yay():
    return 1

All you want to test is if oops() and yay() get called correctly. Nothing else. The end. So your yay() test (never mind how right now) looks like:

def test:
   response = patch('requests.get', 'fakerequests.get')
               #fake returns {}
   json = patch('json.convert', 'fakejson.convert')
               #fake returns {'a':1,'b':2}
   time = patch('time.time', 'faketime.time')
               # fake returns 1500
   patch('logger.info', 'fakelogger.info')
               # Doesn't matter
   json['time'] = time
               # Add key


   check code() == 1
               #That's what yay() will return and pass your test.

Remember, you could run your test without patching a single thing. As in, let all the 3rd party calls happen and test with real data, but that'll slow things down badly, specially if you have 1000s of tests to run.

And again, if you're still struggling (coz I did for a long time) it's a "unit" test - you isolate the bit you care about, assume everything around it works well, give it the input it needs to work well and then write your tests.

I hope that demystifies it a bit :).