Sunday, August 19, 2018

Birthday Paradox

There's a million places the birthday paradox has been explained. I always forget it. So this time, I decided to write it down for my own reference, keeping just the salient points in mind.

To start, a year has 365 days (forget leap years for now). The chances your birthday is on say Jan 28th is 1/365. Hence the probability of it not being on Jan 28th is (1 - 1/365 = 364/365). Let's add your friend now. The chances of both of you not having a birthday on Jan 28th is (364/365)^2 (exponential). So for 253 people the chances of all of them not having a birthday on Jan 28th is (364/365)^253. Makes sense? If not, maybe read a bit of probability from some source you like and come back. There's zero shame in this btw, I needed to do it for what it's worth :).

Anyway, so now you think why did I pick 253 above? Well let's do a little math here. If there's 2 people in a room, how many pairs can we form where order doesn't matter? Just 1 pair right? What about 3 people (a,b,c)? How many pairs? 3 pairs (ab, bc, ac). With 4 people (a, b, c, d) it is (ab, ac, ad, bc, bd, cd). Right? So let's generalize this now so we can calculate it for a larger number, instead of 2, 3 or 4. That's where combinations come in - scroll down the link (just above) to get the formula - (23!) / ( 2!) * (23 - 2)! [It's 2! because a pair has 2 people and you're forming a group of 2]. Doing the math on that it becomes:

23 * 22 * 21! / 2! * 21! = 23 * 22 /2 = 23 * 11 = 253. See that number before? :)

Tying stuff back in, it means that if I have 23 people (including me) in a room, there are 253 ways in which pairs can be formed. And remember, the chances of any of them NOT sharing a birthday are (364/365)^253. It's not (364/365)^23. It's the probability ^ no_of_possible_pairs. Again, if this is going over your head - step back and read a bit of probability theory and come back once you're comfortable.

So if the number of ways there CANNOT be pairs is (364/365)^253 = 0.4995 by the way, the number of ways there CAN CAN be a match somewhere - meaning someone in the room shares a birthday is 1 - 0.4995 = 50.05. Meaning, there is just about a 50% chance that someone in a room will share a birthday if there's at least 23 people in a room. Not share a birthday with you - just share a birthday with anyone in that room. Make sense?

Now all that's fine but how does that matter in real life, keeping security in mind? I'm thinking of a couple of examples:

- If I use a 64 bit key to create a MAC, I'm thinking that there's 2^64 possibilities which is correct. But that doesn't mean someone needs to try all of them before a match is found. Because of the birthday paradox, it means the real number is sqrt(2^64) which is a number in the order of 2^32 which is way lesser.

- Digital signatures are another area. If I use an algorithm that is susceptible to collisions to create a signature, it means that an attacker can find a collision for my signature more easily and spoof it. Meaning they could change the message, fake a signature that looks the same as the original one, attach it to the message and no one will detect it.

The fix to all this is to ensure that you use hashing algorithms who give you a larger number of possibilities even after the square root is taken. Meaning for a SHA1 hash, which has a 160 bit output - after 2^80 possibilities one will start to see collisions. It looks like SHA 256 is safe for now :)

No comments: