Wednesday, August 18, 2021

Core-value anti-patterns

When I make decisions at work, I'm mentally programmed to look, in most cases, for the core value that the decision maps to - from a positive angle. That's great and I think I should continue doing it. But there's a flip side that I started thinking about…

Each time I identify the positive values I displayed when doing something, let me pause for 5 seconds and think - did I also impact some other value negatively? Maybe I didn't. That's awesome. But maybe I hesitated at times and instantly thought of a value I negatively impacted. And maybe every time it's the same value that I found myself positively exhibiting or, worse still, the same value I was negatively impacting. If so, I might be guilty of being too strong an advocate of some values or too weak an advocate of others.

But that was still too vague, so I wanted to make it a bit more concrete and give myself and everyone else reading this some common anti-core-value patterns to watch out for, for every single value. This list is not comprehensive by any means [please form your own questions], but it should help you understand how any value can become incredibly toxic if taken too far.

Tuesday, June 1, 2021

Exploring JWT refresh tokens

So JWTs are a very common kind of stateless authentication token, generated server side and sent to a client. Every request that the client makes has to carry a valid JWT, very similar to a session cookie in a 3-tier web application. The big difference is that with a session cookie, the session is stored on the server and the cookie is compared against it on every single request.

JWT statelessness

With JWTs there is no server side state - and hence nothing to compare against. This apparently has huge performance benefits - which is probably true just because it's repeated so many times everywhere :). The access token is sent as part of every request. The refresh token is not; it's only sent to an authorization server when you want a new access token. The access token is verified to see if it's been tampered with (via the signature) but not verified against any server side state.
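To make that concrete, here's a minimal sketch of minting and verifying a short-lived access token using the PyJWT library (the secret, claims and 15-minute validity are illustrative assumptions, not a reference implementation):

import datetime
import jwt  # PyJWT

SECRET = "server-side-signing-secret"  # assumption: HMAC signing; could be an RSA key pair instead

def mint_access_token(user_id):
    # Short-lived token: only the signature and the exp claim protect it - no server side state.
    now = datetime.datetime.utcnow()
    claims = {"sub": user_id, "iat": now, "exp": now + datetime.timedelta(minutes=15)}
    return jwt.encode(claims, SECRET, algorithm="HS256")

def verify_access_token(token):
    # Raises jwt.InvalidSignatureError if tampered with, jwt.ExpiredSignatureError if past exp.
    return jwt.decode(token, SECRET, algorithms=["HS256"])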

Storage

If the client is a mobile app, both tokens are stored on the client device - I explicitly state this as it's important when we consider the threat model of a token getting leaked. So if the device is stolen, the tokens are stolen. If there is a vulnerability in some 3rd party dependency you have no clue about, the tokens could be stolen. Or maybe your own code has a vulnerability and you accidentally leak these tokens or log them in an unsafe place.

Token leakage

Now JWTs, once issued, cannot be expired whenever we choose. There is no way to do it. You can, though, set an expiry timestamp in the token, and it'll die after that. So the shorter the validity, the better. Coz if you set an access token to have a 24 hour validity, and it gets stolen - well guess what, the attacker can use it for 24 hours and there isn't a thing you can do about it. Same deal with refresh tokens, only worse, as the attacker can keep minting new access tokens with the stolen refresh token.

Except that you can revoke refresh tokens if you detect a leak. It does come at a cost though.

Statefulness - 1

But this then means that you have to maintain state for the refresh tokens on your server. The next time you see the leaked refresh token, you basically remove that entry server side so it's no longer recognized as valid, mint a new refresh token and give that to the client. Remember here that any access tokens already minted will not be expired. So revoking refresh tokens helps you in the future - it does not save you from the past.
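As a rough sketch of what that server-side state could look like (an in-memory set standing in for a real datastore; everything here is illustrative):

import uuid

valid_refresh_tokens = set()  # stand-in for DynamoDB/Redis/SQL

def issue_refresh_token():
    token_id = uuid.uuid4().hex
    valid_refresh_tokens.add(token_id)
    return token_id

def revoke_refresh_token(token_id):
    # Called when a leak is detected; future exchanges with this token will fail.
    valid_refresh_tokens.discard(token_id)

def exchange_refresh_token(token_id):
    if token_id not in valid_refresh_tokens:
        raise PermissionError("refresh token revoked or unknown")
    # Rotate: retire the old refresh token and hand the client a new one,
    # along with a fresh short-lived access token (minted as in the earlier sketch).
    valid_refresh_tokens.discard(token_id)
    return issue_refresh_token()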

Statefulness - 2

And so, you might think that sounds bad, and decide to keep a table to track every access token ever minted for a specific refresh token. That way, when you see that an access token got leaked, you can go and take it off the list and deny future requests. You can check every single request to be safe. Which would totally work, except that you've lost a lot (if not all) of the benefits of having a stateless JWT, which is no server side state.

Middle ground

And so once you realize the above, a common middle ground is to keep a very short lived access token (5-15 minutes) and a relatively long lived refresh token (60 minutes - 24 hours) and gamble on everything else. By that I mean that you hope neither gets stolen somehow. If the access token does get stolen, well that sucks but at least the attacker can only do 15 minutes' worth of damage. Refresh tokens you hope are never stolen.

Middle ground - Security

Which all sounds great, but it doesn't defend you if the tokens do get stolen.

  • You almost certainly have to invest in tooling that will detect leakage of these tokens.
  • You have to write secure code to store refresh tokens securely server side.
  • You have to write secure code to exchange refresh tokens for new access tokens and refresh tokens.
  • You have to store refresh tokens securely on your mobile client and make it as hard as possible for someone who steals the device to get to your token.

All of which is doable of course, but it takes time, money and commitment from a business to build the above systems in a secure manner.

My secure solution which is the best ever (in my mind anyway ;))

Assuming we continue with the stateless JWTs, I'm seeing very little benefit for the refresh tokens and a lot of downside. So I'd strongly recommend we ditch them completely.

  • We just take an access token (15 min expiry), check if it's valid and give the client a new access token back (a rough sketch of this renewal flow follows this list).
  • Keep the old access token valid for another 5 minutes maybe, to defend against dropped requests. After 5 minutes only respect requests with the new token.
  • Save the state of the client server side so they can get back to wherever they were at any point, in case of flaky network connections.
  • Invest in the tooling to detect leaked tokens. DO THIS even if you ignore the entire blog and have a permanent token till the end of time on the client's device.
  • Of course the access token can get stolen too, but the damage is controlled, compared to the solution with the refresh token.
  • Try your best to authenticate the client, but make them reauthenticate after all attempts have failed.
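Here is a rough sketch of one way to read that renewal flow, reusing the mint/verify helpers from the earlier sketch. It approximates the 5-minute rule by accepting tokens that expired less than 5 minutes ago; the names and numbers are assumptions, not a reference design:

import datetime
import jwt

GRACE = datetime.timedelta(minutes=5)

def renew_access_token(old_token):
    try:
        claims = verify_access_token(old_token)  # still valid: just rotate it
    except jwt.ExpiredSignatureError:
        # Allow a short grace window to cover dropped requests, then force re-authentication.
        claims = jwt.decode(old_token, SECRET, algorithms=["HS256"], options={"verify_exp": False})
        expired_at = datetime.datetime.utcfromtimestamp(claims["exp"])
        if datetime.datetime.utcnow() - expired_at > GRACE:
            raise PermissionError("token too old - client must re-authenticate")
    return mint_access_token(claims["sub"])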

What are the holes in the world's best solution? :). Is there something in refresh tokens that I have missed, apart from the fact that they're there to make it easy for the client to avoid signing in again? Are they worth the complexity?

Monday, September 14, 2020

Red to blue team switch

I switched from being a penetration tester after nearly 7 years at Security Innovation to a security engineer at 98point6, an exciting healthcare startup. Given that all of my work in the security industry (nearly 15 years now) has been on the offensive side, I felt that this move would force me to think differently. As in, it would force me to evaluate whether all the recommendations I've given to clients over the years actually made sense - when one thought of them from a defensive perspective. Also, I wanted to learn what it was like to work as a product security engineer on the inside of a product company - rather than from the outside, as I'd always done. So I made the leap about a year ago :). And predictably, I've learnt a few things that I want to jot down here for anyone reading this and thinking about making a similar switch:

  1. Time = money. The more time you spend doing work that will earn you money, the more likely it is that you will eventually be a profit-making company. So at a startup, it's been very interesting for me to see that even a simple low risk vulnerability fix is almost instantly mapped to a question along the lines of - "Can we accept this risk?".

    If, after evaluating all existing mitigations, it's found that the vulnerability is hard to exploit, it's not getting fixed. Well, not immediately anyway. Engineers would much rather spend time creating new features, scaling the environment or improving performance. Because that's going to earn $$ much quicker than fixing a largely theoretical Low-Medium risk vulnerability.

    So, in short, prepare to do a lot of talking to convince people why something is a security risk that needs to be fixed at the earliest. Or soon...

  2. Pulled in multiple directions. If you're joining a company with a large, already existing security team - this is unlikely to happen and your role is likely to be well defined. If not though, and it's a smaller team, multiple teams will all want you to chime in and provide security related guidance fairly quickly. All of this has to be done within a reasonable time interval (often 24-48 hours), so there is pressure to give quality answers in a short span of time. Which can be tough.

    So, if you are not good at saying no to overly optimistic timelines and at working in a sustainably intense manner - you will be burnt out very soon.

  3. Technology overload. In a sense, joining an internal team is good - you can focus on a single product and learn it well. The problem, though, is that you're not doing only software code review, where you read the code line by line and have 2 months to master every flow that has taken years to build. Nope. What will instead happen is that you will juggle a lot of very different asks at the same time.

    So the mental context switching - between vendor analysis, NodeJS code review, writing modular code to scan AWS infrastructure, providing advice on how to deal with phishing attacks, explaining a security concept to a junior team member, handling a huge, ever growing JIRA backlog, creating an entire wiki from scratch that keeps going out of date, and dealing with an incident at 5pm on Friday evening just as you wanted to log off - can be very, very exhausting. It's fun for a while, as you feel you're making giant strides across multiple muscles across the company - but it quickly gets very tiring, and you're not as efficient as you want to be.

  4. Feeling of having achieved nothing. This is just because there is so much to do, and as the product improves there is more stuff to secure. As the team size increases, there is more educating to be done and more code to be reviewed. So however hard you push, however organized and modular you are - there is always something that can be done better. And if you're one of those grumpy security people who see edge cases all the time, you'll always end up feeling that there are a million potential vulnerabilities that should be fixed - but cannot be (justifiably at times) fixed.

    So if I look back at what I've done in a year, I'm sure that I've contributed to improving the overall security posture and made it somewhat easier for future security engineers - but there is always something to do, and I can constantly criticize my own work and feel I could do better.

  5. Learning something new all the time. Each organization does its development work differently and this has been no different. It has been quite challenging to learn exactly how dev pipelines are set up and which parts need to be explored more deeply - especially if you don't come from a development background. It's all incredibly interesting, and I'm happy to learn all these new skills - but you have to learn a lot, ask just the right amount of questions and keep things secure. I'm used to this from my red team work, but that doesn't make it any less challenging, simply because there is so much to do - in red team work you learn about one thing for 2-3 weeks and then set it aside until you need it again.

There are probably many more things that are different in blue team work, but these are some of the main points that I have found challenging. There's more time for sure, and I'm happy I made the switch as I've learnt a new way of doing things from multiple new teams - but it's not been a smooth ride.

As long as you embrace the uncertainty of a startup, where you are a major contributor towards setting up security engineering from scratch - a blue team can be fun. It is very different to working on a red team - which is much faster and certainly more glamorous. However, a blue team does give you the feeling that you are creating a much more solid foundation for a team and its product - something a red team does not always give you.

Friday, February 21, 2020

IAM Least Privilege Permissions

There are multiple parts to writing IAM least privilege policies. This document attempts to touch on the best ways to do each of these tasks. The assumption is that you've never written an IAM policy before - which is probably false, but starting there should be useful for a newbie developer.

Know what you need

Only you as a dev will know what data your code needs access to. For example: At a minimum you’ll need to know that your new feature plans to read patient records, identify the non-encrypted ones, encrypt them, delete the old records and store the new encrypted records, while maintaining an audit trail. No tool can tell you this, right?

Also, you’ll need to know which AWS service can do all those things. This too is not something any tool can automatically tell you. You need to go Google and read all AWS documentation to understand what tools AWS offers to do the stuff above. So then you get to a point where you say:
  • My records are in DynamoDB. I need read access to DynamoDB.
  • I need to look at the metadata of each record to see if it's encrypted. So I need read access to all metadata.
  • Then I need to encrypt them. I need a key to do this. I need to be able to use an existing KMS key OR be able to create a new one to encrypt my data.
  • I need to delete the old records. I need write access to DynamoDB.
  • I need to store the new records. I don’t need any more access, as I already got write access in the previous step.
  • Oh wait. Not all tables - just Table A :)
  • I need to log all my actions somewhere. I need write access to Cloudtrail.
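To make that list concrete, the policy you end up needing looks roughly like this (the ARNs, account ID and exact action names are placeholders - check the AWS action reference for your real needs, and note that the audit-trail piece is usually covered by CloudTrail recording these API calls automatically):

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadWriteTableAOnly",
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeTable",   # read table metadata (encryption status)
                "dynamodb:GetItem",
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
            ],
            "Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/TableA",
        },
        {
            "Sid": "EncryptWithExistingKey",
            "Effect": "Allow",
            "Action": ["kms:Encrypt", "kms:GenerateDataKey"],
            "Resource": "arn:aws:kms:us-west-2:123456789012:key/<key-id>",
        },
    ],
}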

How do I implement the above?

  • Now you know you're probably going to have a Step Function or a Lambda function where that code will live. Which means there's an execution role that comes into play. It's a good idea to use a new role for each function coz:
    • It’s cleaner and you know exactly what that function and only that function needs
    • It’s easier to change and not worry what else will break in some other function that shares that role
    • Security will whine less as you’ll probably follow least privilege when you assign permissions
  • Okay good. You create a new execution role and now need to assign permissions. You need to map all those Read/Write statements that made sense intuitively into an IAM policy, which is what actually controls what the role can access.
  • You add a lot of permissions, see that it works, dial down a bit, see it still works, dial down some more and see it break and repeat this increasingly painful and irritating exercise till everything works. Of course then security complains that it's not least privilege, gives you a link to what’s expected and rejects the request. The link’s somewhat helpful but not a lot, and you end up repeating all that crap again till they’re happy.
  • You get better at this over time but at one point feel you’re wasting a lot of time tweaking policies, when you should be writing code instead. And then wonder if there is a better way to write IAM policies.

Some ideas to get better

  • Definitely use a PolicyGenerator. Don’t write a policy by hand.
    • AWS Console: You need to know exactly what you want. If you do, it works great
    • AWS PolicyGen: An older version of the Console generator, with a simpler interface. Not sure it's maintained
    • PolicySentry: This looks like a fantastic tool that does a lot of the policy generation for you. It is worth spending the time learning all its detailed options.
      • If you know which resource you want to control access to, use a CRUD template
      • If you know which actions you need, use an Actions template
  • To clean up policies generated by PolicySentry, remember what you really need and refer to the AWS docs. You’ll need to do this as you almost certainly do not need everything PolicySentry gives you. Pay heed to the Dependent Actions column as well and don’t forget to grant access to those actions. Here is a sample table for CloudTrail.
  • Create as many templates as possible for services that you often use and make them easy to reuse or reference across your engineering team.
  • Use a linter such as Parliament to check your policy once it’s generated for obvious typos as well as some security misconfigurations. Integrate it into your pipeline.
  • Use the AWS policy simulator to verify that your execution role has access to the expected resources and nothing else. Think of this as a confirmation of your existing policies. The good thing is that it doesn't make any changes to your actual stack.
Examples for tools above:
Policy Sentry templates

CRUD:
policy_sentry create-template --name TestRole --output-file crud.yml --template-type crud


Edit the YML template that gets generated, tweaking it to remove whatever you don't want.

Policy Sentry query commands


policy_sentry query action-table --service cloudwatch --wildcard-only
policy_sentry query arn-table --service dynamodb
policy_sentry query condition-table --service lambda

Parliament sample bad policies

Blank resource field where a "*" or an ARN is needed:
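Something along these lines - a reconstruction, since the original snippet is missing; the parliament Python API shown is from its README and may differ across versions:

from parliament import analyze_policy_string

bad_policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": ""}
  ]
}"""

for finding in analyze_policy_string(bad_policy).findings:
    print(finding)  # expect a finding about the empty Resource element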

Mismatched condition where the condition is not valid for the chosen action:
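Again a reconstruction of the missing snippet: here the condition key belongs to DynamoDB while the action is an S3 one, so the condition can never apply:

mismatched_policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": {"StringEquals": {"dynamodb:LeadingKeys": ["user1"]}}
    }
  ]
}"""

for finding in analyze_policy_string(mismatched_policy).findings:
    print(finding)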



Monday, November 25, 2019

Anatomy of a check scam


I was walking home on a Saturday afternoon through a shopping center which has a Bank of America branch (where I have an account). All of a sudden, a guy walked up to me and told me an interesting story. He said he wasn't a bum or anything, he had a lot of money. He then basically said his sister had been in a car crash and the insurance company had sent her a cheque which needed to be cashed. He couldn't do it as he didn't have a BOA account, so could I cash the cheque and give him the money instead, and I'd get the $40 that was left over. Now I didn't really care about the $40 but went "What's the risk here? Maybe it's legit?" So I say okay, sure.

Then he asks me for my name (not email or SSN or anything), which sort of got me thinking about this, but I still go ahead. Then he texts someone and says "Yeah she'll be right over". Then she doesn't come over, but her "boyfriend" does. The "boyfriend" has a cheque with him and thanks me a lot, and in the "Pay To" section there's my name with an amount of $4,963 or whatever written in. I'm still so fixated by the idea that someone needs help that I'm not thinking "Which insurance company issues a blank cheque? And why?"

Anyway, we go into the bank and there's a long line. Something intuitively is not feeling right at this point, so I look at the cheque and start studying its contents. It says "deposit refund" in the memo. Hmm? I ask why and he says "yeah, it's refunding my deposit". So I think, okay? Maybe it's a down payment that is being refunded? I don't know. I continue standing in the line.

The line’s really big though and my spider sense is still tingling. I don’t know why. Maybe its because I’ve worked in security a lot, maybe it’s my brain intuitively guessing something’s off but I take out my phone and simply take a picture of the cheque without asking the dude. A few minutes later the guy is like “I think I’ll just ask him to do a direct deposit”. I’m like “You sure?” He says yes and we walk away. No harm done. As I walk away I see the first guy (from the back) taking to someone. Hmm I wonder why..

Anyway I go home and something’s still gnawing at me. I tell my wife all this and she’s interested too. She immediately thinks though it’s a fraud. Don’t know how. The address on the cheque is legit so it lowers the probability of a forgery although that can’t be ruled out either. The person whose address it is doesn’t pick up so I leave a voicemail and call BOA. BOA’s fraud department is immediately helpful and after listening to the story goes “By the time you finished, I was thinking - Please tell me you didn’t give them your money”. And I’m like “No, but how does this work?”.

In short, the cheque is forged (somehow) and the bank cashes it. The scammers take the cash and run. Later the cheque turns out to be a forgery and the bank claims the money back from the person they paid it out to - and there's nothing that person can do about it. I'm on the hook coz I'm the one they'd have paid out to. The person whose name was on the cheque returns my call Sunday afternoon and I tell them all about this. Turns out their office was broken into, and their card and cheques were stolen. So the cheque was not forged but stolen; they'd have been liable, and I don't know if I'd have then been liable as well and put into jail or whatever.

All’s well that ends well and I’ll call the cops and make a report too but I got really really lucky this time. The thing that bothers me the most is that I’m someone who works in security all the time and I really should not be so trusting of someone just because that communication is not digital. If the same thing had been digital, I’d have caught all the red flags inside 5 minutes or less. But just because it’s a person.. face to face… I let my guard down. I failed this time.

To conclude - the only advice I can give anyone is to remain calm and keep thinking when weird, uncertain shit like this happens. If you do, you have a good chance of being safe. Oh and a funny note, I also got to hear my wife say “What would you ever do without me around? You’re so gullible :)”. Happy holidays everyone.

Saturday, June 1, 2019

AWS - Security, Identity and Compliance

This blog defines a number of services that are relevant to AWS security. It is recommended that you know all these services as well as possible.


IAM: This is the heart of all the authentication and authorization that AWS services perform. If there's one service you should learn inside and out, this is it. Admins can create IAM users and roles, associate access keys (for programmatic access) and assign permissions to each user and role. Developers can use the access keys to programmatically invoke all AWS services, subject to the permissions assigned to the user/role. Additionally, almost all (if not all) services create service-linked roles and assume IAM roles to perform operations in another service. Here is one such example. It is possible to use IAM to control a user's access to an entire service, to specific APIs in a service or, in many cases, to specific resources as well.

Resource Access Manager: This is a service that allows one account to share resources with another account. The person who uses the shared resources can perform actions similar to the owner of the resources. This helps reduce operational costs and also the overall attack surface (since there are fewer things to manage). However, only a few resource types can be shared as of now. Here is a walkthrough of this service by the ever helpful Jeff Barr.

Cognito: Cognito handles authentication for web and mobile applications. It is Amazon's user directory: users authenticate against a user pool and obtain a user-pool token. Users can authenticate directly against user records stored in Cognito, or use an SSO provider such as Google or Facebook. The user-pool token is then exchanged via an identity pool for temporary AWS credentials using the STS service (which does not have a web console :)), transparently to the user. These credentials are then used to access AWS resources.

Secrets Manager: As the name suggests, this stores credentials in a secure manner using KMS. Instead of hard-coding credentials in source code or configuration files, they can be stored in a vault such as Secrets Manager. Applications can retrieve these credentials at run-time to implement their functionality. Passwords, API keys or anything else that is considered a secret can be stored here. Automatic rotation of these credentials is also possible for RDS (MySQL, PostgreSQL and Aurora) database passwords.
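For example, an application could retrieve a database credential at run-time with boto3 (the secret name, region and JSON structure are placeholders):

import json
import boto3

client = boto3.client("secretsmanager", region_name="us-west-2")

# Fetch the secret at run-time instead of hard-coding it in source or config files.
response = client.get_secret_value(SecretId="prod/db/credentials")
credentials = json.loads(response["SecretString"])
print(credentials["username"])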

GuardDuty: This is a security monitoring tool that continuously studies different logs (CloudTrail, VPC flow logs etc.) and generates security findings. The rules in GuardDuty come partly from AWS and partly from AWS's security partners, and users can themselves customize GuardDuty to help detect threats.

Inspector: This involves installing an agent on an EC2 instance that then scans for open ports, verifies if an instance is vulnerable to known CVEs or verifies the system against CIS benchmarks. In short it is Amazon's vulnerability scanner (for a few items) aimed at helping EC2 instance owners secure their instances better. If you're managing your instances yourself, this seems like a useful service to have, if you're willing to pay the extra money :). Note that charges are per instance so if you only have a few servers, this could be pretty cheap.

Macie: This is a fancy (fairly pricey) tool that AWS has to detect data leakage of specific information from S3 buckets (up to 3 TB in size). It classifies data based on numerous very specific rules (e.g. 1 and 2). It's also integrated with KMS, which means there is a way to scan bucket content that is encrypted.

Single-Sign-On: This allows AWS to function as an SSO solution while being tightly integrated with a number of AWS services. It integrates with AWS Directory Service, so you can store all your user information there and authenticate against it. Additionally, if you authenticate successfully once, it will allow you access to all of the services across all the AWS accounts that are integrated with SSO. There's also a way to migrate your entire Active Directory to AWS so your users can continue using the same passwords. It's very similar to IAM in a way - except that IAM is just for a single account. Here is a good article about how AWS SSO works.

Directory Service:  This is AWS's version of Active Directory. You can use SimpleAD which provides some features allowing easier management of EC2 instances. A more powerful version is the AWS Managed AD solution which allows you to access AWS apps, manage instances, use Azure Cloud apps, authenticate to an on-premise Active Directory over a VPN connection or share an AD domain hosted in another AWS account. You could also use an AD connector to allow EC2 instances to join an on-premise Active Directory. Users can then access the applications running on EC2 while authenticating against the on-premise Active Directory.

Certificate Manager: This is AWS's certificate authority solution; it helps you issue certificates so that communication to your applications can be secured over TLS. You can create certs inside ACM or import certificates from outside. ACM is integrated with a few other common services (not all). The certificate's private key is stored securely and encrypted using KMS.

Key Management Service: This is the AWS key vault that securely stores the master keys used to protect the data keys that encrypt your data. You can choose to let AWS create an AWS-managed master key or create a customer managed key yourself. This key never leaves KMS. The master key encrypts the data key, which is the key that you actually use to encrypt/decrypt data outside KMS. You can also choose to create the data key outside and import it into KMS, where the master key encrypts it. This is envelope encryption, which offers better security compared to encrypting everything under a single key. Almost every piece of data needs encryption these days and, very predictably, a lot of AWS services are integrated with KMS.
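A minimal boto3 sketch of the envelope encryption flow (the key alias is a placeholder; what cipher you use the plaintext data key with is up to you):

import boto3

kms = boto3.client("kms", region_name="us-west-2")

# Ask KMS for a data key: we get the plaintext key (use locally, then discard)
# and the same key encrypted under the master key (safe to store next to the data).
data_key = kms.generate_data_key(KeyId="alias/my-master-key", KeySpec="AES_256")
plaintext_key = data_key["Plaintext"]        # use with your cipher of choice, e.g. AES-GCM
encrypted_key = data_key["CiphertextBlob"]   # persist alongside the ciphertext

# Later, ask KMS to unwrap the stored data key before decrypting the data.
plaintext_key_again = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]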

Cloud HSM: An HSM is a server that contains specialized hardware optimized to perform cryptographic operations, such as key generation and signing. HSMs are costly - be sure you need them. In CloudHSM you create a cluster, and then add HSMs to the cluster to help with data redundancy. KMS additionally integrates with CloudHSM to help store keys even more securely.

WAF and Shield: WAF is a web app firewall that monitors requests and allows/blocks traffic to the web server that hosts content. You can choose which requests are acted upon. Shield helps protect applications against DDoS attacks. It has a Standard and an Advanced mode (the latter, as the name suggests, offers more protection). If you know what you're doing and don't have any fancy requirements, Shield Standard should be good enough.

SecurityHub: This is a one-stop shop to view the results of security scans done by GuardDuty, Inspector and Macie. Additionally, scan results from other partners are also listed here. It also claims to help businesses stay compliant with CIS benchmarks.

Artifact: This is where you can go to look at all your agreements with AWS and manage them. Additionally, you can download numerous reports published by 3rd parties, verifying Amazon's compliance with numerous regulations.

Friday, May 24, 2019

AWS - Networking Services

VPC: This is the DMZ/VLAN/segmentation equivalent for the cloud. You can create a VPC, create subnets inside the VPC and then assign EC2 or RDS instances (or anything that needs an IP address) addresses inside individual subnets. You can then set ACLs on the VPC or individual subnets (in addition to security groups on the instances themselves) to control inbound and outbound communication. You can have private and public (internet facing) subnets in a VPC. There are also these things called private VPC endpoints for public services such as KMS (cryptography), which ensure that traffic to KMS, instead of being sent over the Internet, is sent exclusively over the AWS network. This is one of those services that you will almost certainly use if you are on the cloud, so do be familiar with it. :)

CloudFront: It is a common practice to use a CDN to cache static content in locations closest to the user (the edge of the network), so round trips to the web server and DB server can be avoided. These days, even dynamic content is fetched from the origin servers via edge locations and served to the end user. AWS CloudFront claims to take a look at the requests coming in and make decisions on what dynamic content to serve to whom.

CloudFront is also integrated with web app firewalls and DDoS protection services to protect applications against malicious attacks. It additionally integrates with Lambda (to run functions based on specific events), handles cookies (possibly for authenticated requests) and integrates with ACM so that a specific certificate is shown to the end user. Here is a good article about how CDNs work, along with a nice diagram at the bottom.

Route53: This is AWS's DNS service. It allows users to register their domains, configure DNS routes so that users can reach their application, as well as check the health of web servers that are registered with Route 53.

API Gateway: This allows users to create HTTP REST & WebSocket APIs for any functionality they want to implement. You can integrate the API with HTTP (Query string parameters), call a Lambda function when an API is called, integrate it with other AWS services and then return a response to the end user.

Direct Connect: This establishes a physical link between the end user network and an Amazon location that supports Direct Connect. For this purpose, fiber-optic cables that support either 1 Gbps or 10 Gbps must be used and the customer network devices must meet certain requirements. The main purpose of this service is to speed up data transfer between an on-premise network and an AWS service by bypassing a lot of the public Internet. The AWS service can be a public one like S3 or something privately hosted inside a customer VPC. The other key factor is that this is apparently much cheaper than accessing S3 or VPCs over the Internet. Here's one such implementation.

App Mesh: Microservice architectures are quite common these days. The greater the number of microservices though, the greater the management overhead from a monitoring perspective. Once there are applications already running somewhere (EC2 for example), App Mesh, which is built on Envoy, can be configured so that traffic to every single microservice of the application first passes through App Mesh. Rules configured on App Mesh can then determine the next steps to be taken. This is better than installing software on the OS of every microservice host and having them communicate to diagnose problems.

Cloud Map: This allows you to create user-friendly names for all your application resources and store this map. This can all be automated, so as soon as a new container is created or a new instance is spawned due to more traffic, its IP address is registered in Cloud Map. When one microservice needs to talk to another service, it looks it up in Cloud Map. This means that you no longer need to maintain a configuration file with the locations of your assets - you can just look them up in Cloud Map.

Global Accelerator: Once configured, a global accelerator provides the user with a static IP address mapped to several other servers. Traffic that hits the global accelerator is redirected over routes in the AWS network to hosts that are close to the user's location and under less load, so that the overall availability and performance of the application improves. The aim is that traffic doesn't hit nodes that are not performing well.

Thursday, May 23, 2019

AWS - Migration Services

Application Discovery Service: This one's to find out what on-premise servers you have, make a list of them and then display them in the console online. For VMware vCenter hosts there's an AWS VM you have to install that'll do the discovery. Alternatively, you can install an agent on every on-premise host you want tracked online. The last way is to fill out a template with a lot of data and import it into the console.

Database Migration Service: This is pretty self explanatory in that it allows you to migrate from one AWS data store to another (with support for Aurora, MySQL and plenty of others) or to/from an on-premise instance. You can't do on-premise to on-premise :). The source database can apparently remain live throughout the migration, which AWS claims is a great advantage (and it probably is - idk).

Server Migration Service: Just like the previous service helps migrate on-premise databases, this one helps migrate on-premise servers in VMware, Hyper-V and, interestingly, Azure to AWS. A VM is downloaded and deployed in VMware vSphere. This then (when you say so) starts collecting the servers that you've deployed in vSphere and uploads them as Amazon Machine Images (AMIs) to the cloud. These images can then be tested by creating new EC2 instances from the AMIs, to see if they're functional before deploying them to production.

AWS Transfer for SFTP: This is quite simply a managed SFTP server service from AWS. The aim is to tempt people away from managing their own SFTP servers offline and migrate data to the cloud. It supports password and public key auth, and stores data in S3 buckets. All normal SSH/SFTP clients should work out of the box. Authentication can be managed either via IAM or via your own custom authentication mechanism.

AWS Snowball: This is an appliance that you can have shipped to your data center, copy all your data to (up to 80 TB for Snowball and 100 TB for Snowball Edge) over your local network, and then ship back to AWS. AWS takes that box and imports all the data into S3. The key win here is that you don't need to buy lots of hardware to do the transfer but can use AWS's own appliance instead. It also saves a ton of bandwidth, because you're doing local transfers instead of going over the internet.

DataSync: Unlike Snowball, DataSync transfers data between customer NFS servers and S3 or EFS over the network at high speeds, using a custom AWS DataSync protocol (the claim is up to 10 Gbps). Alternatively, you can go from NFS in the cloud to S3, also in the cloud. For an on-premise server, a DataSync agent is installed as a vSphere OVA, after which you add the various locations and configure them as sources or destinations. Finally a task starts and data is transferred between the 2 locations. Here's a nice blog demonstrating this.

AWS Migration Hub: This is sort of a one-stop shop for kicking off discovery or data migration using the various other services that AWS has. Some of these were already mentioned above (the Server and Database migration services). In addition, there are some integrated migration tools (ATADATA ATAmotion, CloudEndure Live Migration etc. - none of which I've heard of :)) that one can use when performing this migration. There is no additional cost to use this service - you pay for using the individual tools themselves.

Tuesday, May 21, 2019

AWS - Database Services

RDS: AWS's relational database service, which basically hosts MySQL, PostgreSQL, MSSQL, Oracle, Amazon's own Aurora and MariaDB (an open-source clone of MySQL). Applications on application servers in data centers or hosted in the cloud can both use RDS as a data source and customize the DB instance (the basic unit of RDS) with the hardware and memory that they want. The databases can all be administered using the respective clients. AWS networking and backups are integrated with RDS.

DynamoDB: AWS's NoSQL database, which stores data in JSON key-value ("a" : "test") format. Instead of writing SQL queries as with a relational database, you write NoSQL queries that work against JSON documents. It integrates with Auto Scaling, which changes the read and write capacity of the database depending on request volume. It also integrates with KMS, allowing you to encrypt data at rest on the fly. It claims to scale really well horizontally (throw more computers at the problem). DynamoDB also has an HTTP API that you can use to query it directly. As usual, the devil is in the details and it is probably not for everyone. There's a nice blog with a cool flowchart about when one should and should not use DynamoDB.
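For example, a simple write and read with boto3 (the table name, key schema and region are made up):

import boto3

table = boto3.resource("dynamodb", region_name="us-west-2").Table("Patients")

# Put an item (a JSON-like map of attributes) and read it back by its key.
table.put_item(Item={"patient_id": "p-123", "name": "Test Patient", "encrypted": False})
item = table.get_item(Key={"patient_id": "p-123"}).get("Item")
print(item)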

ElastiCache: This is an in-memory database service that supports Redis and Memcached. The point of an in-memory DB is to increase the speed of resolution, so users do not have to wait as long to use services. In other words, it is a layer of abstraction in front of the database. If a user's request can be served from the Redis cache, it will be - and faster than a round trip to the database. Here is a link to a comparison between Redis and Memcached.

Neptune: This is a graph database. It is largely useful when there are large sets of data that are related to each other. The inter-related data is stored in the database and users can query it using languages built specifically for graphs (Apache TinkerPop Gremlin and SPARQL). It's interesting that the smallest DB instance you can provision from inside Neptune is db.r4.large (~16 GB RAM) - which by itself shows that this is a product aimed at very large data sets.

Redshift: This is AWS's enterprise data warehousing solution. In other words, it helps analyze petabytes (if you want) of data from a variety of sources such as S3, Glacier, Aurora and RDS. There's a lot of database design that's needed, so I'm guessing (I do not know for sure) that things can get pretty complex pretty soon. Once the data is inside a Redshift cluster (for example, copied from S3), you can run complex SQL queries against the cluster. If you don't have huge amounts of data, you probably do not want Redshift.

DocumentDB: This is basically there so you can migrate all your MongoDB content to the cloud while continuing to use all the Mongo-compatible clients and tools. All you then do is change the DB endpoint to point to the DocumentDB endpoint in the cloud. The cool bit here is you can autoscale the storage your DB needs and the read capacity (how many queries you can make), so large applications are easily served. This too has db.r5.large (16 GB RAM) as its smallest instance, so it feels like a production-grade service that might be expensive for smaller loads. I don't know that for a fact though - so please do your own testing :)

AWS - Storage Services

S3: This is arguably (along with EC2) the most popular service that AWS offers. In short, it allows users to store their files in it - behaving like an online file store. It has other uses too, such as hosting a website with static content. Services very commonly store audit logs here, and in short S3 is integrated with a large number of AWS services. S3 is a global service and bucket names are globally unique - 2 users cannot create the same bucket. Files are stored inside a bucket as objects, identified by keys. For such a popular service, it has relatively few options (which are sufficient) via the AWS CLI. If you're starting to learn about AWS, this is the place to start.

EFS: This is an NFS file system that expands to the size of the files you are storing on it. You can use an NFS client on an EC2 Linux system to remotely mount it and then read/write from/to the file system. There's also an interesting concept called lifecycle management, which moves infrequently used files to a different class of EFS storage that costs less.

The GCP equivalent for this is FileStore.

FSx: This too, in short, is a file system that can be accessed remotely, but it has been built with Windows systems in mind. Users who have Windows applications that need access to a lot of data over the network via SMB mapped network drives are the target. Linux systems can also access these mapped drives using a package called cifs-utils. It additionally supports applications that use Lustre, a file system aimed at applications that require a lot of computation.

S3 Glacier: If you have a large number of files that you do not want to delete (like old pictures) but do not use often, S3 Glacier is the thing to use. The unit of storage for Glacier is a vault, which is sort of equivalent to a bucket in S3. Only creation and deletion of vaults happens through the console; everything else happens via the CLI or SDK. Additionally, it claims to be extremely low cost, which I'm not saying anything about :)


Storage Gateway: If there is an existing data center where you already have a large number of applications that talk to databases, scaling can become hard quickly once you have a lot of traffic. The AWS Storage Gateway is available as a virtual machine appliance (ESXi), an on-premise 1U hardware appliance (buy it on Amazon) or even an EC2 appliance. Once it's activated, the appliance will pick up all your data from the data center stores and put it onto S3, Glacier or EBS. Now you can just point your application to the new stores via an NFS client and it should work seamlessly. Here is a blog that walks you through a sample process. Additionally, it allows backup applications to directly hit the gateway (configurable as a tape gateway) and back up directly to AWS S3 or Glacier.


AWS Backup: This service allows you to back up data from EC2, RDS and a few other services to S3 and then move that data to Glacier (I think) after a certain time. You can configure backup plans to decide what gets backed up (by tagging resources), when, whether it's encrypted or not, and when the backup is deleted. As of now it only supports a few services, but it's reasonable to assume that once it becomes more popular, more services will be added.

Thursday, May 16, 2019

AWS - Compute - Container Services

Here is an image from the Docker website that describes how containers work.



Teams are increasingly building their workflows around Docker containers. Amazon has a few services that make this easier. This post briefly discusses each of these services.

ECR: This is a repository for Docker images that you build on your machine and then upload to AWS. So for example, you can build an Ubuntu image with a LAMP stack and any other custom packages and upload it to ECR. When other AWS services need to use that image for some other purpose, it is easily available.

ECS: Once the Docker images you built earlier are uploaded to ECR, one can use these images on EC2 instances to perform whatever computing tasks were specific to that container. This is where ECS comes in. Users direct ECS to run specific containers; ECS then picks them up, identifies EC2 instances they can run on (creating a cluster of these) and runs them.

Once the cluster is ready, a task definition needs to be created. This defines how the containers are run (what port, which image, how much memory, how many CPUs and so on). When the task definition is actually used, a task is created and run on the cluster of EC2 instances (each called an ECS container instance) that were originally created.

An ECS agent is additionally installed on each ECS container instance; it communicates with the ECS service itself and responds to the start/stop requests that ECS makes.

The equivalent product on GCP is Google Kubernetes Engine (GKE).

EKS: Kubernetes has an architecture where there is a Kubernetes master node (the controller) and a number of worker nodes (roughly equivalent to the ECS container instances running agents) that send information about the state of each job to the master. The master then (similar to ECS) uses that information to schedule and control the various tasks that are running. Here is a diagram that illustrates this:



EKS on Amazon allows the Kubernetes master to be configured inside the AWS environment and to communicate with deployments elsewhere, while simultaneously interacting with ELB, IAM and other AWS services.

Batch: If one has a job that one wants to schedule and run periodically while automatically scaling up or down resources as and when a job completes or takes up more memory/resources - AWS Batch is a good idea. AWS Batch internally uses ECS and hence Docker containers on EC2/Spot instances to run the jobs. Here is a nice guide that goes into an example of using Batch in a bit more detail.

Tuesday, May 14, 2019

AWS - Compute Services

This blog summarizes some of the AWS Compute services. I deliberately do not cover the ones that deal with containers, as I plan to blog separately about those. I'm looking at Google Cloud side by side from now on so I'll keep updating these posts just to mention if there is an equivalent. When I get to Azure, I'll do the same there as well :)

EC2: EC2 is one of the most popular services that AWS has. It basically allows you to spin up virtual machines with a variety of operating systems (Linux, Windows and possibly others) and gives you a root account on them. You can then SSH in using key authentication and manage the system. What you use it for is completely up to you: host a website, crack passwords as a pen-tester, test some software or really anything else.

The GCP equivalent for EC2 is Compute Engine.

Lightsail: Lightsail is very similar to EC2, except that it comes with pre-installed software such as WordPress or a LAMP stack, and you pay a small fixed price for the server. The plus here is that it's easier for non-technical users to use Lightsail, compared to EC2 where you have to do everything yourself. In other words, it is Amazon's VPS solution.

Lambda: This is AWS's Function-as-a-Service solution. In other words, you write code and upload it to Lambda. You don't necessarily have to worry about where you'll host your code or how you'll handle incoming requests. You can configure triggers in each of the other services and then have Lambda act when the trigger fires. For example: you can create a bunch of REST APIs and have the back-end requests handled by a Lambda function, upload files to S3 and have something happen each time a specific file is uploaded, or do more detailed log analysis each time an event is logged to CloudWatch. Lambda is integrated with a large number of AWS services, so it is well worth learning and using well.
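As an illustration, a Lambda function wired to the S3 upload trigger mentioned above might look like this (the handler just logs the object; the bucket and key fields come from the standard S3 event):

def lambda_handler(event, context):
    # Invoked by S3 each time a matching object is uploaded.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object uploaded: s3://{bucket}/{key}")
    return {"status": "ok"}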

The GCP equivalent for Lambda is Functions.

Elastic Beanstalk: If you have some code that you've built locally and want to quickly deploy it without worrying about the underlying infrastructure you'd use to do it and don't want to spend a lot of time tweaking it - Beanstalk is the way to go. You can for example choose Python as a runtime environment, upload your Python code and let AWS then take over. AWS will create roles, security groups and EC2 instances that are needed (among anything else) and deploy your application so it is then easily accessible. If you need additional components such as databases or want to modify the existing configuration, these can be added later to the environment.

The GCP equivalent for Elastic Beanstalk is App Engine.

Serverless Apps Repository: This is a large repository of applications that have been created by users and uploaded for use by the community. One can grab these applications and deploy them in one's own AWS account. The requisite resources are then created by deploying a SAM template. The applications can be used as is, or modified/code-reviewed before actually using them. If you change your mind, you can delete the CloudFormation stack - this will delete all the AWS resources that were created during deployment.

Tuesday, November 13, 2018

Content Security Policy - Quick Reference

This is a post to help me remember the various parts of CSP. The W3C specification for CSP is very readable - this is NOT a replacement for it, just something to help me remember the directives :)

Here's a nice link where you can generate your policy bit by bit.

Remember, by default content is allowed to run on the web - not blocked. If browsers made the default 'block all', I'm willing to bet a lot of issues would go away.
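As a starting point, here is what a fairly strict baseline policy might look like, set here from a hypothetical Flask app - loosen individual directives from this rather than starting wide open:

from flask import Flask

app = Flask(__name__)

CSP = (
    "default-src 'none'; "      # block everything by default...
    "script-src 'self'; "       # ...then explicitly allow what you need
    "style-src 'self'; "
    "img-src 'self'; "
    "connect-src 'self'; "
    "frame-ancestors 'none'; "  # no one may frame this site
    "base-uri 'self'"
)

@app.after_request
def set_csp(response):
    response.headers["Content-Security-Policy"] = CSP
    return response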

Don't use:

- unsafe-inline: Allows inline JS (including javascript: URIs) to run; this is where a ton of XSS happens
- unsafe-eval: Allows eval() (and similar string-to-code functions) to run on whatever input is passed to them
- data: The 'data:' scheme allows content to be encoded as text/html or base64 and is another way of delivering inline content


Fetch-directives:

- child-src: Controls where <frame> and <iframe> can be loaded from
- connect-src: Controls which web servers you can make direct connections to (fetch(), WebSockets, XHR, EventSource)
- default-src: If you haven't set a more specific fetch directive for a resource type, the browser falls back to what's here; it is the default for every other fetch directive. Starting with "default-src 'none'" and then whitelisting content is a good idea
- font-src: Where can I load fonts from?
- frame-src: Where can I load Iframes from?
- img-src: Where can I load images from?
- manifest-src: Where can I load app manifests (metadata about a specific application) from?
- media-src: Where can I load audio, video and subtitles from?
- prefetch-src: Where can resources be prefetched from? This just means that some resources on the page will be 'processed' (DNS resolution for example) before they are actually requested
- object-src: Where do plugins (embed, object, applet) get loaded from
- script-src:
    * A list of white-listed sources for Javascript.
    * 'self' indicates that the browser should load scripts only from the site itself and nowhere else.
    * This controls inline scripts as well as XSLT stylesheets that can trigger script execution.
    * Adding 'nonce-<really_random_nonce>' or 'sha256-<hash>' can allow very specific inline scripts when there's otherwise no way to whitelist them
    * strict-dynamic accompanied by a nonce for a script, means that any scripts recursively called by that script are automatically trusted, without needing a nonce or hash themselves
- style-src: A list of whitelisted sources for CSS
- script-src-elem, script-src-attr, style-src-elem, style-src-attr: similar to script-src and style-src, except that they control script/style elements and inline attributes (such as event handlers) separately. Not yet in browsers at the time of writing, but here's a Google Group post.
- worker-src: Where can I load background Web Workers from?

Document directives:

- base-uri: Restricts the URLs that can be used in the document's <base> element, which affects how relative URLs resolve
- plugin-types: Restricts the types of plugins that can be loaded into the document
- sandbox: Applies a sandbox to the page, much like the iframe sandbox attribute. You can selectively allow scripts, popups or forms, for example

Navigation Directives:

- form-action: Submit forms only to specific whitelisted URLs. Useful when an attacker can actually inject their own form tags
- frame-ancestors: Defends against clickjacking attacks by limiting the websites that can actually frame the target site using frame, iframe, object, embed or applet tags
- navigate-to: Limits the websites that a page can navigate to

Reporting directives:

- report-to: Specifies where violation reports are sent, whether the policy is enforcing or running in report-only mode

Other important directives:

- upgrade-insecure-requests: Upgrade all requests made over HTTP to use HTTPS

- block-all-mixed-content: Ensure that all resources are requested over HTTPS, as long as the page is loaded over HTTPS
- require-sri-for: Requires Subresource Integrity for scripts and/or styles requested from third-party domains, to detect tampering on the way

Other directives:

- referrer: Sends referrer only under certain conditions
- reflected-xss: Controls features in user-agent to prevent xss


Thursday, October 4, 2018

SSH certificate authentication

tl;dr:

* You can configure client-side and server-side authentication using SSH certificates with the existing openssh daemon.
* You never need to worry about MITM attacks on the client when connecting to the server the first time
* Significant decrease in management overhead of SSH keys on the server

If you have a remote server to manage and it's running Linux (or even Windows for that matter but that's beside the point) - it's very likely that there is an SSH daemon running on it. You use an SSH client to connect to it and perform administrative tasks on it. While doing so, you can use passwords (by default) or public key authentication which is a bit more secure as it takes out the password-brute-force attacks. It does mean though that there is some management overhead on both the client and the server side.

On the client, you have to add the host that you are connecting to, to your known_hosts file. So over time, you have a massive known_hosts list with no clue about the purpose of each host. Similarly, on every server there is a huge authorized_keys file which has the clients' public keys added to it. When you want to revoke a client key, you have to go in and remove that client's key from this file on every server. When you no longer want to trust a server, you need to remove that entry from your known_hosts manually. This is something that can easily go wrong if you miss one server - so some automation is probably required here to make it more reliable.

Certificate-based authentication goes one step further: a client trusts any SSH server whose host key is signed by an 'SSH root CA', and a server in turn trusts a client key only if it is signed by a 'user CA'. There is a really nice post by Facebook where they automate this process and make it even less error-prone. That post does a good job of walking you through it step by step, but I did have trouble replicating it, so I'll do a quick summary of the exact steps here.

Server certificate authentication

1. Configure an SSH daemon on a server (Docker, EC2, VirtualBox doesn't matter - but ideally a separate host as it's the CA). Let's call it ca.
2. Generate an SSH keypair for the server CA.
3. Start a new server up. Let's call it host1. This too should run SSH. This is the server you want to login to and administer.
4. Generate an SSH keypair for host1 in /etc/ssh
5. Copy host1's public key onto the ca server. Sign host1's public key with ca's private key. This will create an SSH certificate.
6. Copy ca.pub and the certificate you just created from ca to /etc/ssh on host1.
7. Configure /etc/ssh/sshd_config to use the key you created in Step 4 as well as the certificate. This is done using the HostKey and HostCertificate directives.
8. Restart the SSH daemon or reboot your server to reload your SSH config so it uses the certificate
9. Configure the client machine (any machine apart from host1 and ca) to recognize the ca's public key using the @cert-authority directive. This is so you don't get a 'Should I connect? Yes/No' message the first time you connect to host1.

User certificate authentication

1. Generate an SSH keypair on the client machine to act as the user CA. Call it userca.
2. Generate a second SSH keypair for the client. This is the key you use to connect to host1. Call it client.
3. Sign client with userca. This will generate a cert as well on the client.
4. Copy userca.pub to host1 and configure sshd_config with the TrustedUserCAKeys directive pointing to userca.pub. This tells host1 to accept any user key signed by this CA.
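
And the user side, again as a rough sketch with made-up names, principals and validity:

    # On the client: the user CA keypair (step 1) and your own keypair (step 2)
    ssh-keygen -t ed25519 -f userca -C "user CA"
    ssh-keygen -t ed25519 -f ~/.ssh/client -C "client key"

    # Sign the client key with the user CA (step 3)
    # -n lists the principals (login names) the certificate is valid for
    ssh-keygen -s userca -I myclient -n root,ubuntu -V +4w ~/.ssh/client.pub
    # produces ~/.ssh/client-cert.pub

    # On host1 (step 4): copy userca.pub to /etc/ssh and add to sshd_config
    #   TrustedUserCAKeys /etc/ssh/userca.pub

    # Log in - ssh picks up client-cert.pub automatically alongside the key
    ssh -i ~/.ssh/client ubuntu@host1.example.com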

At this point, you should be able to log in to host1 from the client without getting a host-key prompt even the first time you connect, because you have explicitly trusted the server CA. It's also very cool that there is no need to do any more key management on any server, as long as you trust the user CA used to sign the user keys.

References:

Dockerizing an SSH service
Hardening SSH

Tuesday, September 18, 2018

AWS - Developer Tools

This post summarizes the AWS services that help you write code and then reliably build, test and deploy it faster than you could manually. The overall concept of doing all this automatically is usually summarized as Continuous Integration / Continuous Deployment (CI/CD). Here is a simple post that nicely explains these concepts.

If you don't want to read any more the tl;dr is this:

* Write code using AWS Cloud9
* Debug code using AWS X-Ray
* Store code using AWS CodeCommit
* Build and test code using AWS CodeBuild
* Deploy code using AWS CodeDeploy
* Watch task progression at runtime from a single interface using AWS CodePipeline
* Use an integrated dashboard for all your tools, including issue tracking, using AWS CodeStar

If you're not familiar with Git, I'd strongly recommend reading a little about it before proceeding and playing with all these shiny new AWS tools. A great source is this chapter from the ProGit book. Once that's done, come back here. It's fine to read through this post even without Git knowledge - it's just easier with that background.

Cloud9 IDE

Once you have an idea in mind and want to write software to actualize it, you need a place to write it. A simple text editor works just fine, but as your programs get more complex, an IDE is really helpful. A couple you might be familiar with are Eclipse and IntelliJ. However, since this post is about AWS, I must mention the Cloud9 IDE: a browser-based IDE that gives you a familiar environment. I haven't played with it too much, but it's good to know there is a web-based option now.


X-Ray

This looks like a tracing/profiling tool to me. I did not use it, so I do not have much to say about it, but I'd think the way to use it would be to instrument your code, figure out which calls are really slow, and see if you can optimize them further. All the rest I did try out and can confirm they are very cool tools. So read on.

CodeCommit

Once you finish writing all your code, you need a place to store it. This is where version control systems come in, and Git is what everyone uses these days. The AWS offering here is CodeCommit, a managed Git repository service. Because it is plain Git under the hood, you do not need to learn any new commands: once you've set your repository up, all the usual Git commands work perfectly well. You can add files, commit them and push them to your CodeCommit repository.

All you need to do is install Git on your machine, create credentials (an SSH key pair or HTTPS Git credentials) and configure your IAM user to use them to authenticate to CodeCommit. Clicking the "Connect" button inside the console gives you per-platform instructions if you get stuck.
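
As an illustration, with the AWS CLI installed and an IAM user that is allowed to use CodeCommit, the HTTPS credential-helper route looks roughly like this (the region and repository name are made up):

    # Let Git use the AWS CLI to authenticate to CodeCommit over HTTPS
    git config --global credential.helper '!aws codecommit credential-helper $@'
    git config --global credential.UseHttpPath true

    # From here on it's plain Git
    git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/my-demo-repo
    cd my-demo-repo
    echo "hello" > README.md
    git add README.md
    git commit -m "First commit"
    git push origin master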

The coolest thing here is that you can create triggers that run as soon as you push code to your repository. Maybe you want to build, test and deploy your code to your test environment on every single push - you can do that by setting up a Lambda function that is invoked whenever a commit is pushed. Which nicely flows into CodeBuild...
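
Setting up such a trigger from the CLI looks something like this. The repository name, trigger name and Lambda ARN are placeholders, and the function also needs a resource policy that allows CodeCommit to invoke it:

    # Fire a Lambda function on every push to any branch of the repository
    aws codecommit put-repository-triggers \
      --repository-name my-demo-repo \
      --triggers '[{"name":"build-on-push","destinationArn":"arn:aws:lambda:us-east-1:123456789012:function:kick-off-build","branches":[],"events":["all"]}]'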

CodeBuild

Once you have a workflow going where you write code in an IDE and push commits to a CodeCommit repository, the next step is to make sure your code builds properly. This is where CodeBuild comes in. All you do is point CodeBuild at the CodeCommit repository where you stored your code and tell it where to put any output artifacts of the build (usually S3).

It supports branches too, so you can tell it which branch to pull code from in CodeCommit. You select the runtime environment you need to build the code (Java/Python/whatever), configure a bunch of other options and then build your project. The result is the same as what you'd get after hitting "Build" in whatever IDE you use.
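
By default CodeBuild drives the build from a buildspec.yml at the root of the repository. A minimal sketch for, say, a Maven project might look like this - the runtime version, build command and artifact path are just examples and depend on the build image you pick:

    # buildspec.yml (at the root of the repository)
    version: 0.2

    phases:
      install:
        runtime-versions:
          java: corretto11
      build:
        commands:
          - mvn package

    artifacts:
      files:
        - target/*.jar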

The big advantage here is that you spend very little time configuring a build environment. Also, as I touched on in the CodeCommit section, you could have the Lambda function you wrote as a CodeCommit trigger automatically run CodeBuild against your code each time a commit is pushed.

CodeDeploy

Once the code is compiled, tests are run and your entire project is built, the last step is usually to deploy it to a web server so your users can access it. That's where CodeDeploy comes in. You configure it to take the build output (a deployable artifact) and put it onto every server you want it on.

You have the option of using a load balancer as well, if you want traffic to be evenly distributed. Once deployment is complete, the new version should be live on all the servers in that deployment group.
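
For EC2/on-premises deployments, CodeDeploy is driven by an appspec.yml packaged with the artifact. A bare-bones sketch, where the destination path and script name are made up:

    # appspec.yml (packaged with the build artifact)
    version: 0.0
    os: linux
    files:
      - source: /
        destination: /var/www/my-app
    hooks:
      ApplicationStart:
        - location: scripts/start_server.sh
          timeout: 60
          runas: root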

Again, remember you can extend your Lambda function further so that code is built and deployed as soon as a commit hits CodeCommit. Pretty cool :)

CodePipeline

CodePipeline isn't something new, but it certainly makes life much easier. It helps, though, if you understand the three previous services I talked about, since the screens in CodePipeline deal with those services and ask you for input. So I'd recommend understanding CodeCommit, CodeBuild and CodeDeploy really well before using CodePipeline.

CodePipeline is basically a wizard over the other three services. It prompts you to tell it where your code is (CodeCommit), what to build it with (CodeBuild) and what to deploy it with (CodeDeploy). If you already have the roles and resources set up from playing with the other three services, this should feel very intuitive. A couple of great tutorials are here and here. Also, a nice writeup on how someone automated the whole process is here.

The coolest thing about CodePipeline is that once you create a pipeline, you can watch it run stage by stage. For example, once your code is pushed to CodeCommit (as usual) and you have the pipeline dashboard open, you can actually see each stage succeed or fail, and troubleshoot accordingly.
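
You can pull the same stage-by-stage status from the CLI as well; something like this, with a made-up pipeline name:

    # Show the current status of every stage and action in the pipeline
    aws codepipeline get-pipeline-state --name my-demo-pipeline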

CodeStar

Managers should love this one. I used it just a bit, but it has a fantastic-looking dashboard that gives you a unified view of every single service you are using - in short, it links to C9, CC, CB, CD and CP. So if you didn't cheat and did everything step by step :) you should see all your commits, builds and pipelines by clicking the buttons on the fancy dashboard that is CodeStar.

The additional feature here is integration with Jira and GitHub, where you can see all your issues as well.

So in short CodeStar is a one stop shop if you've bought into the AWS development environment and want to be tied into it for years to come, while parting with your money bit by bit :)