Saturday, June 1, 2019

AWS - Security, Identity and Compliance

This post describes a number of services that are relevant to AWS security. It is worth knowing all of these services as well as possible.


IAM: This is the heart of all the authentication and authorization that AWS services perform. If there's one service you should learn inside and out, this is it. Admins can create IAM users and roles, associate access keys (for programmatic access) and assign permissions to each user and role. Developers can use the access keys to programmatically invoke any AWS service, subject to the permissions assigned to the user/role. Additionally, almost all (if not all) services create service-linked roles and assume IAM roles to perform operations in other services. Here is one such example. IAM can be used to control a user's access to an entire service, to specific APIs in a service or, in many cases, to specific resources as well.
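
As a rough illustration of that last point, here's a minimal sketch with boto3 of a policy that only allows reading from one specific S3 bucket (the policy and bucket names are made up):

import json
import boto3

iam = boto3.client('iam')

# Hypothetical policy: read-only access to a single bucket and nothing else
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-example-bucket",
            "arn:aws:s3:::my-example-bucket/*"
        ]
    }]
}

iam.create_policy(
    PolicyName='ReadOnlyMyExampleBucket',
    PolicyDocument=json.dumps(policy_document)
)
# The policy can then be attached to a user or role with
# iam.attach_user_policy() / iam.attach_role_policy()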

Resource Access Manager: This is a service that allows one account to share resources with another account. The account using the shared resources can perform actions similar to those the owner of the resources can. This helps reduce operational costs and also the overall attack surface (since there are fewer things to manage). However, only a few resource types can be shared as of now. Here is a walkthrough of this service by the ever helpful Jeff Barr.

Cognito: Cognito handles authentication for web and mobile applications. It is Amazon's user directory: users authenticate against a user pool and obtain a user-pool token. They can authenticate directly against user records stored in Cognito, or via an external identity provider such as Google or Facebook. The user-pool token is then exchanged, via an identity pool, for temporary AWS credentials using the STS service (which does not have a web console :)), transparently to the user. These credentials are then used to access AWS resources.
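
The identity-pool exchange looks roughly like this with boto3 (the pool ID, provider name and token are placeholders):

import boto3

cognito = boto3.client('cognito-identity')

# Placeholders - a real app gets the user-pool token after the user signs in
IDENTITY_POOL_ID = 'us-east-1:00000000-0000-0000-0000-000000000000'
USER_POOL_TOKEN = 'eyJ...'
LOGIN_PROVIDER = 'cognito-idp.us-east-1.amazonaws.com/us-east-1_example'

# Exchange the user-pool token for an identity ID ...
identity = cognito.get_id(
    IdentityPoolId=IDENTITY_POOL_ID,
    Logins={LOGIN_PROVIDER: USER_POOL_TOKEN}
)

# ... and then for temporary AWS credentials (STS is called behind the scenes)
creds = cognito.get_credentials_for_identity(
    IdentityId=identity['IdentityId'],
    Logins={LOGIN_PROVIDER: USER_POOL_TOKEN}
)
print(creds['Credentials']['AccessKeyId'])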

Secrets Manager: Like the name suggests, this stores credentials securely, using KMS. Instead of hard-coding credentials in source code or configuration files, they can be stored in a vault such as Secrets Manager. Applications retrieve these credentials at run-time to implement their functionality. Passwords, API keys or anything else that is considered a secret can be stored here. Automatic rotation of these credentials is also possible for RDS (MySQL, PostgreSQL and Aurora) database passwords.
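
Retrieving a secret at run-time is a couple of lines with boto3 (the secret name here is made up):

import json
import boto3

secrets = boto3.client('secretsmanager')

# 'prod/myapp/db' is a hypothetical secret name
response = secrets.get_secret_value(SecretId='prod/myapp/db')
credentials = json.loads(response['SecretString'])

# Use these to connect to the database instead of hard-coding them
db_user = credentials['username']
db_password = credentials['password']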

GuardDuty: This is a security monitoring tool that continuously analyzes different logs (CloudTrail, VPC Flow Logs, DNS, etc.) and generates security findings. Its detections come partly from AWS and partly from AWS's security partners, and users can also customize GuardDuty with their own trusted IP and threat lists to help detect threats.

Inspector: This involves installing an agent on an EC2 instance; the agent scans for open ports, checks whether the instance is vulnerable to known CVEs and verifies the system against CIS benchmarks. In short, it is Amazon's vulnerability scanner (for a limited set of checks) aimed at helping EC2 instance owners secure their instances better. If you're managing your instances yourself, this seems like a useful service to have, if you're willing to pay the extra money :). Note that charges are per instance, so if you only have a few servers this could be pretty cheap.

Macie: This is a fancy (fairly pricey) tool that AWS offers to detect leakage of specific kinds of data from S3 buckets (up to 3 TB in size). It classifies data based on numerous very specific rules (e.g. 1 and 2). It's also integrated with KMS, which means encrypted bucket content can be scanned as well.

Single Sign-On: This allows AWS to function as an SSO solution that is tightly integrated with a number of AWS services. It integrates with AWS Directory Service, so you can store all your user information there and authenticate against it. Additionally, once you authenticate successfully you get access to all of the services, across all the AWS accounts, that are integrated with SSO. There's also a way to migrate your entire Active Directory to AWS so your users can continue using the same passwords. In a way it's very similar to IAM - except that IAM is scoped to a single account. Here is a good article about how AWS SSO works.

Directory Service: This is AWS's version of Active Directory. You can use Simple AD, which provides some features that make managing EC2 instances easier. A more powerful option is AWS Managed Microsoft AD, which lets you access AWS apps, manage instances, use Azure cloud apps, authenticate to an on-premise Active Directory over a VPN connection or share an AD domain hosted in another AWS account. You can also use AD Connector to allow EC2 instances to join an on-premise Active Directory. Users can then access the applications running on EC2 while authenticating against the on-premise Active Directory.

Certificate Manager: This is AWS's certificate authority solution; it helps application owners use certificates to secure communication over TLS. You can create certificates inside ACM or import certificates from outside. ACM is integrated with a few other common services (not all). The certificate's private key is stored securely and encrypted using KMS.

Key Management Service: This is the AWS key vault that securely stores master keys, which protect the data keys used to encrypt your data. You can let AWS create an AWS-managed master key or create a customer managed key yourself. The master key never leaves KMS. The master key encrypts the data key, which is the key you actually use to encrypt/decrypt data outside KMS. You can also choose to create the data key outside KMS and have the master key encrypt it. This is envelope encryption, which offers better security compared to encrypting everything directly with a single key. Almost every piece of data needs encryption these days and, very predictably, a lot of services are integrated with KMS.
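
A sketch of envelope encryption with boto3 and the cryptography library (the key alias is a placeholder):

import base64
import boto3
from cryptography.fernet import Fernet

kms = boto3.client('kms')

# Ask KMS for a data key; the master key ('alias/my-app-key' is hypothetical) never leaves KMS
data_key = kms.generate_data_key(KeyId='alias/my-app-key', KeySpec='AES_256')

# Encrypt locally with the plaintext data key, then throw the plaintext key away
fernet_key = base64.urlsafe_b64encode(data_key['Plaintext'])
ciphertext = Fernet(fernet_key).encrypt(b'my sensitive data')

# Store the *encrypted* data key alongside the ciphertext; decrypt it later with kms.decrypt()
encrypted_data_key = data_key['CiphertextBlob']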

Cloud HSM: An HSM is a server that contains specialized hardware optimized to perform cryptographic operations. It helps with operations such as these. HSMs are costly - be sure you need them. In CloudHSM you create a cluster and then add HSMs to the cluster for redundancy. KMS additionally integrates with CloudHSM to help store keys even more securely.

WAF and Shield: WAF is a web application firewall that monitors requests and allows/blocks traffic to the web server that hosts content; you can choose which requests are acted upon. Shield helps protect applications against DDoS attacks. It has a Standard and an Advanced tier (the latter, as the name suggests, offers more protection). If you know what you're doing and don't have any fancy requirements, Shield Standard should be good enough.

Security Hub: This is a one-stop shop to view the results of security scans done by GuardDuty, Inspector and Macie. Additionally, findings from other partners are listed here as well. It also claims to help businesses stay compliant with CIS benchmarks.

Artifact: This is where you go to look at all your agreements with AWS and manage them. Additionally, you can download reports published by third-party auditors verifying Amazon's compliance with numerous regulations.

Friday, May 24, 2019

AWS - Networking Services

VPC: This is the DMZ/VLAN/segmentation equivalent for the cloud. You can create a VPC, create subnets inside the VPC and then place EC2 or RDS instances (or anything else that needs an IP address) inside individual subnets. You can then set ACLs on the VPC or individual subnets (in addition to security groups on the instances themselves) to control inbound and outbound communication. You can have private and public (Internet-facing) subnets in a VPC. There are also private VPC endpoints for public services such as KMS, which ensure that traffic to the service is sent exclusively over the AWS network instead of over the Internet. This is one of those services you will almost certainly use if you are on the cloud, so do be familiar with it. :)

CloudFront: It is common practice to use a CDN to cache static content at the locations closest to the user (the edge of the network) so round trips to the web server and DB server can be avoided. These days even dynamic content can be served through edge locations, which route requests to the origin servers and back to the end user. CloudFront claims to look at incoming requests and decide what dynamic content to serve to whom.

CloudFront is also integrated with WAF and DDoS protection services to protect applications against malicious attacks. It additionally integrates with Lambda (to run functions based on specific events), handles cookies (useful for authenticated requests) and integrates with ACM so that a specific certificate is presented to the end user. Here is a good article about how CDNs work, along with a nice diagram at the bottom.

Route 53: This is AWS's DNS service. It allows users to register domains, configure DNS records so that users can reach their applications, and check the health of the web servers registered with Route 53.

API Gateway: This allows users to create REST and WebSocket APIs for whatever functionality they want to implement. You can integrate an API with an HTTP backend (passing query string parameters), invoke a Lambda function when an API is called, integrate it with other AWS services and then return a response to the end user.
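
With a Lambda proxy-style integration, the handler just returns an object that API Gateway maps back onto the HTTP response - a minimal sketch:

import json

def handler(event, context):
    # With a proxy integration, query string parameters arrive in the event
    params = event.get('queryStringParameters') or {}
    name = params.get('name', 'world')

    # Whatever is returned here becomes the HTTP response to the end user
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'message': f'hello {name}'})
    }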

Direct Connect: This establishes a physical link between the end user's network and an Amazon location that supports Direct Connect. Fiber-optic connections at either 1 Gbps or 10 Gbps are used, and the customer's network devices must meet certain requirements. The main purpose of this service is to speed up data transfer between an on-premise network and AWS by bypassing much of the public Internet. The AWS side can be a public service like S3 or something privately hosted inside a customer VPC. The other key factor is that this is apparently much cheaper than accessing S3 or VPCs over the Internet. Here's one such implementation.

App Mesh: Microservice architectures are quite common these days. The greater the number of microservices, though, the greater the management overhead from a monitoring perspective. Once applications are already running somewhere (EC2 for example), App Mesh, which is built on Envoy, can be configured so that traffic to every single micro-service of the application first passes through its proxies. Rules configured in App Mesh then determine the next steps to be taken. This beats installing software on the OS of every microservice host and having those hosts communicate with each other to diagnose problems.

Cloud Map: This allows you to create user-friendly names for all your application resources and store that map. It can all be automated: as soon as a new container is created, or a new instance is spawned due to more traffic, its IP address is registered in Cloud Map. When one micro-service needs to talk to another, it looks it up in Cloud Map. This means you no longer need to maintain a configuration file with the locations of your assets - you just look them up in Cloud Map.

Global Accelerator: Once configured, a global accelerator provides the user with static IP addresses mapped to several backend servers. Traffic that hits the accelerator is routed over the AWS network to hosts that are close to the user's location and under less load, so the overall availability and performance of the application improves. The aim is that traffic doesn't hit nodes that are not performing well.

Thursday, May 23, 2019

AWS - Migration Services

Application Discovery Service: This one discovers the servers you have on-premise and builds an inventory that is then displayed in the console. For VMware vCenter hosts there's an AWS appliance you install that does the discovery. Alternatively, you can install an agent on every on-premise host you want tracked. The last way is to fill out a template with a lot of data and import it into the console.

Database Migration Service: This is pretty self-explanatory in that it allows you to migrate from one AWS data store to another (Aurora, MySQL and plenty of others are supported) or to/from an on-premise instance. You can't do on-premise to on-premise :). The source database can apparently remain live throughout the migration, which AWS claims (and it probably is - idk) is a great advantage.

Server Migration Service: Just like the previous service helps migrate on-premise databases, this one helps migrate on-premise servers running in VMware, Hyper-V and, interestingly, Azure to AWS. A connector VM is downloaded and deployed in VMware vSphere. This then (when you say so) starts replicating the servers you've deployed in vSphere as Amazon Machine Images (AMIs) in the cloud. These images can then be tested by creating new EC2 instances from the AMIs, to check that they're functional before deploying them to production.

AWS Transfer for SFTP: This is quite simply a managed SFTP server service from AWS. The aim is to tempt people away from managing their own SFTP servers and migrate that data to the cloud. It supports password and public key auth, and stores data in S3 buckets. All normal SSH/SFTP clients should work out of the box. Authentication can be managed either via IAM or via your own custom authentication mechanism.

AWS Snowball: This is an appliance that AWS ships to your data-center; you copy all the data (up to 80 TB for Snowball, 100 TB for Snowball Edge) onto it over your local network and then ship the box back to AWS. AWS takes the box and imports all the data into S3. The key win here is that you don't need to buy lots of hardware to do the transfer - you use AWS's own appliance instead. It also saves a ton of bandwidth because you're doing local transfers instead of going over the Internet.

DataSync: Unlike Snowball, DataSync transfers data between customer NFS servers and S3 or EFS over the network at high speed using a custom AWS DataSync protocol (the claim is up to 10 Gbps). Alternatively you can go from NFS in the cloud to S3, also in the cloud. For an on-premise server, a DataSync agent is installed as a vSphere OVA, after which you add the various locations and configure them as sources or destinations. Finally a task starts and data is transferred between the two locations. Here's a nice blog demonstrating this.

AWS Migration Hub: This is sort of a one-stop shop for kicking off discovery or data migration using the various other services AWS has. Some of these were already mentioned above (Server and Database Migration Services). In addition there are some integrated migration tools (ATADATA ATAmotion, CloudEndure Live Migration etc. - none of which I've heard of :)) that one can use when performing the migration. There is no additional cost to use this service - you pay for the individual tools themselves.

Tuesday, May 21, 2019

AWS - Database Services

RDS: AWS's relational database service, which hosts MySQL, PostgreSQL, SQL Server, Oracle, MariaDB (an open-source fork of MySQL) and Amazon's own Aurora. Applications running on application servers in data centers, or hosted in the cloud, can use RDS as a data source and customize the DB instance (the basic unit of RDS) with the hardware and memory they want. The databases can all be administered using their respective clients. AWS networking and backups are integrated with RDS.

DynamoDB: AWS's NoSQL database, which stores data as JSON-style key-value pairs ("a" : "test"). Instead of writing SQL queries as you would with a relational database, you read and write items by key and attributes. It integrates with Auto Scaling, which changes the read and write capacity of the table depending on request volume. It also integrates with KMS, allowing you to encrypt data at rest. It claims to scale really well horizontally (throw more computers at the problem). DynamoDB also has an HTTP API that you can use to query it directly. As usual, the devil is in the details and it is probably not for everyone. There's a nice blog with a cool flowchart about when one should and should not use DynamoDB.
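
A small sketch of what reading and writing items looks like with boto3 (the table and attribute names are made up):

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')   # hypothetical table with 'username' as the partition key

# Write an item (a JSON-like dict)
table.put_item(Item={'username': 'alice', 'role': 'admin', 'logins': 3})

# Read it back by key - no SQL involved
response = table.get_item(Key={'username': 'alice'})
print(response['Item'])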

ElastiCache: This is an in-memory data store that supports Redis and Memcached. The point of an in-memory store is to speed up responses so users do not have to wait as long. In other words, it is a caching layer in front of the database. If a user's request can be served from the Redis cache, it will be - and faster than a round trip to the database. Here is a link to a comparison between Redis and Memcached.
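
The usual pattern here is "cache-aside": check Redis first and fall back to the database only on a miss. A rough sketch with the redis-py client (the endpoint, key naming and query_database() helper are all made up):

import json
import redis

# Hypothetical ElastiCache Redis endpoint
cache = redis.Redis(host='my-cluster.abc123.0001.use1.cache.amazonaws.com', port=6379)

def get_user(user_id):
    key = f'user:{user_id}'
    cached = cache.get(key)
    if cached:
        return json.loads(cached)            # cache hit - no database round trip

    user = query_database(user_id)           # hypothetical, slower DB lookup
    cache.setex(key, 300, json.dumps(user))  # cache the result for 5 minutes
    return user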

Neptune: This is a graph database. It is most useful when there are large sets of data that are related to each other. The inter-related data is stored in the database and users can query it using purpose-built graph query languages (Apache TinkerPop Gremlin and SPARQL). It's interesting that the smallest DB instance you can provision inside Neptune is a db.r4.large (~16 GB RAM) - which by itself shows this is a product meant for very large data sets.

Redshift: This is AWS's enterprise data warehousing solution. In other words, it helps analyze petabytes (if you want) of data from a variety of sources such as S3, Glacier, Aurora and RDS. There's a lot of database design needed, so I'm guessing (I do not know for sure) that things can get pretty complex pretty soon. Once the data is inside a Redshift cluster (for example, copied from S3), you can run complex SQL queries against the cluster. If you don't have huge amounts of data you probably do not want Redshift.

DocumentDB: This exists so you can migrate all your MongoDB content to the cloud while continuing to use the usual Mongo clients and tools. All you then do is change the DB endpoint to point to the DocumentDB endpoint in the cloud. The cool bit here is that storage and read capacity (how many queries you can make) scale automatically, so large applications are easily served. Here too the smallest instance is a db.r5.large with 16 GB RAM, so it feels like a production-scale service that might be expensive for smaller loads. I don't know that for a fact though - so please do your testing :)

AWS - Storage Services

S3: This is arguably (along with EC2) the most popular service that AWS offers. In short, it allows users to store files in it - behaving like an online file store. It has other uses too, such as hosting a website with static content. Services very commonly store audit logs here; in short, S3 is integrated with a large number of AWS services. S3 is a global service and bucket names are globally unique - two users cannot create buckets with the same name. Objects are stored inside a bucket and identified by keys. For such a popular service it has relatively few options (which are sufficient) in the AWS CLI. If you're starting to learn about AWS, this is the place to start.
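
As a quick illustration, basic S3 usage with boto3 looks something like this (the bucket name is a placeholder and would have to be globally unique):

import boto3

s3 = boto3.client('s3')

# Upload a local file as a key inside the bucket
s3.upload_file('report.pdf', 'my-example-bucket', 'reports/2019/report.pdf')

# List the keys under a prefix
response = s3.list_objects_v2(Bucket='my-example-bucket', Prefix='reports/')
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])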

EFS: This is an NFS file system that grows with the data you store on it. You can use an NFS client on an EC2 Linux system to remotely mount it and then read/write from/to the file system. There's also an interesting concept called lifecycle management, which moves infrequently used files to a different EFS storage class that costs less.

The GCP equivalent for this is FileStore.

FSx: This too, in short, is a file system that can be accessed remotely, but it has been built with Windows systems in mind. Users who have Windows applications that need access to a lot of data over the network via SMB mapped network drives are the target. Linux systems can also access these mapped drives using a package called cifs-utils. FSx additionally supports Lustre, a file system aimed at compute-intensive workloads.

S3 Glacier: If you have a large number of files that you do not want to delete (like old pictures) but do not use often, S3 Glacier is the thing to use. The unit of storage in Glacier is a vault, which is roughly equivalent to a bucket in S3. Only creation and deletion of vaults happens through the console; everything else happens via the CLI or SDK. It also claims to be extremely low cost, which I'm not saying anything about :)


Storage Gateway: If there is an existing data-center where you already have a large number of applications and data stores, scaling can become hard quickly when traffic grows. The AWS Storage Gateway is available as a virtual machine appliance (ESXi), an on-premise 1U hardware appliance (which you can buy on Amazon) or even an EC2 appliance. Once it's activated, the appliance picks up data from your on-premise stores and puts it onto S3, Glacier or EBS. You can then point your applications at the new stores via an NFS client and it should work seamlessly. Here is a blog that walks you through a sample process. Additionally, it allows backup applications to hit the gateway directly (configured as a tape gateway) and back up straight to AWS S3 or Glacier.


AWS Backup: This service allows you to back up data from EC2, RDS and a few other services to S3 and then move that data to Glacier (I think) after a certain time. You can configure backup plans to decide what gets backed up (by tagging resources), when, whether it's encrypted and when the backup is deleted. As of now it only supports a few services, but it's reasonable to assume that once it becomes more popular more services will be added.

Thursday, May 16, 2019

AWS - Compute - Container Services

Here is an image from the Docker website that describes how containers work.



Teams are increasingly building their workflows around Docker containers. Amazon has a few services that make this easier. This post briefly discusses each of these services.

ECR: This is a registry for container images that you build on your machine and then upload to AWS. So for example: you can build an Ubuntu image with a LAMP stack and any other custom packages, and upload it to ECR. When other AWS services need that image for some purpose, it is easily available.

ECS: Once the Docker images you built earlier are uploaded to ECR, you can run those images on EC2 instances to perform whatever computing tasks are specific to each container. This is where ECS comes in. Users tell ECS which containers to run; ECS then identifies EC2 instances they can run on (grouped into a cluster) and runs them there.

Once the cluster is ready, a task definition needs to be created. This defines how the containers are run (which port, which image, how much memory, how many CPUs and so on). When the task definition is actually used, a task is created and run on the cluster of EC2 instances (each is called an ECS container instance) that was originally created.
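
Registering a task definition via boto3 looks roughly like this sketch (the family name and image URI are placeholders):

import boto3

ecs = boto3.client('ecs')

ecs.register_task_definition(
    family='my-web-app',                     # hypothetical task family name
    containerDefinitions=[{
        'name': 'web',
        'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-web-app:latest',
        'memory': 512,                       # hard memory limit in MiB
        'cpu': 256,                          # CPU units
        'portMappings': [{'containerPort': 80, 'hostPort': 80}],
        'essential': True
    }]
)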

An ECS agent is additionally installed on each ECS container instance; it communicates with the ECS service itself and responds to the start/stop requests that ECS makes.

The equivalent product on GCP is Google Kubernetes Engine (GKE).

EKS: Kubernetes has an architecture where there is a master node (the controller) and a number of worker nodes (roughly analogous to ECS container instances running the ECS agent) that report the state of their workloads back to the master. The master then (similar to ECS) schedules tasks onto the workers and keeps track of the overall cluster state. Here is a diagram that illustrates this:



EKS on Amazon allows the Kubernetes master to be hosted inside the AWS environment and to communicate with deployments elsewhere, while simultaneously integrating with ELB, IAM and other AWS services.

Batch: If you have a job you want to schedule and run periodically, while automatically scaling resources up or down as jobs complete or need more memory/compute - AWS Batch is a good idea. AWS Batch internally uses ECS, and hence Docker containers on EC2/Spot instances, to run the jobs. Here is a nice guide that goes into an example of using Batch in a bit more detail.

Tuesday, May 14, 2019

AWS - Compute Services

This post summarizes some of the AWS compute services. I deliberately do not cover the ones that deal with containers, as I plan to blog separately about those. I'm looking at Google Cloud side by side from now on, so I'll keep updating these posts to mention when there is an equivalent. When I get to Azure, I'll do the same there as well :)

EC2: EC2 is one of the most popular services that AWS has. It basically allows you to spin up virtual machines with a variety of operating systems (Linux, Windows and possibly others) and gives you a root account on it. You can then SSH into it using key authentication and manage the system. What you want to use it for is completely up to you: Host a website, crack passwords as a pen-tester, test some software or really anything else.

The GCP equivalent for EC2 is Compute Engine.

Lightsail: Lightsail is very similar to EC2 except that it comes with pre-installed software such as WordPress or a LAMP stack, and you pay a small fixed monthly price for the server. The plus here is that it's easier for non-technical users than EC2, where you have to do everything yourself. In other words, it is Amazon's VPS solution.

Lambda: This is AWS's Function-as-a-Service solution. In other words, you write code and upload it to Lambda. You don't have to worry about where to host the code or how to handle incoming requests. You can configure triggers in many other services and have Lambda act when a trigger fires. For example: you can create a bunch of REST APIs and have the back-end requests handled by a Lambda function, upload files to S3 and have something happen each time a specific file is uploaded, or do more detailed log analysis each time an event is logged to CloudWatch. Lambda is integrated with a large number of AWS services, so it is well worth learning it well.
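
For instance, a handler wired to an S3 upload trigger might look like this sketch (the processing logic is hypothetical):

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # S3 invokes the function with one or more records describing the uploaded objects
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj['Body'].read()
        print(f'{key} in {bucket} is {len(body)} bytes')   # replace with real processing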

The GCP equivalent for Lambda is Cloud Functions.

Elastic Beanstalk: If you have some code you've built locally and want to deploy it quickly, without worrying about the underlying infrastructure or spending a lot of time tweaking it - Beanstalk is the way to go. You can, for example, choose Python as a runtime environment, upload your Python code and let AWS take over. AWS will create the roles, security groups and EC2 instances that are needed (among other things) and deploy your application so it is easily accessible. If you need additional components such as databases, or want to modify the existing configuration, these can be added to the environment later.

The GCP equivalent for Elastic Beanstalk is App Engine.

Serverless Application Repository: This is a large repository of applications that have been created by users and uploaded for use by the community. One can grab these applications and deploy them in one's own AWS account. The requisite resources are then created by deploying a SAM template. The applications can be used as is, or modified and code-reviewed before actual use. If you change your mind, you can delete the CloudFormation stack - this deletes all the AWS resources that were created during deployment.

Tuesday, November 13, 2018

Content Security Policy - Quick Reference

This is a post to help me remember the various parts of CSP. The W3C specification for CSP is very readable - this is NOT a replacement for it, just something to help me remember the directives :)

Here's a nice link where you can generate your policy bit by bit.

Remember, by default content is allowed to run on the web - not blocked. If browsers made the default 'block all', I'm willing to bet a lot of issues would go away.

Don't use:

- unsafe-inline: Allows inline JS (including javascript: URIs) to run; this is where a ton of XSS happens
- unsafe-eval: Allows eval() and similar string-to-code functions to run, so any user input that reaches them executes as JavaScript
- data: data: URIs allow content to be delivered inline (e.g. base64-encoded text/html) and are another way of sneaking in inline content


Fetch-directives:

- child-src: Controls where <frame> and <iframe> can be loaded from
- connect-src: Controls which endpoints the page can make direct connections to (fetch(), WebSockets, XHR, EventSource)
- default-src: The fallback for every other fetch directive - if a specific directive isn't set, the browser falls back to this list. Starting with default-src 'none' and then whitelisting what you need is a good idea
- font-src: Where can I load fonts from?
- frame-src: Where can I load Iframes from?
- img-src: Where can I load images from?
- manifest-src: Where can I load app manifests (metadata about a specific application) from?
- media-src: Where can I load audio, video and subtitles from?
- prefetch-src: Where can resources be prefetched from? This just means that some resources on the page will be 'processed' (DNS resolution for example) before they are actually requested
- object-src: Where do plugins (embed, object, applet) get loaded from
- script-src:
    * A list of white-listed sources for Javascript.
    * 'self' indicates that the browser should load scripts only from the site itself and nowhere else.
    * This controls inline scripts as well as XSLT stylesheets that can trigger script execution.
    * Adding 'nonce-<random value>' or 'sha256-<hash>' can allow very specific inline scripts if there's no other way to avoid inline scripts
    * strict-dynamic accompanied by a nonce for a script, means that any scripts recursively called by that script are automatically trusted, without needing a nonce or hash themselves
- style-src: A list of whitelisted sources for CSS
- script-src-elem, script-src-attr, style-src-elem, style-src-attr: similar to script-src and style-src, except that they control script/style elements and inline attributes separately. Not yet in browsers though, but here's a Google Group post.
- worker-src: Where can I load background Web Workers from?

Document directives:

- base-uri: Restricts the URLs that can be used in the document's <base> element (which controls how relative URLs resolve)
- plugin-types: Restricts the types of plugins that can be loaded into the document
- sandbox: Applies a sandbox to the page, much like the iframe sandbox attribute. You can selectively allow scripts, popups or forms, for example

Navigation Directives:

- form-action: Submit forms only to specific whitelisted URLs. Useful when an attacker can actually inject their own form tags
- frame-ancestors: Defends against clickjacking attacks by limiting the websites that can actually frame the target site using frame, iframe, object, embed or applet tags
- navigation-to: Limit the websites that a page can navigate to

Reporting directives:

- report-to: Tells the browser where to send violation reports (works with both the enforcing header and report-only mode)

Other important directives:

- upgrade-insecure-requests: Upgrade all requests made over HTTP to use HTTPS

- block-all-mixed-content: Ensure that all resources are requested over HTTPS, as long as the page is loaded over HTTPS
- require-sri-for: Subresource integrity for all scripts requested from a third-party-domain to detect tampering on the way

Other directives:

- referrer: Sends referrer only under certain conditions
- reflected-xss: Controls features in user-agent to prevent xss
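
To tie the directives above together, here's a sketch of a reasonably strict policy being set from a Flask application (the whitelisted CDN host is made up):

from flask import Flask

app = Flask(__name__)

CSP_POLICY = (
    "default-src 'none'; "                          # start by blocking everything
    "script-src 'self' https://cdn.example.com; "   # then whitelist only what you need
    "style-src 'self'; "
    "img-src 'self'; "
    "connect-src 'self'; "
    "frame-ancestors 'none'; "                      # no one is allowed to frame this site
    "base-uri 'self'; "
    "form-action 'self'"
)

@app.after_request
def set_csp(response):
    response.headers['Content-Security-Policy'] = CSP_POLICY
    return response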


Thursday, October 4, 2018

SSH certificate authentication

tl;dr:

* You can configure client-side and server-side authentication using SSH certificates with the existing openssh daemon.
* You never need to worry about MITM attacks on the client when connecting to the server the first time
* Significant decrease in management overhead of SSH keys on the server

If you have a remote server to manage and it's running Linux (or even Windows for that matter, but that's beside the point) - it's very likely that there is an SSH daemon running on it. You use an SSH client to connect to it and perform administrative tasks. While doing so, you can use passwords (the default) or public key authentication, which is a bit more secure as it eliminates password brute-force attacks. It does mean, though, that there is some management overhead on both the client and the server side.

On the client, you have to add every host you connect to, to your known_hosts file. So over time you have a massive known_hosts list with no clue about the purpose of each host. Similarly, on every server there is a huge authorized_keys file with the clients' public keys added to it. When you want to revoke a client key, you have to go in and remove that key from this file on every server. When you no longer want to trust a server, you need to remove its entry from your known_hosts manually. This can easily go wrong if you miss one server - so some automation is probably required here to make it more reliable.

Certificate-based authentication goes one step further: a client trusts any SSH server whose host key is signed by an 'SSH root CA', and a server in turn trusts a client key only if it is signed by a 'user CA'. There is a really nice post by Facebook where they automate this process and make it even less error-prone. That post does a good job of walking you through step by step, but I did have trouble replicating it, so I'll do a quick summary of the exact steps here.

Server certificate authentication

1. Configure an SSH daemon on a server (Docker, EC2, VirtualBox doesn't matter - but ideally a separate host as it's the CA). Let's call it ca.
2. Generate an SSH keypair for the server CA.
3. Start a new server up. Let's call it host1. This too should run SSH. This is the server you want to login to and administer.
4. Generate an SSH keypair for host1 in /etc/ssh
5. Copy host1's public key onto the ca server. Sign host1's public key with ca's private key. This will create an SSH certificate.
6. Copy ca.pub and the certificate you just created from ca to /etc/ssh on host1.
7. Configure /etc/ssh/sshd_config to use the key you created in Step 4 as well as the certificate. This is done using the HostKey and HostCertificate directives.
8. Restart the SSH daemon or reboot your server to reload your SSH config so it uses the certificate
9. Configure the client machine (any machine apart from host1 and ca) to recognize the ca's public key using the @cert-authority directive. This is so you don't get a 'Should I connect? Yes/No' message the first time you connect to host1.

User certificate authentication

1. Generate an SSH keypair for the client. This is the userca.
2. Generate a second SSH keypair for the client. This is the key you use to connect to host1. Call it client.
3. Sign client with userca. This will generate a cert as well on the client.
4. Copy userca.pub to host1 and configure sshd_config using the TrustedUserCAKeys directive pointing to userca. This is so host1 recognizes that all user certs signed by this cert are to be accepted.

At this point, you should be able to log in to host1 from the client and never get a prompt the first time you connect, because you've explicitly trusted the server CA. It's also very cool that there is no need to do any more key management on the server, as long as you trust the user CA used to sign the user keys.

References:

Dockerizing an SSH service
Hardening SSH

Tuesday, September 18, 2018

AWS - Developer Tools

This post summarizes the AWS services that help you write code and reliably build, test and deploy it faster than you could manually. The overall concept of doing all this automatically is usually summarized as Continuous Integration / Continuous Deployment. Here is a simple post that nicely explains these concepts.

If you don't want to read any more the tl;dr is this:

* Write code using AWS Cloud 9 
* Debug code using AWS XRay
* Store code using AWS Code Commit
* Build and test code using AWS Code Build
* Deploy code using AWS Code Deploy
* Watch task progression at runtime from a single interface using AWS Code Pipeline
* Use an integrated dashboard for all your tools, including issue tracking, using AWS Code Star.

If you're not familiar with Git, I'd strongly recommend reading a little about it before proceeding and playing with all these shiny new AWS tools. A great source is this chapter from the ProGit book. Once that's done, come back here. It's fine to read through this post as well, even without Git knowledge - it's just easier with that background knowledge.

Cloud 9 IDE

Once you have an idea in mind and want to write software to actualize it, you need a place to write it. A simple text editor works just fine, but as your programs get more complex an IDE is really helpful. A couple you might be familiar with are Eclipse and IntelliJ. However, since this post is about AWS, I must mention the Cloud9 IDE. It is a browser-based IDE that gives you a familiar environment. I haven't played with it too much, but it's good to know there is a web-based option now.


XRay

X-Ray is AWS's tracing service - it looks like a code profiler to me. I did not use it, so I do not have much to say about it, but I'd think the way to use it is to write your code and then use X-Ray to figure out which calls are really slow and see whether you can optimize further. All the rest I did try out and can confirm they are all very cool tools. So read on.

Code Commit

Once you finish writing all your code, you need a place to store it. This is where version control systems come in, and Git is what everyone uses these days. The AWS equivalent of a hosted Git service is CodeCommit. It's so similar that you do not need to learn any new commands: once you've set your repository up, all the usual Git commands work perfectly well. You can add files, commit them and push them to your CodeCommit repository.

All you need to do is install Git on your machine, create a key pair and configure your IAM user to use it to authenticate to CodeCommit. Clicking the "Connect" button inside the interface gives you per-platform instructions if you get stuck.

The coolest thing here is that you can create triggers that run as soon as you push code to your repository. Maybe you want to build, test and deploy your code to your test environment on every single commit. You can do that here by setting up a Lambda function that is called as soon as a commit is made - which nicely flows into CodeBuild...
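
A rough sketch of what such a trigger function might look like - CodeCommit invokes the Lambda on every push and the function simply kicks off a build (the CodeBuild project name is a placeholder):

import boto3

codebuild = boto3.client('codebuild')

def handler(event, context):
    # Each record describes a push to the CodeCommit repository that owns the trigger
    for record in event['Records']:
        repo_arn = record['eventSourceARN']
        print(f'Commit pushed to {repo_arn}, starting a build')

        # 'my-app-build' is a hypothetical CodeBuild project
        codebuild.start_build(projectName='my-app-build')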

Code Build

Once you have a workflow going where you can write code in an IDE and push commits to a CodeCommit repository, the next step is to make sure that your code builds properly. This is where CodeBuild comes in. All you do is point CodeBuild at the CodeCommit repository where you stored your code and tell it where to put any output artifacts of the build (usually S3).

It supports branches too, so you can tell it which branch to pull code from in CodeCommit. You select the runtime environment you need to build the code (Java/Python/whatever), configure a bunch of other options and then build your project. The result is the same as what you'd get after hitting Build in whatever IDE you use.

The big advantage here is that you spend very little time configuring your software development environment. Also, as I touched upon in the CodeCommit section, you could have the Lambda function you wrote as a CodeCommit trigger automatically run CodeBuild against your code each time a commit is made.

Code Deploy

Once the code is compiled, tests are run and your entire project is built, the last step is usually to deploy it to a web server so your users can access it. That's where CodeDeploy comes in. You can configure it to take the build output (a deployable project) and put it onto every web server you want it on.

You have options of using a load balancer as well, if you want traffic to be evenly distributed. Once deployment is complete, the output should appear on all the servers in that group.

Again, remember you can further extend your Lambda function to build and deploy now as soon as a commit hits Code Commit. Pretty cool :)

Code Pipeline

CodePipeline isn't something new, but it certainly makes life much easier. It helps, though, if you understand the three previous services I talked about, since the screens in CodePipeline deal with those services and ask you for input. So I'd recommend understanding CodeCommit, CodeBuild and CodeDeploy really well before using CodePipeline.

Pipeline is basically a wizard for the other three services. It prompts you to tell it where your code is (CodeCommit), what to build it with (CodeBuild) and what to deploy it with (CodeDeploy). If you already have roles and resources set up from when you played with the other three services, this should feel very intuitive. A couple of great tutorials are here and here. Also, a nice writeup on how someone automated the whole process is here.

The coolest thing about Pipeline is that you can see everything, stage by stage, and where each stage is once you create it. For example: once your code is pushed to CodeCommit (as usual) and you have the Pipeline dashboard open, you can actually see each stage succeed or fail, and troubleshoot accordingly.

CodeStar

Managers should love this one. I used it just a bit, but it has a fantastic-looking dashboard that gives you a unified view of every single service that you are using. In short, it has links to Cloud9, CodeCommit, CodeBuild, CodeDeploy and CodePipeline. So if you didn't cheat and did everything step by step :) you should see all your commits, builds and pipelines by clicking the buttons on the fancy dashboard that is CodeStar.

The additional feature here is integration with Jira and Github where you can see all your issues as well.

So in short CodeStar is a one stop shop if you've bought into the AWS development environment and want to be tied into it for years to come, while parting with your money bit by bit :)

Friday, September 14, 2018

Sample Architecture - AWS Example

A quick post this time on how you can use the AWS CLI or SDK to create an entire network, without using the GUI wizards (which are great, but sometimes irritatingly slow :)).

Relevant code to do everything in this post and a bit more is all uploaded here.

First up, almost certainly you want a VPC, because some services are public and some are private. A VPC will help you separate these. So you use the CreateVPC call to create one.

Make sure you enable DNS support and DNS hostnames on the VPC so your hosts get resolvable names (this also matters for the private endpoints created later).

Then you create public and private subnets, so you can put your public and private hosts into each of those.

Your public subnet needs an Internet gateway to talk to the Internet, so you create one and attach it to the VPC.

Once you have your VPC, subnets and Internet gateway ready, you need to set up routes between them. The wizard would do this automatically, but here we have to do it manually. So you first create a route table for each subnet and add routes to each route table. Note that you don't need your private subnet hosts to talk to the Internet. If you do, for some reason, you will need to create a NAT gateway in the public subnet and modify the route table of the private subnet to send traffic to it.
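
A condensed sketch of those steps with boto3 (the CIDR ranges are arbitrary):

import boto3

ec2 = boto3.client('ec2')

# VPC plus a public and a private subnet
vpc_id = ec2.create_vpc(CidrBlock='10.0.0.0/16')['Vpc']['VpcId']
public_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock='10.0.1.0/24')['Subnet']['SubnetId']
private_subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock='10.0.2.0/24')['Subnet']['SubnetId']

# Internet gateway, attached to the VPC, for the public subnet
igw_id = ec2.create_internet_gateway()['InternetGateway']['InternetGatewayId']
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# Route table for the public subnet with a default route out to the Internet
rt_id = ec2.create_route_table(VpcId=vpc_id)['RouteTable']['RouteTableId']
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock='0.0.0.0/0', GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=public_subnet)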

Now everything is sort of set up, so you then think about access control everywhere. For starters you create a security group allowing only inbound SSH and HTTPS access for an EC2 instance in the public subnet, and another allowing only MySQL access for an RDS instance in the private subnet.

Create a key pair (I reused an old one as this was just a test) so you can use it for your new EC2 instance. Identify an AMI to run on the instance. I used the console for this, but you can apparently use the CLI or the SDK to find this out if you want to.

Once that's done, you launch an EC2 instance in the public subnet with the SSH/HTTPS security group and your key pair. Make sure you assign it a public IP, otherwise you won't be able to reach it. Log in to the instance with your keypair and confirm access works.
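
Continuing the sketch from above, the security group and instance launch might look something like this (the AMI ID and key pair name are placeholders; vpc_id and public_subnet come from the previous snippet):

import boto3

ec2 = boto3.client('ec2')

# Security group for the public EC2 instance: inbound SSH and HTTPS only
sg_id = ec2.create_security_group(
    GroupName='public-web-sg', Description='SSH and HTTPS only', VpcId=vpc_id
)['GroupId']

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {'IpProtocol': 'tcp', 'FromPort': 22, 'ToPort': 22, 'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
        {'IpProtocol': 'tcp', 'FromPort': 443, 'ToPort': 443, 'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
    ]
)

# Launch the instance in the public subnet with a public IP
ec2.run_instances(
    ImageId='ami-0123456789abcdef0',   # placeholder AMI
    InstanceType='t2.micro',
    KeyName='my-test-keypair',         # placeholder key pair
    MinCount=1, MaxCount=1,
    NetworkInterfaces=[{
        'DeviceIndex': 0,
        'SubnetId': public_subnet,
        'Groups': [sg_id],
        'AssociatePublicIpAddress': True
    }]
)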

Now you start thinking of things you want to keep in your private subnet. The three things I was working with were RDS (so my EC2 instance could talk to it), Secrets Manager to store my RDS credentials, and a Lambda function needed to rotate the credentials in Secrets Manager. All of these should be in the private subnet.

A cool thing here is that you can create a private endpoint for SecretsManager so that all traffic to it is always over an AWS network and doesn't go to the Internet at all.

RDS only needs inbound access from EC2 and Lambda on port 3306. I'm not sure what Secrets Manager needs, but I gave it inbound 443 only (you should test this more). Lambda doesn't need any inbound access. Set up security groups similar to how you did it before.

Create a secret in Secrets Manager. Use a random name if you're testing - you can't reuse old names for a while, even after you have deleted the secret. This secret should contain all the information needed to connect to the RDS database and is used when you actually create the database.

Create a DB subnet group, retrieve the secret you stored earlier from Secrets Manager along with the security group you created earlier (3306 inbound access), and then create the RDS instance itself.

Once the database is created, the only task remaining is to create the Lambda function that will rotate the credentials for you in Secrets Manager.

Saturday, September 8, 2018

Confused Deputy

The confused deputy problem is one of the best named issues. Not for any deep philosophical reason, but just because it is truly confusing :). To me anyway, but then, most things are confusing to me until I spend way-above-normal amounts of time re-reading and re-writing them in my own words. The link above (AWS) is an excellent resource, and it's where I learnt most of this, so go there first - and if you find that confusing, come back here and I'll try to explain it in my own words. As always, there's nothing wonderfully new here - just my attempt to make sure I remember, have fun writing and hopefully help anyone else along the way.

Let us just keep it simple here. The 3 people in question are Alice, Bob and Eve. Alice has software called MyBackup hosted on the cloud that lets you back up your images that are stored in the service called MyImages. Each time you use Alice's software you have to pay her 100$. Sure that's ridiculous, but stick with me. For some reason Bob thinks this is a great idea and pings Alice to use this service.

Alice creates an account and gives him a unique string called BobAliceBackup1987. She says that all Bob needs to do is to login when he wants to backup, paste the string into a text box on the website and click "Create Backup". This will automatically (details are not important here) let Alice into Bob's account and copy them all to her secret storage box that is very hard to hack and send Bob an Email when it is all done. Don't think about how lame this system is at this point :).

Eve now hears that Bob is using this service and likes it a lot. She subscribes to the service too and gets her key EveAliceBackup1991. Everything is good and everyone is happy.

One day Bob and Eve have a fight and stop talking to each other. Eve feels that Bob is wrong and wants to teach him a lesson. Frustrated, she logs into MyBackup to look at her backups. (WTF who even does this??). While typing in her "secret string" she suddenly wonders if she can make Bob spend his Britney Spears concert money on Backups instead. Can she predict Bob's key? Will Alice find out? Only one way to find out...

She guesses Bob's key (what a shock :/) and sends that key to Alice. Alice hasn't spent much time developing any kind of authorization models, so all she sees is a string come in and think - well there's another 100$ for me :). She just assumes (pay attention here) that whoever sends the string is the owner of the string and actually wants to back their images up. And she backs Bob's images up, 20 times in a row without thinking that something's wrong. Bob gets back at night (no there are no Instant Mobile alerts here for payment debits) and finds out he has backed his stupid car_bumper_dented images up 20 times. Alice is no help, she has proof he sent a string...and sure enough when Bob logs in and checks backup history he sees 20 requests too. Meanwhile Eve feels vindicated. Eventually she might get caught, eventually Bob might get his money back and eventually Alice will learn to write better software but that's beside the point. And yes, it's a made up example but one that hopefully helps you understand the point of the attack better.

In a nutshell, confused deputy occurs when a service with multiple users makes a decision based on a predictable piece of user input, without asking for further authorization. In the AWS world, the predictable input is a role ARN that a service can assume in your account to do something in it. While it looks really long, it is not considered secret, and if someone guesses it they can make a service do things in your account - without your permission. Does that make sense? I hope so. But if not...

... go and read that excellent AWS blog again and see if it makes more sense.

Wednesday, August 22, 2018

Serverless Development

Just another post to solidify concepts in my mind. The Serverless word is often used these days in conjunction with development. All it really means is that you do not have to spend time configuring any servers. No Apache, Tomcat, MySQL. No configuration of any sort. You can just spend your time writing code (Lambda functions). Mostly anyway :)

The most common use of this philosophy is in conjunction with AWS. As in, you create a configuration file called serverless.yml that is translated into a CloudFormation template. This basically means you create a config file offline with references to all the AWS resources you think you will need (you can always add more later), and the generated template is then uploaded to CloudFormation.

CloudFormation then goes through the entire template and creates all those resources, policies, users, records and functions - in short, whatever you mentioned there. You can then hit the deployed URL from a client and invoke all the methods you wrote in your Lambda function.

There are some clear instructions on how to deploy a Hello World, as well as how one can write an entire Flask application with DynamoDB state locally and then push it all online to AWS with a simple sls deploy command.

All you need to make sure is that you have serverless installed, your AWS credentials configured and access to the console (easy to verify things) and things will go very smoothly. Of course there are going to be costs to all this - so make sure you do all that research before getting seduced by this awesome technology :)

Sunday, August 19, 2018

Birthday Paradox

There's a million places the birthday paradox has been explained. I always forget it. So this time, I decided to write it down for my own reference, keeping just the salient points in mind.

To start, a year has 365 days (forget leap years for now). The chances your birthday is on say Jan 28th is 1/365. Hence the probability of it not being on Jan 28th is (1 - 1/365 = 364/365). Let's add your friend now. The chances of both of you not having a birthday on Jan 28th is (364/365)^2 (exponential). So for 253 people the chances of all of them not having a birthday on Jan 28th is (364/365)^253. Makes sense? If not, maybe read a bit of probability from some source you like and come back. There's zero shame in this btw, I needed to do it for what it's worth :).

Anyway, so now you think: why did I pick 253 above? Well, let's do a little math here. If there are 2 people in a room, how many pairs can we form where order doesn't matter? Just 1 pair, right? What about 3 people (a, b, c)? How many pairs? 3 pairs (ab, bc, ac). With 4 people (a, b, c, d) it is 6 (ab, ac, ad, bc, bd, cd). Right? So let's generalize this so we can calculate it for a larger number, instead of 2, 3 or 4. That's where combinations come in - scroll down the link (just above) to get the formula - 23! / (2! * (23 - 2)!) [It's 2! because a pair has 2 people and you're forming groups of 2]. Doing the math on that, it becomes:

(23 * 22 * 21!) / (2! * 21!) = (23 * 22) / 2 = 23 * 11 = 253. Seen that number before? :)

Tying stuff back in, it means that if I have 23 people (including me) in a room, there are 253 ways in which pairs can be formed. And remember, the chances of any of them NOT sharing a birthday are (364/365)^253. It's not (364/365)^23. It's the probability ^ no_of_possible_pairs. Again, if this is going over your head - step back and read a bit of probability theory and come back once you're comfortable.

So if the probability that there is NO matching pair is (364/365)^253 = 0.4995, the probability that there IS a match somewhere - meaning someone in the room shares a birthday with someone else - is 1 - 0.4995 = 0.5005. Meaning, there is just about a 50% chance that a birthday is shared if there are at least 23 people in a room. Not shared with you specifically - just shared with anyone else in that room. Make sense?
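
The same arithmetic in a few lines of Python, using the pairs approximation from above:

people = 23
pairs = people * (people - 1) // 2           # 253 possible pairs
p_no_match = (364 / 365) ** pairs            # probability that no pair shares a birthday
p_match = 1 - p_no_match

print(pairs, round(p_no_match, 4), round(p_match, 4))   # 253 0.4995 0.5005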

Now all that's fine but how does that matter in real life, keeping security in mind? I'm thinking of a couple of examples:

- If I use an algorithm that produces a 64-bit MAC, I'm thinking there are 2^64 possibilities, which is correct. But that doesn't mean someone needs to try all of them before a collision is found. Because of the birthday paradox, the real number is around sqrt(2^64) = 2^32, which is far smaller.

- Digital signatures are another area. If I use an algorithm that is susceptible to collisions to create a signature, it means that an attacker can find a collision for my signature more easily and spoof it. Meaning they could change the message, fake a signature that looks the same as the original one, attach it to the message and no one will detect it.

The fix for all this is to use hashing algorithms that give you a large enough number of possibilities even after the square root is taken. For a SHA-1 hash, which has a 160-bit output, one will start to see collisions after about 2^80 attempts. It looks like SHA-256 is safe for now :)

Thursday, August 16, 2018

Unit Tests - Why?


A unit test is basically testing one unit of code. One unit literally means one snippet of code. One snippet could mean 5 lines, a small function or in some cases even an entire small program. Usually though, in large corporate environments your code base is pretty massive so a good starting place is to think of a minimum of 1 unit test per function.

So you go to the function's code and see that it has 100 lines. The first 10 are just variable initialization, then there are a few calls to 3rd-party libraries to get data, normalize it and log it. For example: make requests to REST APIs, convert the returned data into JSON, add a new key with a timestamp and then log success or failure.

Finally, once these calls succeed OR fail, your code returns True or False. So when you write a unit test, you're thinking (somewhat non-intuitively) - "Let's assume that all these calls succeed and we get to MY code, where I make the True/False decision. I want to make sure that my code reacts properly in either case."

Which basically means: if there were a way to not run those calls (requests, JSON conversion, logging, adding the timestamp key) at all, and instead hand a FAKE normalized JSON blob with a timestamp in it straight to my True/False code - I'd do it. Because think about it: you're NOT testing the functionality of any of those calls - you're only interested in your code. So let's see a fake example here:

import time
import logging

import requests

logger = logging.getLogger(__name__)

def code():
    # Call a (hypothetical) REST API, normalize the response and log it
    url = 'https://fakesite.com'
    params = {'a': 1, 'b': 2}

    response = requests.get(url, params=params)
    data = response.json()
    data['time'] = time.time()
    logger.info("Request completed")

    # This is the only logic we actually want to unit test
    if len(data) > 2 and 'time' in data:
        return yay()
    else:
        return oops()

def oops():
    return 0

def yay():
    return 1

All you want to test is whether oops() and yay() get called correctly. Nothing else. The end. So your yay() test (never mind how, right now) looks like:

from unittest.mock import patch, MagicMock

from mymodule import code   # assuming the code above lives in a file called mymodule.py

@patch('mymodule.time.time', return_value=1500)   # fake timestamp
@patch('mymodule.logger.info')                    # don't actually log anything
@patch('mymodule.requests.get')                   # don't actually make the HTTP call
def test_code_returns_yay(mock_get, mock_log, mock_time):
    # The fake response's .json() returns a small dict; code() adds the 'time'
    # key itself, so the dict becomes {'a': 1, 'b': 2, 'time': 1500}
    fake_response = MagicMock()
    fake_response.json.return_value = {'a': 1, 'b': 2}
    mock_get.return_value = fake_response

    assert code() == 1    # that's what yay() returns, so the test passes

Remember, you could run your test without patching a single thing. As in, let all the 3rd-party calls happen and test with real data, but that'll slow things down badly, especially if you have thousands of tests to run.

And again, if you're still struggling (because I did for a long time), it's a "unit" test - you isolate the bit you care about, assume everything around it works well, give it the input it needs and then write your tests.

I hope that demystifies it a bit :).