Friday, May 24, 2019

AWS - Networking Services

VPC: This is the DMZ/VLAN/segmentation equivalent for the cloud. You can create a VPC, create subnets inside the VPC and then assign EC2 or RDS instances (or anything that needs an IP address) addresses inside individual subnets. You can then set ACLs on the VPC or individual subnets (in addition to security groups on the instances themselves) to control inbound and outbound communication. A VPC can contain both private subnets and public (Internet-facing) subnets. You can also create private VPC endpoints for public services such as KMS (cryptography), which ensure that traffic to KMS is sent exclusively over the AWS network instead of over the Internet. This is one of those services that you will almost certainly use if you are on the cloud, so do be familiar with it. :)
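
To make that concrete, here is a minimal boto3 sketch (the region, CIDR ranges and endpoint service name are assumptions) that creates a VPC, a subnet inside it, and a private interface endpoint for KMS:

```python
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')  # region is an assumption

# A VPC with one subnet inside it (CIDR ranges are arbitrary examples)
vpc = ec2.create_vpc(CidrBlock='10.0.0.0/16')['Vpc']
subnet = ec2.create_subnet(VpcId=vpc['VpcId'], CidrBlock='10.0.1.0/24')['Subnet']

# Private interface endpoint so traffic to KMS stays on the AWS network
ec2.create_vpc_endpoint(
    VpcId=vpc['VpcId'],
    ServiceName='com.amazonaws.us-east-1.kms',
    VpcEndpointType='Interface',
    SubnetIds=[subnet['SubnetId']],
    PrivateDnsEnabled=True,
)
```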

CloudFront: It is common practice to use a CDN to cache static content in locations closest to the user (the edge of the network) so round trips to the web server and DB server can be avoided. These days even dynamic content is fetched from the origin servers by edge locations and served to the end user. AWS CloudFront claims to inspect incoming requests and make decisions about what dynamic content to serve to whom.

CloudFront is also integrated with web application firewall and DDoS protection services to defend applications against malicious attacks. It additionally integrates with Lambda (to run functions based on specific events), handles cookies (useful for authenticated requests) and works with ACM so that a specific certificate is presented to the end user. Here is a good article about how CDNs work, along with a nice diagram at the bottom.

Route 53: This is AWS's DNS service. It allows users to register their domains, configure DNS records so that users can reach their applications, and check the health of the web servers that are registered with Route 53.

API Gateway: This allows users to create HTTP, REST and WebSocket APIs for any functionality they want to implement. You can integrate an API with an HTTP backend (passing query string parameters through), invoke a Lambda function when an endpoint is called, integrate with other AWS services and then return a response to the end user.
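
As a rough sketch of the Lambda case (the API name and function ARN below are placeholders), the HTTP API "quick create" in boto3 looks something like this:

```python
import boto3

apigw = boto3.client('apigatewayv2')

# "Quick create" an HTTP API that proxies every request to a Lambda function.
# The function ARN is a placeholder; the function also needs a resource-based
# permission allowing API Gateway to invoke it.
api = apigw.create_api(
    Name='orders-api',
    ProtocolType='HTTP',
    Target='arn:aws:lambda:us-east-1:123456789012:function:handle-orders',
)
print(api['ApiEndpoint'])  # the URL end users call
```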

Direct Connect: This establishes a physical link between the end user's network and an Amazon location that supports Direct Connect. Fiber-optic cables that support either 1 Gbps or 10 Gbps must be used, and the customer's network devices must meet certain requirements. The main purpose of this service is to speed up data transfer between an on-premises network and an AWS service by bypassing a lot of the public Internet. The AWS side can be a public service like S3 or something privately hosted inside a customer VPC. The other key factor is that this is apparently much cheaper than accessing S3 or VPCs over the Internet. Here's one such implementation.

App Mesh: Microservice architectures are quite common these days. The greater the number of microservices though, the greater the management overhead from a monitoring perspective. For applications already running somewhere (EC2 for example), App Mesh, which is built on the Envoy proxy, can be configured so that traffic to every single microservice of the application first passes through the mesh. Rules configured in App Mesh then determine what happens next. This is better than installing software on the OS of every microservice host and having those hosts communicate with each other to diagnose problems.

Cloud Map: This allows you to create user-friendly names for all your application resources and store that map in one place. This can all be automated, so as soon as a new container is created or a new instance is spawned due to more traffic, its IP address is registered in Cloud Map. When one microservice needs to talk to another, it looks the target up in Cloud Map. This means you no longer need to maintain a configuration file with the locations of your assets - you just look them up in Cloud Map.
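
A hedged sketch of that flow with boto3 (the service ID, namespace, instance name and IP are all made up): a newly launched instance registers itself, and another service discovers it by name instead of reading a config file.

```python
import boto3

sd = boto3.client('servicediscovery')

# A newly spawned container/instance registers its address under a friendly name
sd.register_instance(
    ServiceId='srv-1234567890abcdef',   # hypothetical Cloud Map service
    InstanceId='web-1',
    Attributes={'AWS_INSTANCE_IPV4': '10.0.1.15', 'AWS_INSTANCE_PORT': '8080'},
)

# Another micro-service looks it up by name instead of using a config file
resp = sd.discover_instances(NamespaceName='example.local', ServiceName='web')
for inst in resp['Instances']:
    print(inst['Attributes']['AWS_INSTANCE_IPV4'])
```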

Global Accelerator: Once configured, Global Accelerator gives the user static IP addresses that front several backend servers. Traffic hitting those addresses is routed over the AWS network to healthy hosts that are close to the user's location and under less load, so overall availability and performance improve. The aim is to keep traffic away from nodes that are not performing well.

Thursday, May 23, 2019

AWS - Migration Services

Application Discovery Service: This one's for finding out what servers you have on-premises and building an inventory of them to display in the console. For VMware vCenter hosts there's an AWS VM you install that does the discovery. Alternatively, you can install an agent on every on-premises host you want tracked. The last option is to fill out a template with a lot of data and import it into the console.

Database Migration Service: This is pretty self-explanatory in that it allows you to migrate from one AWS data store to another (Aurora, MySQL and plenty of others are supported) or to/from an on-premises instance. You can't do on-premises to on-premises :). The source database can apparently remain live throughout the migration, which AWS claims is a great advantage (and it probably is - idk).

Server Migration Service: Just like the previous service helps migrate on-premises databases, this one helps migrate on-premises servers running in VMware, Hyper-V and, interestingly, Azure to AWS. A connector VM is downloaded and deployed in VMware vSphere. This then (when you say so) starts replicating the servers you've deployed in vSphere and uploads them as Amazon Machine Images (AMIs) to the cloud. These images can then be tested by creating new EC2 instances from the AMIs to check they're functional before deploying them to production.

AWS Transfer for SFTP: This is quite simply a managed SFTP server service from AWS. The aim is to tempt people away from managing their own SFTP servers on-premises and migrate that data to the cloud. It supports password and public key auth, and stores data in S3 buckets. All normal SSH/SFTP clients should work out of the box. Authentication can be managed either via IAM or via your own custom authentication mechanism.

AWS Snowball: This is an appliance that AWS ships to your data center; you copy all your data (up to 80 TB for Snowball or 100 TB for Snowball Edge) to it over your local network and then ship the box back to AWS. AWS takes the box and imports all the data into S3. The key win here is that you don't need to buy lots of hardware to do the transfer - you use AWS's own appliance instead. It also saves a ton of bandwidth because you're doing local transfers instead of going over the Internet.

DataSync: In contrast to Snowball, DataSync transfers data between customer NFS servers and S3 or EFS over the network at high speed using a custom AWS DataSync protocol (the claim is up to 10 Gbps). Alternatively, you can go from NFS in the cloud to S3, also in the cloud. For an on-premises server, a DataSync agent is installed as a vSphere OVA, after which you add the various locations and configure them as sources or destinations. Finally a task starts and data is transferred between the two locations. Here's a nice blog demonstrating this.

AWS Migration Hub: This is a one-stop shop for kicking off discovery or data migration using the various other services that AWS has. Some of these were already mentioned above (the Server and Database Migration Services). In addition there are some integrated migration tools (ATADATA ATAmotion, CloudEndure Live Migration etc. - none of which I've heard of :)) that one can use when performing a migration. There is no additional cost to use this service - you pay for the individual tools themselves.

Tuesday, May 21, 2019

AWS - Database Services

RDS: AWS's relational database service, which hosts MySQL, PostgreSQL, Microsoft SQL Server, Oracle, Amazon's own Aurora and MariaDB (an open-source fork of MySQL). Applications running on application servers in a data center or hosted in the cloud can use RDS as a data source and customize the DB instance (the basic unit of RDS) with the hardware and memory they want. The databases can all be administered using their respective clients. AWS networking and backup services are integrated with RDS.
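
As an illustration (the identifier, instance class, credentials and storage size are placeholders), creating a small MySQL DB instance with boto3 looks roughly like this:

```python
import boto3

rds = boto3.client('rds')

# A small MySQL instance; identifier, class, credentials and storage are examples
rds.create_db_instance(
    DBInstanceIdentifier='blog-db',
    DBInstanceClass='db.t3.micro',
    Engine='mysql',
    MasterUsername='admin',
    MasterUserPassword='change-me-please',
    AllocatedStorage=20,   # GiB
)
```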

DynamoDB: AWS's NoSQL database, which stores data as JSON key-value pairs ("a" : "test"). Instead of writing SQL queries as you would with a relational database, you write NoSQL queries against that JSON. It integrates with Auto Scaling, which changes the read and write capacity of the database depending on request volume. It also integrates with KMS, allowing you to encrypt data at rest on the fly. It claims to scale really well horizontally (throw more computers at the problem). DynamoDB also has an HTTP API that you can use to query it directly. As usual, the devil is in the details and it is probably not for everyone. There's a nice blog with a cool flowchart about when one should and should not use DynamoDB.
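
For a feel of the key-value model, here is a minimal boto3 sketch (the table name and key schema are assumptions; the table is assumed to already exist with 'user_id' as its partition key):

```python
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')   # hypothetical existing table

# Write an item, then read it back by its partition key
table.put_item(Item={'user_id': '42', 'name': 'test', 'plan': 'free'})
resp = table.get_item(Key={'user_id': '42'})
print(resp.get('Item'))
```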

ElastiCache: This is an in-memory data store that supports Redis and Memcached. The point of an in-memory store is to speed up responses so users don't have to wait as long; in other words, it is a caching layer that sits in front of the database. If a user's request can be served from the Redis cache, it will be - and faster than a round trip to the database. Here is a link to a comparison between Redis and Memcached.
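
The usual pattern here is "cache-aside": check Redis first, fall back to the database, then populate the cache. A sketch with the redis-py client (the endpoint and the load_user_from_database helper are hypothetical):

```python
import json
import redis  # redis-py client

r = redis.Redis(host='my-redis.example.cache.amazonaws.com', port=6379)

def get_user(user_id):
    cached = r.get(f'user:{user_id}')
    if cached is not None:
        return json.loads(cached)            # served straight from the cache
    user = load_user_from_database(user_id)  # hypothetical slow DB call
    r.setex(f'user:{user_id}', 300, json.dumps(user))  # cache for 5 minutes
    return user
```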

Neptune: This is a graph database. It is most useful when you have large sets of data that are heavily related to each other. The inter-related data is stored in the database and users query it using languages built specifically for graphs (Apache TinkerPop Gremlin and SPARQL). It's interesting that the smallest DB instance you can provision from inside Neptune is db.r4.large (~16 GB RAM) - which by itself suggests this is a product aimed at very large data sets.

Redshift: This is AWS's enterprise data warehousing solution. In other words, it helps analyze petabytes (if you want) of data from a variety of sources such as S3, Glacier, Aurora and RDS. There's a lot of database design needed, so I'm guessing (I do not know for sure) that things can get pretty complex, pretty soon. Once the data is inside a Redshift cluster (for example, copied from S3), you can run arbitrarily complex SQL queries against the cluster. If you don't have huge amounts of data you probably do not want Redshift.
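
For example, once a cluster exists you might load staged S3 data and query it with plain SQL over a standard PostgreSQL connection (the cluster endpoint, table, bucket and IAM role below are all placeholders):

```python
import psycopg2

conn = psycopg2.connect(
    host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
    port=5439, dbname='analytics', user='admin', password='change-me')
cur = conn.cursor()

# Load CSV data that was staged in S3 into a Redshift table
cur.execute("""
    COPY web_logs
    FROM 's3://my-example-bucket/logs/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV;
""")
cur.execute("SELECT COUNT(*) FROM web_logs;")
print(cur.fetchone())
conn.commit()
```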

DocumentDB: This is basically there so you can migrate all your MongoDB content to the cloud while continuing to use all the Mongo-compatible clients and tools. All you then do is change the DB endpoint to point to the DocumentDB endpoint in the cloud. The cool bit here is that you can autoscale the storage your DB needs and its read capacity (how many queries you can make), so large applications are easily served. Here too the smallest instance is a db.r5.large with 16 GB RAM, so it feels like a production-scale service that might be expensive for smaller loads. I don't know that for a fact though - so please do your own testing :)

AWS - Storage Services

S3: This is arguably (along with EC2) the most popular service that AWS offers. In short, it lets users store their files in it - behaving like an online file store. It has other uses too, such as hosting a website with static content. Services very commonly store audit logs here; in short, S3 is integrated with a large number of AWS services. S3 is a global service and bucket names are globally unique - two users cannot create buckets with the same name. Objects stored inside a bucket are identified by keys. For such a popular service it has relatively few options via the AWS CLI (which are sufficient). If you're starting to learn about AWS, this is the place to start.
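
A minimal boto3 example of the basics (the bucket name and keys are placeholders and the bucket is assumed to already exist):

```python
import boto3

s3 = boto3.client('s3')

# Upload a local file, list objects under a prefix, download the file again
s3.upload_file('report.pdf', 'my-example-bucket', 'reports/report.pdf')

resp = s3.list_objects_v2(Bucket='my-example-bucket', Prefix='reports/')
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'])

s3.download_file('my-example-bucket', 'reports/report.pdf', '/tmp/report.pdf')
```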

EFS: This is an NFS file system that grows with the amount of data you store on it. You can use an NFS client on an EC2 Linux system to remotely mount the file system and then read from and write to it. There's also an interesting concept called lifecycle management, which moves infrequently used files to a cheaper class of EFS storage.

The GCP equivalent for this is Filestore.

FSx: This too, in short, is a file system that can be accessed remotely, but it has been built with Windows systems in mind. The targets are users with Windows applications that need access to a lot of data over the network via SMB mapped network drives. Linux systems can also access these shares using a package called cifs-utils. It additionally supports Lustre, a file system that targets applications requiring a lot of computation.

S3 Glacier: If you have a large number of files that you do not want to delete (like old pictures) but do not use often, S3 Glacier is the thing to use. The unit of storage in Glacier is a vault, which is roughly equivalent to a bucket in S3. Only creation and deletion of vaults happen through the console; everything else happens via the CLI or SDK. It also claims to be extremely low cost, which I'm not saying anything about :)
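
As a sketch of the SDK-driven workflow (the vault name and file are placeholders), creating a vault and uploading an archive with boto3 looks like this; retrieving the archive later is a separate, asynchronous job:

```python
import boto3

glacier = boto3.client('glacier')

# Vault creation (also possible in the console); the name is a placeholder
glacier.create_vault(vaultName='old-photos')

# Upload one archive; retrieval later is an asynchronous job you initiate separately
with open('photos-2012.tar.gz', 'rb') as f:
    archive = glacier.upload_archive(
        vaultName='old-photos',
        archiveDescription='Photo backup, 2012',
        body=f,
    )
print(archive['archiveId'])
```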


Storage Gateway: If you have an existing data center with a large number of applications talking to local storage, scaling that storage can become hard quickly when traffic grows. The AWS Storage Gateway is available as a virtual machine appliance (ESXi), an on-premises 1U hardware appliance (which you can buy on Amazon) or even an EC2 appliance. Once it's activated, the appliance picks up data from your data center stores and puts it onto S3, Glacier or EBS. You can then point your applications at the new stores via an NFS client and it should work seamlessly. Here is a blog that walks you through a sample process. It also allows backup applications to hit the gateway directly (configured as a tape gateway) and back up straight to AWS S3 or Glacier.


AWS Backup: This service allows you to back up data from EC2, RDS and a few other services to S3 and then move that data to cold storage (Glacier, I think) after a certain time. You configure backup plans to decide what gets backed up (by tagging resources), when, whether it's encrypted, and when the backup is deleted. As of now it only supports a few services, but it's reasonable to assume that once it becomes more popular more services will be added.
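
A hedged sketch of such a plan in boto3 (the names, schedule, lifecycle and IAM role are all assumptions): back up everything tagged backup=true daily, move it to cold storage after 30 days, and delete it after a year.

```python
import boto3

backup = boto3.client('backup')

plan = backup.create_backup_plan(BackupPlan={
    'BackupPlanName': 'daily-backups',
    'Rules': [{
        'RuleName': 'daily',
        'TargetBackupVaultName': 'Default',
        'ScheduleExpression': 'cron(0 5 * * ? *)',   # every day at 05:00 UTC
        'Lifecycle': {'MoveToColdStorageAfterDays': 30, 'DeleteAfterDays': 365},
    }],
})

# Pick resources to back up by tag (tag key/value and role are hypothetical)
backup.create_backup_selection(
    BackupPlanId=plan['BackupPlanId'],
    BackupSelection={
        'SelectionName': 'tagged-resources',
        'IamRoleArn': 'arn:aws:iam::123456789012:role/AWSBackupRole',
        'ListOfTags': [{'ConditionType': 'STRINGEQUALS',
                        'ConditionKey': 'backup',
                        'ConditionValue': 'true'}],
    },
)
```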

Thursday, May 16, 2019

AWS - Compute - Container Services

Here is an image from the Docker website that describes how containers work.



Teams are increasingly building their workflows around Docker containers. Amazon has a few services that make this easier. This post briefly discusses each of these services.

ECR: This is a registry for Docker images that you build on your machine and then push to AWS. So for example, you can build an Ubuntu image with a LAMP stack and any other custom packages and upload it to ECR. When other AWS services need that image for some purpose, it is readily available.

ECS: Once the Docker images you built earlier are uploaded to ECR, you can run those images on EC2 instances to perform whatever computing task the container was built for. This is where ECS comes in. Users tell ECS which containers to run; ECS then identifies the EC2 instances they can run on (grouped into a cluster) and runs them there.

Once the cluster is ready, a task definition needs to be created. This defines how the containers are run (which port, which image, how much memory, how many CPUs and so on). When the task definition is actually used, a task is created and run on the cluster of EC2 instances (each one is called an ECS container instance) that was originally created.

An ECS agent is additionally installed on each ECS container instance; it communicates with the ECS service itself and responds to the start/stop requests that ECS makes.
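
A rough boto3 sketch of that flow (the cluster name, image and sizes are placeholders): register a task definition, then run it as a task on an existing cluster.

```python
import boto3

ecs = boto3.client('ecs')

# Register a task definition: which image, how much CPU/memory, which port
ecs.register_task_definition(
    family='web',
    containerDefinitions=[{
        'name': 'web',
        'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/lamp:latest',
        'cpu': 256,
        'memory': 512,
        'portMappings': [{'containerPort': 80, 'protocol': 'tcp'}],
        'essential': True,
    }],
)

# Run the task on an existing cluster of ECS container instances
ecs.run_task(cluster='my-cluster', taskDefinition='web', count=1, launchType='EC2')
```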

The GCP equivalent is Google Kubernetes Engine (GKE).

EKS: Kubernetes (which originated at Google) has an architecture where there is a master node (the controller) and a number of worker nodes (roughly equivalent to ECS container instances running agents) that report the state of each job back to the master. The master then (similar to ECS) uses that information to schedule and control the workloads running across the cluster. Here is a diagram that illustrates this:



EKS on Amazon runs the Kubernetes master inside the AWS environment and lets it communicate with worker nodes deployed elsewhere, while simultaneously integrating with ELB, IAM and other AWS services.

Batch: If you have a job that you want to schedule and run periodically, while automatically scaling resources up or down as jobs complete or demand more memory/compute - AWS Batch is a good idea. AWS Batch internally uses ECS, and hence Docker containers on EC2/Spot instances, to run the jobs. Here is a nice guide that goes through an example of using Batch in a bit more detail.
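
Submitting work once a job queue and job definition exist is a one-liner in boto3 (the names below are placeholders); Batch then finds or scales the underlying ECS resources to run it.

```python
import boto3

batch = boto3.client('batch')

# The queue and job definition are assumed to have been created beforehand
job = batch.submit_job(
    jobName='nightly-report',
    jobQueue='reporting-queue',
    jobDefinition='report-generator:1',
    containerOverrides={'command': ['python', 'generate_report.py']},
)
print(job['jobId'])
```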

Tuesday, May 14, 2019

AWS - Compute Services

This blog summarizes some of the AWS Compute services. I deliberately do not cover the ones that deal with containers, as I plan to blog separately about those. I'm looking at Google Cloud side by side from now on so I'll keep updating these posts just to mention if there is an equivalent. When I get to Azure, I'll do the same there as well :)

EC2: EC2 is one of the most popular services that AWS has. It allows you to spin up virtual machines with a variety of operating systems (Linux, Windows and possibly others) and gives you root access to them. You can then SSH in using key authentication and manage the system. What you use it for is completely up to you: host a website, crack passwords as a pen-tester, test some software or really anything else.
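
A minimal boto3 sketch of spinning up an instance (the AMI ID, key pair and security group are placeholders):

```python
import boto3

ec2 = boto3.client('ec2')

# Launch a single small Linux instance; you'd then SSH in with the key pair
resp = ec2.run_instances(
    ImageId='ami-0123456789abcdef0',    # placeholder AMI
    InstanceType='t3.micro',
    MinCount=1,
    MaxCount=1,
    KeyName='my-keypair',
    SecurityGroupIds=['sg-0123456789abcdef0'],
)
print(resp['Instances'][0]['InstanceId'])
```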

The GCP equivalent for EC2 is Compute Engine.

Lightsail: Lightsail is very similar to EC2 except that it comes with pre-installed software such as WordPress or a LAMP stack, and you pay a small fixed price for the server. The plus here is that it's easier for non-technical users compared to EC2, where you have to do everything yourself. In other words, it is Amazon's VPS solution.

Lambda: This is AWS's Function-as-a-Service offering. In other words, you write code and upload it to Lambda; you don't have to worry about where the code is hosted or how incoming requests are handled. You can configure triggers in other AWS services and have Lambda act when a trigger fires. For example, you can create a bunch of REST APIs and have the back-end requests handled by a Lambda function, have something happen each time a specific file is uploaded to S3, or do more detailed log analysis each time an event is logged to CloudWatch. Lambda is integrated with a large number of AWS services, so it is well worth learning to use it well.
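
As a tiny example of the S3 trigger case, a Python handler that logs every object uploaded to a bucket might look like this (the function itself is a sketch; the event shape is what S3 sends to Lambda):

```python
def lambda_handler(event, context):
    # For an S3 trigger, the event contains one record per uploaded object
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        print(f'New object uploaded: s3://{bucket}/{key}')
    return {'status': 'ok'}
```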

The GCP equivalent for Lambda is Cloud Functions.

Elastic Beanstalk: If you have some code that you've built locally and want to deploy quickly without worrying about the underlying infrastructure or spending a lot of time tweaking it - Beanstalk is the way to go. You can, for example, choose Python as the runtime environment, upload your Python code and let AWS take over. AWS will create the roles, security groups and EC2 instances that are needed (among other things) and deploy your application so it is easily accessible. If you need additional components such as databases, or want to modify the existing configuration, these can be added to the environment later.

The GCP equivalent for Elastic Beanstalk is App Engine.

Serverless Apps Repository: This is a large repository of applications that have been created by users and uploaded for use by the community. One can grab these applications and deploy them in one's own AWS account. The requisite resources are then created by deploying a SAM template. The applications can be used as-is, or modified and code-reviewed before actually using them. If you change your mind, you can delete the CloudFormation stack - this will delete all the AWS resources that were created during deployment.