Tuesday, May 21, 2019

AWS - Storage Services

S3: This is arguably (along with EC2) the most popular service that AWS offers. In short it allows users to store their files in it - behaving like an online file store. It has other uses such as hosting a website that has static content in it. Services very commonly store audit logs here and in short S3 is integrated with a large number of AWS services. S3 is a global service and has buckets whose names are unique - 2 users cannot create the same bucket. Files are stored inside the bucket and are called keys. For such a popular service - it does have fewer options (which are sufficient) via the AWS CLI. If you're starting to learn about AWS, this is the place to start.

EFS: This is an NFS file system that expands to the sizes of the files you are storing on it. You can use an NFS client on an EC2 Linux system to remotely mount and then read/write from/to the file system. They also have this interesting concept called lifecycle management which moves infrequently used files to a different class of EFS storage that costs less.

The GCP equivalent for this is FileStore.

FsX: This too in short is a file system that can be accessed remotely but it has been made keeping Windows systems in mind. Users who have Windows applications that need access to a lot of data over the network via SMB mapped network drives, are the targets. Linux systems too can access these mapped drives using a package called cifs-utils. It additionally also supports applications that use Lustre, a filesystem that targets applications that require a lot of computation.

S3 Glacier: If you have a large number of files that you do not want to delete (like old pictures) but do not use often S3 Glacier is the thing to use. The unit of storage for Glacier is a vault which is sort of equivalent to a bucket in S3. Only creation and deletion of vaults is through the console; everything else happens via the CLI or SDK. Additionally it also claims to be extremely low cost, which I'm not saying anything about :)


Storage Gateway: If there is an existing data-center where you already have a large number of applications that talk to databases, scaling this can become hard quickly if you have a lot of traffic. The AWS Storage Gateway is a virtual machine appliance (ESXi), an on-premise 1U hardware appliance (buy on Amazon) or even as an EC2 appliance. Once it's activated, the appliance will pick up all your data from the datacenter stores and put it on to S3, Glacier or EBS. Now you can just point your application to the new stores via an NFS client and it should work seamlessly. Here is a blog that walks you through a sample process. Additionally it allows backup applications to directly hit the gateway (configurable as a tape gateway) and backup directly to AWS S3 or Glacier.


AWS Backup: This service allows you to backup data from EC2, RDS and a few other services to S3 and then move that data to Glacier (I think) after a certain time. You can configure backup plans to decide what gets backed up (by tagging resources), when, whether its encrypted or not and when the backup is deleted. As of now it only supports a few services, but it's reasonable to assume that once it becomes more popular there'll be more services that are added to this.

No comments: