Showing posts with label rds. Show all posts
Showing posts with label rds. Show all posts

Tuesday, May 21, 2019

AWS - Database Services

RDS: AWS's relational database system which is basically hosting MySQL, PostGres, MSSQL, Oracle, Amazon's own AuroraDB and MariaDB, an open-source clone of MySQL. Applications that are on application servers at data centers or hosted in the cloud can both use RDS as a data source and customize the DB instance (the basic unit of RDS) with the hardware and memory that they want. The databases can all be administered using the respective clients. AWS Networking and Backups are integrated with RDS.

DynamoDB: AWS's NoSQL database which stores data in JSON key-value ("a" : "test") format. Instead of writing SQL queries like with a relational database, you write NoSQL queries that query JSON. It integrates with AutoScaling that changes the read and write capacity of the database, depending on request volume. It also integrates with KMS allowing you to encrypt data at rest on the fly. It claims to scale really well horizontally (throw more computers at the problem). DynamoDB also has a HTTP API that you can use to directly query it. As usual, the devil is in the details and it is probably not for everyone. There's a nice blog which has a cool flowchart about when one should and should not use DynamoDB.

Elasticache: This is an in-memory database that supports Redis and Memcached. The point of an in-memory DB is to increase the speed of resolution, so users do not have to wait longer to use services. In other words it is a layer of abstraction before the database. If a user's request can be served from Redis cache, it will be done - and done faster than a round trip to the database. Here is a link to a comparison between Redis and Memcached.

Neptune: This is a graphing database. It is largely useful when there are large sets of data that are related to each other. The inter-related data is stored in the database and users can query it using languages built specifically for graphing (Apache Tinkerpop Gremlin and Sparql). Its interesting that the smallest DB instance that you can provision from inside Neptune is db.r4.large (~16 GB RAM) - which by itself shows that this is a product used for very large data sets.

Redshift: This is AWS's enterprise data warehousing solution. In other words it means that it helps analyze petabytes (if you want) of data from a variety of sources such as S3, Glacier, Aurora and RDS. There's a lot of database design that's needed, so I'm guessing (do not know for sure) that things can get pretty complex, pretty soon. Once the data is inside a RedShift cluster (for example: copied from S3), you can run SQL queries against it and make complex queries against the cluster. If you don't have huge amounts of data you probably do not want RedShift.

DocumentDB: This is basically there so you can migrate all your MongoDB content to the cloud while continuing to use all the Mongo relevant clients and tools. All you then do is change the DB endpoint to point to the DocumentDB endpoint in the cloud. The cool bit here is you can autoscale the storage your DB needs and the read capacity (how many queries can you make) so large applications are easily served. This too has the smallest instance as a db.r5.large instance with 16 GB RAM. So that feels like this too is a production service and might be expensive for smaller loads. I don't know that for a fact though - so please do your testing :)

Friday, September 14, 2018

Sample Architecture - AWS Example

A quick post this time on how you can use the AWS CLI or SDK to create an entire network, without using the GUI wizards (which are great, but sometimes irritatingly slow :)).

Relevant code to do everything in this post and a bit more is all uploaded here.

First up, almost certainly you want a VPC, because some services are public and some are private. A VPC will help you separate these. So you use the CreateVPC call to create one.

Make sure you enable private DNS so your external clients can reach your private hosts.

Then you create public and private subnets, so you can put your public and private hosts into each of those.

Your public subnet needs an Internet gateway to talk to the Internet, so you create one and attach a gateway to it.

Once you have your VPC, subnets and Internet gateway ready you need to setup routes between them. The wizard would do this automatically but we have to do it manually. So you first create a route table for both subnets and add routes to each route table. Note here, that you don't need your private subnet hosts to talk to the Internet. If you do, for some reason you will need to create a NAT gateway in the public subnet and modify your routing table in the private subnet to send traffic to it.

Now everything is sort of setup. So you then think of access control everywhere. For starters you create a security group allowing only inbound SSH and HTTPS access for an EC2 instance in the public subnet and only MySQL access for an RDS instance in the private subnet.

Create a key pair (I reused an old one is this was just a test) so you can use it for your new EC2. Identify an AMI to run on your EC2 instance. I used the console for this but you can apparently use the CLI or the SDK to find this out if you want to.

Once that's done you launch an EC2 instance in the public subnet, with the SSH-HTTPS security group and your key pair. Make sure you assign it a public IP otherwise you won't be able to reach it. Login to the instance with your keypair and confirm access works.

Now you start thinking of things you want to keep in your private subnet. The 3 things I was working with were RDS so my EC2 could talk to it, Secrets Manager to store my RDS credentials and a Lambda function that is needed to rotate the credentials in SecretsManager. All of these should be in the private subnet.

A cool thing here is that you can create a private endpoint for SecretsManager so that all traffic to it is always over an AWS network and doesn't go to the Internet at all.

RDS only needs inbound access from EC2 and Lambda on port 3306. I'm not sure what SecretsManager needs but I gave it inbound 443 only (You should test this more). Lambda doesn't need any inbound access. Setup security groups similar to how you did it before.

Create a secret in Secrets Manager. Use a random name if you're testing, you can't reuse old names for a while, even if you have deleted the secret. This secret should contain all the information you need to connect to the RDS database and used when you actually create the database.

Create a DB subnet group, retrieve the secrets you stored earlier from secrets manager and the security group that you created earlier (*3306 inbound access*) and then create the actual RDS itself.

Once the database is created, the only task remaining is to create the Lambda function that will rotate the credentials for you in Secrets Manager.