AWS Databases


I'll briefly cover the databases which AWS provides, as well as the key features that each service offers. I've also listed ports at the end to be familiar with.
  • Database Types
  • Amazon RDS
  • Amazon Aurora
  • Amazon ElastiCache
  • Amazon DynamoDB
  • Amazon S3
  • DocumentDB
  • Amazon Neptune
  • Amazon Keyspaces
  • Amazon QLDB (Quantum Ledger Database)
  • Amazon Timestream

  • Database Types

  • Relational database management system (SQL / OLTP): RDS, Aurora - great for joins
  • NoSQL database - no joins, no SQL: DynamoDB (~JSON), ElastiCache (key / value pairs), Neptune (graphs), DocumentDB (for MongoDB), Keyspaces (for Apache Cassandra)
  • Object Store: S3 (for big objects) / Glacier (for backups / archives)
  • Data Warehouse: (SQL Analytics / BI): Redshift (OLAP), Athena, EMR
  • Search: OpenSearch (JSON) - free text, unstructured searches
  • Graphs: Amazon Neptune - displays relationships between data
  • Ledger: Amazon Quantum Ledger Database
  • Time series: Amazon Timestream

  • Amazon Relational Database Service (RDS)

    This service allows you to create a database in the cloud. You can choose from the following:
  • Microsoft SQL Server
  • MySQL
  • PostgreSQL
  • MariaDB
  • Oracle

  • This service is managed by AWS, which means you won't be able to SSH into the instance, but you do benefit from the following features:
  • Automated provisioning, OS patching
  • Continuous backups and point in time restore
  • Monitoring dashboards
  • Read replicas for improved read performance
  • Multi AZ setup for disaster recovery
  • Maintenance windows for upgrades
  • Scaling capability
  • Storage backed by EBS

  • If you would like access to your RDS instance, there is 'RDS Custom', which gives you access to the underlying database and OS so you can configure it and install patches yourself, if that's a use case you require.

    Auto Scaling Storage

    This feature automatically increases the storage on an RDS instance when it's running out of free space. You do have to set the 'Maximum Storage Threshold', the maximum size the storage is allowed to grow to. This feature helps with unpredictable workloads and is supported by all RDS database engines.
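
    The scaling decision can be sketched as follows. This is an illustrative model, not the actual RDS implementation; the 10%-free trigger reflects RDS's documented behaviour, and the numbers are made up.

```python
# Illustrative sketch (not the actual RDS implementation) of the storage
# auto-scaling decision: RDS grows storage when free space falls below
# 10% of the allocated size, capped by the Maximum Storage Threshold.

def should_scale_storage(allocated_gb: int, free_gb: float,
                         max_threshold_gb: int) -> bool:
    """Return True if RDS would be allowed to grow the volume."""
    low_on_space = free_gb < 0.10 * allocated_gb
    below_cap = allocated_gb < max_threshold_gb
    return low_on_space and below_cap

# 100 GB allocated with 5 GB free is under the 10% mark, so scaling fires
print(should_scale_storage(allocated_gb=100, free_gb=5, max_threshold_gb=500))   # True
# plenty of free space: no scaling
print(should_scale_storage(allocated_gb=100, free_gb=20, max_threshold_gb=500))  # False
```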

    Read Replicas

    RDS allows up to 15 read replicas within the same availability zone, across availability zones, or even across regions. It's also possible to promote a read replica to be the main RDS instance. The replication is ASYNC, meaning the data is eventually consistent. You can only query (SELECT) data from a read replica; you can't run manipulations such as INSERT, UPDATE, or DELETE queries. It's important to note that there is a network cost for transferring data into another availability zone; the exception is replication to a read replica within the same region, which is free.
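
    Creating a replica is a single API call. A minimal boto3 sketch is below; the instance identifiers and AZ are hypothetical, but the call itself (`create_db_instance_read_replica`) is a real RDS API.

```python
# Parameters for creating a read replica in a different AZ.
# Identifiers below are made-up examples.
replica_params = {
    "DBInstanceIdentifier": "mydb-replica-1",      # hypothetical replica name
    "SourceDBInstanceIdentifier": "mydb-primary",  # hypothetical source instance
    "AvailabilityZone": "eu-west-2b",              # place it in another AZ
}

# The actual call (requires AWS credentials), commented out here:
# import boto3
# rds = boto3.client("rds")
# rds.create_db_instance_read_replica(**replica_params)

print(sorted(replica_params))
```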

    RDS Multi AZ

    Multi AZ is mainly used for disaster recovery. The application reads from and writes to the main RDS instance via one DNS name, and that instance performs SYNC replication (a real-time exchange of information) to a standby instance in another availability zone. Every change the application sends to the main instance is also applied to the standby instance. If there is a problem with the main instance, such as network issues or instance/storage failure, there is an automatic failover and the standby instance is promoted to the main instance. It's also possible to set up read replicas as Multi AZ. Going from Single AZ to Multi AZ requires no downtime and no application changes: RDS handles the conversion in the background by snapshotting the main instance, restoring it into a new standby instance in another AZ, and then synchronizing the two.
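
    The key point, that the application keeps using one DNS name while RDS repoints it, can be shown with a toy model (the DNS name and instance names are made up):

```python
# Toy model of Multi-AZ failover: the application only ever knows one DNS
# name; on failover RDS repoints that name at the promoted standby.

class MultiAZEndpoint:
    def __init__(self, primary: str, standby: str):
        # hypothetical endpoint the application connects to
        self.dns_name = "mydb.abc123.eu-west-2.rds.amazonaws.com"
        self.primary, self.standby = primary, standby

    def resolve(self) -> str:
        return self.primary

    def failover(self) -> None:
        # the standby is promoted; the DNS name the app uses never changes
        self.primary, self.standby = self.standby, self.primary

ep = MultiAZEndpoint("instance-az-a", "instance-az-b")
ep.failover()
print(ep.resolve())  # instance-az-b
```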

    RDS Backups

    RDS can back up instances either automatically or manually. Automated backups take a full backup of the database daily, with the ability to restore to any point in time from the oldest backup up to five minutes ago; transaction logs are backed up by RDS every five minutes. The retention period for automatic backups is up to 35 days, but manual backups can be kept as long as you want. Automated backups can be disabled. Do note that a stopped RDS instance still costs money, as you're still paying for the storage; if you plan on stopping it for a long period of time, you should snapshot and restore instead.

    RDS Restore

    RDS can restore an instance from a backup of an existing database: the backup is stored in S3 and then restored onto a new RDS instance running MySQL.

    RDS Proxy

    This allows apps to pool and share already-established database connections. Instead of each application connecting to the RDS instance individually, they connect to the proxy, which pools the connections together into fewer connections to the RDS instance. This improves database efficiency by reducing the strain on RDS resources such as CPU and RAM. The proxy auto-scales and is multi-AZ, so you won't need to manage capacity, and it reduces failover time by up to 66%. It can enforce IAM authentication, so users can only connect to the RDS instance with the correct credentials, and it's never publicly accessible as it can only be reached from within a VPC. It supports RDS (MySQL, PostgreSQL, MariaDB, Microsoft SQL Server) and Aurora (MySQL and PostgreSQL).
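
    The value of pooling can be sketched with a toy pool (this is a simplification of what RDS Proxy does, not its API): many short-lived requests share a handful of long-lived database connections instead of each opening its own.

```python
# Toy connection pool: 100 requests are served while opening only as many
# real database connections as are actually needed concurrently.

class ConnectionPool:
    def __init__(self, max_db_connections: int):
        self.max_db = max_db_connections
        self.opened = 0          # real DB connections ever opened
        self.idle = []           # connections waiting to be reused

    def acquire(self) -> str:
        if self.idle:
            return self.idle.pop()           # reuse an existing connection
        if self.opened < self.max_db:
            self.opened += 1                 # open a new one, up to the cap
            return f"db-conn-{self.opened}"
        raise RuntimeError("pool exhausted")

    def release(self, conn: str) -> None:
        self.idle.append(conn)

pool = ConnectionPool(max_db_connections=2)
for _ in range(100):             # 100 sequential requests...
    conn = pool.acquire()
    pool.release(conn)
print(pool.opened)               # ...served by one reused connection: 1
```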

    Invoking Lambda from RDS & Aurora

  • Invoke Lambda functions from within your DB instance
  • Allows you to process data events from within a database
  • Supported for RDS for PostgreSQL and Aurora MySQL
  • Must allow outbound traffic to your Lambda function from within your DB instance (Public, NAT GW, VPC Endpoints)
  • DB instance must have the required permissions to invoke the Lambda function (Lambda Resource-based Policy & IAM Policy)


  • RDS Event Notifications

  • Notifications that provide information about the DB instance itself (created, stopped, started)
  • You don't have any information about the data itself
  • Subscribe to following event categories: DB instance, DB snapshot, DB Parameter Group, DB Security Group, RDS Proxy, Custom Engine Version
  • Near real-time events (up to 5 minutes)
  • Send notifications to SNS or subscribe to events using EventBridge


  • RDS Summary

  • Managed PostgreSQL / MySQL / Oracle / SQL Server / DB2 / MariaDB / Custom
  • Provisioned RDS instance Size and EBS Volume Type & Size
  • Auto-scaling capability for Storage
  • Support for Read Replicas and Multi AZ
  • Security through IAM, Security Groups, KMS, SSL in transit
  • Automated Backup with Point in time restore feature (up to 35 days)
  • Manual DB Snapshot for longer-term recovery
  • Managed and Scheduled maintenance (with downtime)
  • Support for IAM Authentication, integration with Secrets Manager
  • RDS Custom for access to and customization of the underlying instance (Oracle & SQL Server)

  • Use case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactions

  • Amazon Aurora

    Aurora is a cloud-optimized database with significant performance improvements over RDS. Storage grows automatically in increments of 10 GB, up to a capacity of 128 TB. Aurora can have up to 15 replicas and failover is near-instantaneous (under 30 seconds), but it all comes at a cost, as it's roughly 20% more expensive than an RDS instance.

    Features of Aurora

  • Automatic fail-over
  • Backup and Recovery
  • Isolation and Security
  • Industry Compliance
  • Push-button Scaling
  • Automated Patching with Zero Downtime
  • Advanced Monitoring
  • Backtrack: restore data at any point in time without using backups

  • Aurora has high availability and read scaling: it keeps six copies of your data across three availability zones, and the storage is striped across hundreds of volumes every time you write to the database.

    Aurora DB Cluster

    An Aurora cluster exposes two endpoints, one for reads and one for writes. The writer endpoint always connects to the main instance, which is the only instance that writes to the storage, whereas the reader endpoint load-balances connections across the read replicas.

    Aurora Custom Endpoints

    If you wish to run analytics on the database without affecting performance, you can define a subset of Aurora instances behind a custom endpoint and point those workloads at it.

    Aurora Serverless

    Aurora Serverless is an automated database which auto scales based on usage. This could be good for infrequent, intermittent or unpredictable workloads.

    Aurora Global

    Global Aurora has cross region read replicas which is good for disaster recovery. It takes less than a second to replicate data into another region.

    Aurora Machine Learning

    You can integrate Aurora with machine learning services (Amazon SageMaker and Amazon Comprehend) to make predictions from your applications via SQL. Good use cases for this are fraud detection and product recommendations.

    Aurora Backups

    Aurora can back up instances either automatically or manually. Automated backups take a full backup of the database daily, with the ability to restore to any point in time. The retention period for automatic backups is up to 35 days, but manual backups can be kept as long as you want. Automated backups can't be disabled.

    Aurora Restore

    Aurora can restore a database from an external backup: create a backup of the existing database using Percona XtraBackup, store it in S3, and restore it onto a new Aurora cluster running MySQL.

    Aurora Cloning

    Another feature of Aurora is that you can clone a new database cluster from an existing one. This is faster than snapshot & restore and uses a copy-on-write protocol. This feature is useful for creating a staging environment from a production database without impacting the live service.

    RDS & Aurora Security

    Both RDS and Aurora have:
  • At-rest encryption: the main and replica instances are encrypted using AWS KMS; encryption must be defined at launch time, and if the main instance is not encrypted, the read replicas can't be encrypted either.
  • In-flight encryption: TLS-ready by default; use the AWS TLS root certificates client-side.
  • IAM Authentication: IAM roles to connect to your database instead of username/password.
  • Security Groups: Control network access to your RDS/Aurora database.
  • No SSH available: Except on RDS Custom.
  • Audit Logs: Can be enabled and sent to CloudWatch Logs for longer retention.

  • Aurora Summary

  • Compatible API for PostgreSQL / MySQL, separation of storage and compute
  • Storage: data is stored in 6 replicas across 3 AZ - highly available, self-healing, auto-scaling
  • Compute: Cluster of DB Instance across multiple AZ, auto-scaling of Read Replicas
  • Cluster: Custom endpoints for writer and reader DB instances
  • Same security / monitoring / maintenance features as RDS
  • Backup & restore options
  • Aurora Serverless: for unpredictable / intermittent workloads, no capacity planning
  • Aurora Global: up to 16 DB Read Instances in each region, less than 1 second storage replication
  • Aurora Machine Learning: perform ML using SageMaker & Comprehend on Aurora
  • Aurora Database Cloning: new cluster from existing one, faster than restoring a snapshot

  • Use case: same as RDS, but with less maintenance / more flexibility / more performance / more features

  • Amazon ElastiCache

    ElastiCache is an in-memory database (Redis or Memcached) that provides high performance and low latency. It helps make applications stateless and reduces load on the database for read-intensive workloads. Like RDS and Aurora, AWS takes care of OS maintenance/patching, optimisations, setup, configuration, monitoring, failure recovery and backups. Using ElastiCache does, however, involve a lot of application code changes.
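
    The typical code change is the cache-aside pattern, sketched here with a plain dict standing in for Redis and a function standing in for the real database query (both are stand-ins, not the ElastiCache API):

```python
import time

cache: dict = {}      # stand-in for a Redis node
TTL_SECONDS = 60

def slow_db_query(user_id: int) -> dict:
    # pretend this is an expensive relational query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id: int) -> dict:
    entry = cache.get(user_id)
    if entry and entry["expires"] > time.time():
        return entry["value"]                  # cache hit: skip the database
    value = slow_db_query(user_id)             # cache miss: hit the database...
    cache[user_id] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value                               # ...and store it for next time

get_user(7)            # miss - populates the cache
print(get_user(7))     # hit - served from memory
```

    The application, not the database, is responsible for populating and invalidating the cache, which is why ElastiCache requires code changes while DAX (below, for DynamoDB) does not.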

    Redis vs Memcached


    Redis:
  • Multi AZ with Auto-Failover
  • Read Replicas to scale reads and provide high availability
  • Data durability using AOF persistence
  • Backup and restore features
  • Supports Sets and Sorted Sets

    Memcached:
  • Multi-node partitioning of data (sharding)
  • No high availability (no replication)
  • Non-persistent
  • No backup and restore
  • Multi-threaded architecture

    Cache Security

    ElastiCache supports IAM authentication for Redis only; otherwise, IAM policies on ElastiCache are used purely for AWS API-level security.

    ElastiCache Summary

  • Managed Redis / Memcached (similar offering as RDS, but for caches)
  • In-memory data store, sub-millisecond latency
  • Select an ElastiCache instance type (e.g. cache.m6g.large)
  • Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)
  • Security through IAM, Security Groups, KMS, Redis Auth
  • Backup / Snapshot / Point in time restore feature
  • Managed and Scheduled maintenance
  • Requires some application code changes to be leveraged

  • Use case: Key/Value store, Frequent reads, less writes, cache results for DB queries, store session data for websites, cannot use SQL

  • Amazon DynamoDB

    Amazon DynamoDB is a fully managed NoSQL database service. It is designed to offer fast, consistent, and scalable performance for applications that require low-latency data access, even as they scale to handle millions of requests per second.
  • DynamoDB is made of Tables
  • Each table has a Primary Key (must be decided at creation time)
  • Each table can have an infinite number of items (rows)
  • Each item has attributes (can be added over time - can be null)
  • Maximum size of an item is 400KB
  • Data types supported are:
  • Scalar Types- Strings, Numbers, Binary, Null
  • Document Types- List, Map
  • Set Types- String Set, Number Set, Binary Set
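
    One item expressed in DynamoDB's low-level wire format shows each type class (the table and attribute names are made up for illustration):

```python
# A single DynamoDB item in the low-level ("typed") JSON format.
item = {
    "user_id":  {"S": "u-123"},                       # Scalar: String (primary key)
    "age":      {"N": "30"},                          # Scalar: Number (sent as a string)
    "deleted":  {"NULL": True},                       # Scalar: Null
    "address":  {"M": {"city": {"S": "London"}}},     # Document: Map
    "scores":   {"L": [{"N": "1"}, {"N": "2"}]},      # Document: List
    "tags":     {"SS": ["aws", "dynamodb"]},          # Set: String Set
}

# Writing it would be a single call (requires AWS credentials):
# import boto3
# boto3.client("dynamodb").put_item(TableName="users", Item=item)

print(len(item))  # 6
```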

  • Read/Write Capacity Modes

  • Control how you manage your table's capacity (read/write throughput)

  • Provisioned Mode (default)
  • You specify the number of reads/writes per second
  • You need to plan capacity beforehand
  • Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
  • Possibility to add auto-scaling mode for RCU & WCU

  • On-Demand Mode
  • Read/Writes automatically scale up/down with your workloads
  • No capacity planning needed
  • Pay for what you use, more expensive ($$$)
  • Great for unpredictable workloads, steep sudden spikes
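
    Provisioned-mode sizing follows standard DynamoDB rules: one WCU covers one write per second of an item up to 1 KB, and one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads). A quick calculator:

```python
import math

def wcu(writes_per_sec: int, item_kb: float) -> int:
    # writes are billed in 1 KB increments, rounded up per item
    return writes_per_sec * math.ceil(item_kb / 1)

def rcu(reads_per_sec: int, item_kb: float,
        eventually_consistent: bool = False) -> int:
    # reads are billed in 4 KB increments; eventual consistency halves the cost
    units = reads_per_sec * math.ceil(item_kb / 4)
    return math.ceil(units / 2) if eventually_consistent else units

print(wcu(10, 2.5))          # 10 writes/sec of 2.5 KB items -> 30 WCU
print(rcu(16, 12))           # 16 strong reads/sec of 12 KB items -> 48 RCU
print(rcu(16, 12, True))     # eventually consistent halves it  -> 24 RCU
```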

  • DynamoDB Advanced Features

    DynamoDB Accelerator (DAX)

  • Fully-managed, Highly Available, Seamless in-memory cache for DynamoDB
  • Help solve read congestion by caching
  • Microseconds latency for cached data
  • Doesn't require application logic modification (compatible with existing DynamoDB APIs)
  • 5 minutes TTL for cache (default)


  • DynamoDB Accelerator (DAX) vs ElastiCache

    DAX sits in front of DynamoDB and caches individual objects and query/scan results, whereas ElastiCache is better suited to storing the results of aggregations or other application-side computation.

    Stream Processing

  • Ordered stream of item-level modifications (create/update/delete) in a table
  • Use cases:
  • React to changes in real-time (welcome email to users)
  • Real-time usage analytics
  • Insert into derivative tables
  • Implement cross-region replication
  • Invoke AWS Lambda on changes to your DynamoDB table

    DynamoDB Streams:
  • 24 hours retention
  • Limited number of consumers
  • Process using AWS Lambda Triggers or the DynamoDB Streams Kinesis adapter

    Kinesis Data Streams:
  • 1 year retention
  • High number of consumers
  • Process using AWS Lambda, Kinesis Data Analytics, Kinesis Data Firehose, AWS Glue Streaming ETL...
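
    A common consumer is a Lambda trigger. The event layout below (Records / eventName / dynamodb.NewImage) matches what DynamoDB Streams delivers; the handler logic itself (a "welcome email" reaction) is just an illustration, and the attribute names are made up.

```python
# Lambda handler reacting to new items on a DynamoDB Stream.
def handler(event, context=None):
    welcomed = []
    for record in event["Records"]:
        if record["eventName"] == "INSERT":          # react only to new items
            new_image = record["dynamodb"]["NewImage"]
            welcomed.append(new_image["email"]["S"]) # e.g. queue a welcome email
    return welcomed

# A sample event in the shape DynamoDB Streams delivers:
sample_event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"email": {"S": "new.user@example.com"}}}},
    {"eventName": "MODIFY",
     "dynamodb": {"NewImage": {"email": {"S": "old.user@example.com"}}}},
]}
print(handler(sample_event))  # ['new.user@example.com']
```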


    DynamoDB Global Tables



  • Make a DynamoDB table accessible with low latency in multiple-regions
  • Active-Active replication
  • Applications can read and write to the table in any region
  • Must enable DynamoDB Streams as a pre-requisite

  • DynamoDB - Time To Live (TTL)

    Automatically delete items after an expiry timestamp.

  • Use cases: reduce stored data by keeping only current items, adhere to regulatory obligations, web session handling
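
    TTL is driven by a numeric attribute holding a Unix epoch time in seconds; items whose timestamp is in the past become eligible for deletion. The table and attribute names below are hypothetical, though the `update_time_to_live` API mentioned in the comment is real.

```python
import time

def session_item(session_id: str, ttl_hours: int) -> dict:
    """Build a session item with an expiry timestamp for TTL."""
    expires_at = int(time.time()) + ttl_hours * 3600
    return {
        "session_id": {"S": session_id},
        "expire_at":  {"N": str(expires_at)},   # the TTL attribute (epoch seconds)
    }

item = session_item("sess-42", ttl_hours=2)

# Enabling TTL on the table is a one-time call (requires AWS credentials):
# import boto3
# boto3.client("dynamodb").update_time_to_live(
#     TableName="sessions",
#     TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"})

print(int(item["expire_at"]["N"]) > time.time())  # True: expires in the future
```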

  • DynamoDB - Backups for disaster recovery

  • Continuous backups using point-in-time recovery (PITR)
  • Optionally enabled for the last 35 days
  • Point-in-time recovery to any time within the backup window
  • The recovery process creates a new table

  • On-demand backups
  • Full backups for long-term retention, until explicitly deleted
  • Doesn't affect performance or latency
  • Can be configured and managed in AWS Backup (enabled cross-region copy)

  • DynamoDB - Integration with Amazon S3

  • Export to S3 (must enable PITR)
  • Works for any point in time in the last 35 days
  • Doesn't affect the read capacity of your table
  • Perform data analysis on top of DynamoDB
  • Retain snapshots for auditing
  • ETL on top of S3 data before importing back into DynamoDB
  • Export in DynamoDB JSON or ION format (a data serialization language by Amazon)

  • Import from S3
  • Import CSV, DynamoDB JSON, or ION format
  • Doesn't consume any write capacity
  • Creates a new table
  • Import errors are logged in CloudWatch Logs

  • DynamoDB Summary

  • AWS proprietary technology, managed serverless NoSQL database, millisecond latency
  • Capacity modes: provisioned capacity with optional auto-scaling or on-demand capacity
  • Can replace ElastiCache as a key/value store (storing session data for example, using TTL feature)
  • Highly Available, Multi AZ by default, Read and Writes are decoupled, transaction capability
  • DAX cluster for read cache, microsecond read latency
  • Security, authentication and authorization is done through IAM
  • Event Processing: DynamoDB Streams to integrate with AWS Lambda, or Kinesis Data Streams
  • Global Table feature: active-active setup
  • Automated backups up to 35 days with PITR (point-in-time-recovery - restore to new table), or on-demand backups
  • Export to S3 without using RCU within the PITR window, import from S3 without using WCU
  • Great for rapidly evolving schemas

  • Use case: Serverless applications (small documents 100s KB), distributed serverless cache

  • Amazon S3

  • S3 is a key / value store for objects
  • Great for bigger objects, not so great for many small objects
  • Serverless, scales infinitely, max object size is 5TB, versioning capability
  • Tiers: S3 Standard, S3 Infrequent Access, S3 Intelligent, S3 Glacier + lifecycle policy
  • Features: Versioning, Encryption, Replication, MFA-Delete, Access Logs
  • Security: IAM, Bucket Policies, ACL, Access Points, Object Lambda, CORS, Object/Vault Lock
  • Encryption: SSE-S3, SSE-KMS, SSE-C, client-side, TLS in transit, default encryption
  • Batch operations on objects using S3 Batch, listing files using S3 Inventory
  • Performance: Multi-part upload, S3 Transfer Acceleration, S3 Select
  • Automation: S3 Event Notifications (SNS, SQS, Lambda, EventBridge)

  • Use case: static files, key / value store for big files, website hosting

  • DocumentDB

  • Aurora is an "AWS-implementation" of PostgreSQL / MySQL
  • DocumentDB is the same for MongoDB (which is a NoSQL database)
  • MongoDB is used to store, query and index JSON data
  • Similar "deployment concepts" as Aurora
  • Fully managed, highly available with replication across 3 AZ
  • DocumentDB storage automatically grows in increments of 10GB
  • Automatically scales to workloads with millions of requests per second

  • Amazon Neptune

    Amazon Neptune is a fully managed graph database service. It is designed to work with highly connected datasets, enabling you to efficiently store and navigate complex relationships within your data.
  • Fully managed graph database
  • Highly available across 3 AZ, with up to 15 read replicas
  • Build and run applications working with highly connected datasets - optimized for these complex and hard queries
  • Can store up to billions of relations and query the graph with milliseconds latency
  • Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking

  • Streams

  • Real-time ordered sequence of every change to your graph data
  • Changes are available immediately after writing
  • No duplicates, strict order
  • Streams data is accessible in an HTTP REST API
  • Use cases:
  • Send notifications when certain changes are made
  • Maintain your graph data synchronized in another data store (e.g. S3, OpenSearch, ElastiCache)
  • Replicate data across regions in Neptune


  • Amazon Keyspaces (for Apache Cassandra)

  • Apache Cassandra is an open source NoSQL distributed database
  • A managed Apache Cassandra-compatible database service
  • Serverless, Scalable, highly available, fully managed by AWS
  • Automatically scales tables up/down based on the application's traffic
  • Tables are replicated 3 times across multiple AZ
  • Using the Cassandra Query Language (CQL)
  • Single-digit millisecond latency at any scale, 1000s of requests per second
  • Capacity: On-demand mode or provisioned mode with auto-scaling
  • Encryption, backup, Point-In-Time Recovery (PITR) up to 35 days

  • Use case: store IoT (internet of things) devices info, time-series data

  • Amazon QLDB (Quantum Ledger Database)

  • A ledger is a book recording financial transactions
  • Fully Managed, Serverless, Highly Available, Replication across 3 AZ
  • Used to review history of all the changes made to your application data over time
  • Immutable system: no entry can be removed or modified, cryptographically verifiable
  • 2-3 times better performance than common ledger blockchain frameworks, manipulate data using SQL
  • Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules

  • Amazon Timestream

  • Fully managed, fast, scalable, serverless time series database
  • A time series is a sequence of data points, each associated with a timestamp
  • Automatically scales up / down to adjust capacity
  • Store and analyze trillions of events per day
  • 1000s of times faster & 1/10th the cost of relational databases
  • Scheduled queries, multi-measure records, SQL compatibility
  • Data storage tiering: recent data kept in memory and historical data kept in a cost-optimized storage
  • Built-in time series analytics functions (helps you identify patterns in your data in near real-time)
  • Encryption in transit and at rest

  • Use case: IoT (internet of things) applications, operational applications, real-time analytics

  • List of Ports

  • FTP: 21
  • SSH: 22
  • SFTP: 22
  • HTTP: 80
  • HTTPS: 443
  • PostgreSQL: 5432
  • MySQL: 3306
  • MariaDB: 3306
  • Oracle RDS: 1521
  • MSSQL Server: 1433
  • Aurora (PostgreSQL): 5432
  • Aurora (MySQL): 3306