As many organizations have discovered first-hand, the consequences of data loss can be devastating, often resulting in prolonged downtime, significant damage to credibility, and major financial losses, both direct and indirect. While AWS has been heralded as a safer, more resilient alternative to on-premises computing, organizations must still think about how they can protect their AWS resources against loss by implementing a sound backup strategy.
According to Amazon, AWS resources are all entities that an organization can work with, including EC2 instances, S3 buckets, and CloudFormation stacks. All AWS resources utilize a pay-as-you-go approach for pricing that’s similar to how utility companies charge for natural gas, water, and electricity.
One major advantage of this approach is that organizations pay only for the resources they actually consume. However, this can quickly become a disadvantage in the context of data backups. When each and every backup costs a certain amount of money, it’s critical to avoid backing up data that doesn’t really need to be backed up.
The most important question all organizations should ask themselves when selecting AWS resources for backup is whether they need bare-metal restore capabilities or whether the ability to restore only data is sufficient. Organizations that decide to back up only their data may be fine with using AWS Data Pipeline to routinely move S3 data to Glacier, while organizations that need to restore entire systems are better served by something like EC2/RDS snapshots.
Beyond the scope of the backup, it’s also important to select an appropriate backup scheme. For example, the venerable Grandfather-Father-Son (GFS) backup strategy consists of three backup cycles (typically daily, weekly, and monthly), providing a satisfactory balance between data retention and cost. GFS gives organizations some assurance that they are protected even against “acts of God” type events that might challenge the high durability of AWS.
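The three GFS cycles can be illustrated with a short Python sketch. The tier boundaries and retention periods below are assumptions chosen for illustration only (first of the month for the grandfather, Sundays for the father, 7/35/365 days of retention); the scheme itself does not prescribe them.

```python
from datetime import date

# Hypothetical retention periods per GFS tier, in days.
RETENTION_DAYS = {"son": 7, "father": 35, "grandfather": 365}


def gfs_tier(day: date) -> str:
    """Classify a backup taken on `day` into a GFS tier.

    Assumed convention: the first day of a month is a monthly
    "grandfather", any other Sunday is a weekly "father", and every
    remaining day is a daily "son".
    """
    if day.day == 1:
        return "grandfather"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "father"
    return "son"


def is_expired(backup_day: date, today: date) -> bool:
    """Return True when a backup is older than its tier's retention period."""
    age_days = (today - backup_day).days
    return age_days > RETENTION_DAYS[gfs_tier(backup_day)]
```

A cleanup job could call is_expired for every stored backup and delete only those for which it returns True, so daily backups age out quickly while monthly ones are kept far longer.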
Listed below are the main AWS services (with their corresponding resources) that should be backed up:
Amazon Aurora is a relational database compatible with MySQL and PostgreSQL. It has been designed with the cloud in mind, combining the performance of enterprise databases with the cost-effectiveness of open-source solutions. While Aurora is fault-tolerant by design, you can manually take a snapshot of the data in your Aurora DB cluster to retain it beyond the set backup retention period.
Launched in 2012, Amazon DynamoDB is a key-value and NoSQL document database that’s used by some of the largest companies in the world, including Lyft, Airbnb, and Redfin. It comes with built-in on-demand backup and restore as well as point-in-time recovery, making it very easy to create full backups of Amazon DynamoDB tables.
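As a rough sketch of how both DynamoDB features might be used from Python, the function below enables point-in-time recovery and then takes an on-demand backup. It assumes a boto3 DynamoDB client is passed in; the function name and the backup naming scheme are hypothetical.

```python
from datetime import datetime, timezone


def backup_table(dynamodb, table_name: str) -> str:
    """Enable point-in-time recovery and take an on-demand backup.

    `dynamodb` is expected to be a boto3 DynamoDB client, for example
    boto3.client("dynamodb"). It is passed in rather than created here
    so the function can be exercised without AWS credentials.
    """
    # Turn on continuous backups (point-in-time recovery) for the table.
    dynamodb.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    # Take a full on-demand backup with a timestamped, hypothetical name.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    response = dynamodb.create_backup(
        TableName=table_name,
        BackupName=f"{table_name}-{stamp}",
    )
    return response["BackupDetails"]["BackupArn"]
```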
One of Amazon’s central services, Elastic Compute Cloud, better known simply as EC2, is a cloud-computing platform that provides secure, resizable compute capacity that can be obtained with minimal friction. EBS volumes should be regularly backed up using Amazon EBS snapshots, and critical applications should be deployed across multiple EC2 Availability Zones.
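Regular EBS snapshots are straightforward to script. The sketch below assumes a boto3 EC2 client and a hypothetical tagging convention (Backup=true) for marking volumes that should be backed up; pagination of large volume lists is left out for brevity.

```python
def snapshot_tagged_volumes(ec2, tag_key: str = "Backup", tag_value: str = "true") -> list:
    """Create a snapshot of every EBS volume carrying the given tag.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The tag key and value are a hypothetical convention, not an AWS default.
    """
    # Find the volumes that opted in to backups through the tag.
    response = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    snapshot_ids = []
    for volume in response["Volumes"]:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"Scheduled backup of {volume['VolumeId']}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```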
Designed for use with Amazon EC2, Amazon Elastic Block Store is a high-performance block storage service that supports a wide range of workloads, including big data analytics engines, enterprise applications, and relational as well as non-relational databases, to give just a few examples. Each EBS volume is automatically replicated within its own Availability Zone, and it’s also possible to back volumes up to Amazon S3 using snapshots.
Built on Apache Lucene and first released in 2010, Elasticsearch is an open-source, RESTful, distributed search and analytics engine used for anything from business analytics and full-text search to security intelligence and log analytics; Amazon Elasticsearch Service is the managed AWS offering built around it. At the center of Elasticsearch is the cluster, a collection of nodes that hold data and provide indexing and search capabilities across them. Elasticsearch clusters, including their settings, node information, index settings, and shard allocation, can be backed up with Amazon Elasticsearch Service index snapshots.
Powering mission-critical workloads of many Fortune 500 companies, Amazon Redshift has established itself as a premier data warehouse product thanks to its performance and scale. While reliability is another chief characteristic of Amazon Redshift, it’s still highly recommended to keep automated snapshots enabled and to take manual snapshots of Redshift clusters, both of which are stored in Amazon S3.
Amazon Relational Database Service, or just Amazon RDS, is a distributed relational database service that provides resizable capacity and cost-efficiency while automating many time-consuming administration tasks, such as backups. That said, it’s still recommended to manually take snapshots of RDS database instances and keep them for as long as needed.
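A manual RDS snapshot can be requested with a single API call. The sketch below assumes a boto3 RDS client; the function name and the identifier format are hypothetical. Manual snapshots are not removed by the automated backup retention period, so they stay available until they are deleted explicitly.

```python
from datetime import datetime, timezone


def snapshot_db_instance(rds, instance_id: str) -> str:
    """Request a manual snapshot of an RDS instance and return its identifier.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    # Snapshot identifiers may contain only letters, digits, and hyphens.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M")
    snapshot_id = f"{instance_id}-manual-{stamp}"
    rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    return snapshot_id
```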
Designed for 99.999999999% (11 9’s) of durability, Amazon S3 is an object storage service built on the same architecture Amazon uses for its global e-commerce network. Amazon S3 buckets can be effortlessly backed up to Amazon Glacier, which is Amazon’s data archiving and backup service that stands out with its ultra-affordable storage costs.
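One common way to get S3 data into Glacier is a bucket lifecycle rule. The sketch below builds such a rule and applies it through a boto3 S3 client; the 30-day threshold, the rule ID, and the function names are assumptions. Note that a lifecycle transition moves objects to the Glacier storage class rather than creating a second copy of them.

```python
def glacier_lifecycle(prefix: str = "", days: int = 30) -> dict:
    """Build a lifecycle configuration that archives objects to Glacier.

    An empty prefix applies the rule to the whole bucket. The default of
    30 days is a hypothetical value, not a recommendation.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(s3, bucket: str, prefix: str = "", days: int = 30) -> None:
    """Attach the rule to `bucket`; `s3` is expected to be a boto3 S3 client."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle(prefix, days),
    )
```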
Deployed within an Amazon Virtual Private Cloud (VPC), Amazon WorkSpaces is a managed, secure cloud desktop service that helps organizations eliminate many administrative tasks associated with the management of Windows and Linux desktops. Individual user volumes are backed up automatically every 12 hours, but it’s best to also enable Amazon WorkDocs Sync on a WorkSpace to allow users to continuously back up a specific folder to Amazon WorkDocs.
There are several possible approaches for backing up AWS resources, with Amazon’s own backup service, AWS Backup, leading the way.
AWS Backup is a fully managed backup service that provides a policy-based solution that makes it easy to centralize and automate the backup of data across AWS services. AWS Backup makes it possible to back up and restore EFS file systems, DynamoDB tables, EBS volumes, RDS databases, and Storage Gateway volumes.
Just like other AWS services, AWS Backup utilizes a pay-as-you-go approach for pricing, charging customers on a per-GB basis. For the first backup of an AWS resource, a full copy of your data is saved, but only the changed part of the AWS resource is saved for each incremental backup.
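The effect of incremental backups on a per-GB bill can be estimated with simple arithmetic. The sketch below is a simplified model that assumes a constant change rate between backups; the function names and any price fed into it are hypothetical, not AWS list prices.

```python
def backup_storage_gb(full_gb: float, change_rate: float, incrementals: int) -> float:
    """Storage used by one full backup plus a number of incremental backups.

    Simplifying assumption: every incremental stores the same fraction
    (`change_rate`) of the full data set.
    """
    return full_gb + full_gb * change_rate * incrementals


def monthly_cost(stored_gb: float, price_per_gb_month: float) -> float:
    """Monthly charge for the stored backups at a given per-GB price."""
    return stored_gb * price_per_gb_month
```

For example, a 100 GB resource with a 2% change rate and 29 incrementals occupies 100 + 100 × 0.02 × 29 = 158 GB of backup storage, compared with 3,000 GB if every one of the 30 backups were a full copy.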
To back up AWS resources using AWS Backup, follow the process that Amazon describes in detail in its Developer Guide.
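For readers who prefer code to console steps, the following is a minimal sketch of a policy-based setup using a boto3 AWS Backup client: one daily rule, plus a tag-based selection of resources. The plan name, schedule, retention period, vault name, and Backup=true tag are all assumptions, and the target vault and IAM role must already exist.

```python
def daily_backup_plan(name: str, vault: str = "Default", keep_days: int = 35) -> dict:
    """Build a backup plan with a single daily rule (hypothetical values)."""
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault,
                # Run every day at 05:00 UTC.
                "ScheduleExpression": "cron(0 5 * * ? *)",
                "Lifecycle": {"DeleteAfterDays": keep_days},
            }
        ],
    }


def create_plan_for_tag(backup, name: str, role_arn: str,
                        tag_key: str = "Backup", tag_value: str = "true") -> str:
    """Create the plan and assign every resource carrying the given tag.

    `backup` is expected to be a boto3 client, e.g. boto3.client("backup").
    """
    plan = backup.create_backup_plan(BackupPlan=daily_backup_plan(name))
    backup.create_backup_selection(
        BackupPlanId=plan["BackupPlanId"],
        BackupSelection={
            "SelectionName": f"{name}-by-tag",
            "IamRoleArn": role_arn,
            "ListOfTags": [
                {
                    "ConditionType": "STRINGEQUALS",
                    "ConditionKey": tag_key,
                    "ConditionValue": tag_value,
                }
            ],
        },
    )
    return plan["BackupPlanId"]
```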
Before Amazon rolled out AWS Backup, many AWS users were taking advantage of the backup capabilities of AWS Lambda, an event-driven, serverless computing platform provided by Amazon. Thanks to AWS Lambda, it’s possible to run backup procedures based on trigger events from other AWS services.
A backup procedure can be triggered when someone writes something to an S3 bucket or when a certain amount of time passes since the last backup. Arguably the biggest downside of AWS Lambda is its relatively steep learning curve and the fact that a simple mistake in a backup script can render the entire backup useless and lead to a large bill from Amazon.
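The S3-triggered case might look like the sketch below: a Lambda handler that copies each newly written object into a separate backup bucket. The destination bucket name and the key layout are hypothetical. The check that skips events from the backup bucket itself illustrates the kind of small detail that matters here, since a function that triggers itself in a loop is one way to end up with a large bill.

```python
from urllib.parse import unquote_plus

BACKUP_BUCKET = "example-backup-bucket"  # hypothetical destination bucket


def copy_new_objects(event: dict, s3) -> list:
    """Copy every object named in an S3 event notification to the backup bucket.

    `s3` is expected to be a boto3 S3 client.
    """
    copied = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if bucket == BACKUP_BUCKET:
            continue  # never react to our own writes (avoids a trigger loop)
        s3.copy_object(
            Bucket=BACKUP_BUCKET,
            Key=f"{bucket}/{key}",
            CopySource={"Bucket": bucket, "Key": key},
        )
        copied.append(key)
    return copied


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda for each S3 notification."""
    import boto3  # provided by the AWS Lambda Python runtime

    return {"copied": copy_new_objects(event, boto3.client("s3"))}
```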
Many providers of backup services have developed in-cloud backup solutions for AWS. Such solutions boast a number of features not supported by AWS Backup, including the ability to back up AWS resources to competing clouds; to perform image-based, incremental, and application-aware backups; and to use storage space efficiently through global data deduplication, compression, and exclusion of swap files.
The major providers of in-cloud backup solutions include Veeam, Commvault, Druva, and Acronis. Choosing an independent in-cloud backup solution can prevent vendor lock-in, but the cost of backup is likely to increase.