S3 Cost Saving: Archiving Compressed S3 Data into Glacier


Amazon S3 has been a popular storage service for many years. One of its most sought-after features is its storage tiers, which let you move data between different classes of storage. Each tier comes with its own benefits. We’ll be looking at two of them: S3 Standard and S3 Glacier.

Moving data from Standard to Glacier is common practice. For one, Glacier is one of the cheapest storage tiers available, and two, it’s built for archiving.


Glacier is meant for data you don’t need immediate access to. Say you’ve got older customer or patient records that you want to store away indefinitely; S3 Glacier is the right service for such scenarios.


Glacier charges you in two ways: for the requests that move your data in, and for the storage itself, which is billed monthly. One way to cut archiving costs further is to compress the data before moving it, so that less data is transferred and stored. The only problem is that compressing your data is a messy process; Glacier has no built-in compression feature. The common solution is to use the S3DataNode in AWS Data Pipeline together with a GZIP command, and that’s the messy part: you have to take your data out of S3, load it into a pipeline, run the command, and push the result to Glacier, all manually. It’s quite a hassle.


We built an automated, no-code workflow that turns the same process into one seamless flow of events, handling all of these tasks from a single place. With this workflow, compressing your data becomes the natural way to approach archiving, and you can potentially cut your costs with this neat method. A single workflow with eight nodes makes this complex use case a reality: no coding, no configuring in the AWS Console, nothing else.


Why and how archiving can save costs


Archiving has been a familiar concept for as long as data has been stored. The idea is to keep data in a form that isn’t meant for frequent access, originally because of the sheer number of records being archived. In Amazon’s case, archiving is a cheaper alternative to standard storage simply because of the trade-off it puts forward: lower storage prices in exchange for slower, costlier retrieval.


This trade-off gives you more control over how your data is organized and how far you can reduce expenses. Your job is to pick the data for archival; this usually means personal records and older data sets. You could also analyze your data’s access activity to figure out which of it is worth shipping off to Glacier.

Most of the price difference comes from the cost of API calls. S3 charges you per object moved from S3 to Glacier, at $0.05 per 1,000 requests. Say you have lots of small files, around 10 KB each, and 1 million of them. That’s roughly 10 GB of data, so Glacier storage costs about $0.04 per month at $0.004 per GB, while the transition requests cost $50. Transferring lots of small files costs far more than storing them. So we can bundle these small files into a single zip, even without compression, and move that one object to Glacier instead.
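Here’s a quick back-of-the-envelope sketch of that comparison, using the rates quoted above ($0.05 per 1,000 requests, $0.004 per GB-month); actual pricing varies by region:

```python
# Back-of-the-envelope cost comparison for archiving many small files.
NUM_FILES = 1_000_000
FILE_SIZE_KB = 10
TRANSITION_COST_PER_1000 = 0.05    # USD per 1,000 transition requests
STORAGE_COST_PER_GB_MONTH = 0.004  # USD per GB-month in Glacier

total_gb = NUM_FILES * FILE_SIZE_KB / (1024 * 1024)
transition_cost_individual = NUM_FILES / 1000 * TRANSITION_COST_PER_1000
transition_cost_bundled = 1 / 1000 * TRANSITION_COST_PER_1000  # one zip object
storage_cost = total_gb * STORAGE_COST_PER_GB_MONTH

print(f"Data volume:            {total_gb:.1f} GB")
print(f"Monthly storage cost:   ${storage_cost:.2f}")
print(f"Transition, per-file:   ${transition_cost_individual:.2f}")
print(f"Transition, one bundle: ${transition_cost_bundled:.5f}")
```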


How our workflow compresses data


Data compression is done by copying the collection of small S3 objects to a different bucket and loading them into the data pipeline, which bundles the small files into one large zip. The compression level ranges from 0 to 9, and we use zero here: files like JPEGs or MP4s are already compressed to near their minimum size, so recompressing them gains nothing.
Text files and log files, on the other hand, compress well. The workflow uses a bit of custom code for this specific process (since we’ve already created it, you can simply adopt it as a template). We also configure the pipeline from the workflow to enable the compression, and then you just wait a while for it to happen. The process itself is no different from ordinary ZIP compression; we’re just running it on a cloud service.
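As a rough illustration of the idea (this is not the actual node code, which ships with the workflow template), bundling with Python’s zipfile module might look like the sketch below; the file paths and archive name are hypothetical:

```python
import zipfile
from pathlib import Path

# Bundle many small files into one archive. Already-compressed formats
# (JPEG, MP4, ...) are stored as-is (level 0); text and log files are deflated.
ALREADY_COMPRESSED = {".jpg", ".jpeg", ".png", ".mp4", ".gz", ".zip"}

def bundle(files, archive_path):
    with zipfile.ZipFile(archive_path, "w") as zf:
        for f in files:
            if Path(f).suffix.lower() in ALREADY_COMPRESSED:
                # Just store the bytes, no recompression.
                zf.write(f, compress_type=zipfile.ZIP_STORED)
            else:
                # Text/logs compress well, so deflate them.
                zf.write(f, compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)

# Example usage with hypothetical paths:
bundle(["records/patient-001.txt", "records/scan-001.jpg"], "archive-2020-06.zip")
```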


The process is explained in detail in the next section. Compressing the data reduces both the overall Glacier storage cost and the data transfer cost.


Process


I’ll expand on the workflow and what each node does. For this process, we only employ two AWS services: S3 and Data Pipeline.





Step 1: Triggering the workflow

The trigger node determines what action activates the workflow. In this case, it’s a trigger from an external application, which could be a request from a web page, an app, and so on.
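For example, if your workflow exposes a webhook-style trigger URL, the external application might kick it off with a simple HTTP request. The URL and payload keys below are placeholders, not the actual Totalcloud endpoint:

```python
import requests

# Placeholder URL: substitute the trigger endpoint shown for your own workflow.
TRIGGER_URL = "https://example.com/totalcloud/workflow-trigger/abc123"

# Any payload your workflow expects; the bucket names here are hypothetical.
resp = requests.post(TRIGGER_URL, json={
    "sourceBucket": "my-standard-bucket",
    "targetBucket": "my-archive-staging",
})
resp.raise_for_status()
```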


Step 2: Custom code to collate the S3 data

This node contains custom code that collects the S3 data from your bucket and prepares it to be redirected. The sourceBucket is where the data is taken from, and the targetBucket is where the data will be moved to.
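The node’s code comes with the workflow template, but conceptually it does something like the following boto3 sketch; the bucket names and prefix are assumptions:

```python
import boto3

s3 = boto3.client("s3")

def collate(source_bucket, target_bucket, prefix=""):
    """Copy every object under `prefix` from the source bucket to the
    staging (target) bucket that the pipeline will read from."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=target_bucket,
                Key=obj["Key"],
                CopySource={"Bucket": source_bucket, "Key": obj["Key"]},
            )

collate("my-standard-bucket", "my-archive-staging")  # hypothetical names
```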


Step 3: Create DataPipeline

This action node creates the data pipeline where the S3 data will be compressed.
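Under the hood this corresponds to a Data Pipeline API call along these lines; the name and description are assumed, and the workflow node makes the call for you:

```python
import boto3
import uuid

dp = boto3.client("datapipeline")

response = dp.create_pipeline(
    name="s3-to-glacier-compression",  # assumed name
    uniqueId=str(uuid.uuid4()),        # idempotency token
    description="Bundle and compress S3 data before archiving",
)
pipeline_id = response["pipelineId"]
```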


Step 4: Custom code to push the S3 data into the pipeline


Step 5: Pipeline Definition

This node configures the compression of the S3 data moved into the pipeline and ensures its transfer to S3 Glacier.
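The node builds the pipeline definition for you; roughly, it amounts to a put_pipeline_definition call with an S3DataNode as input and an activity that zips the staged data and hands it off for archival. The sketch below is illustrative only: the bucket paths, roles, instance type, and the ShellCommandActivity doing the zipping are all assumptions, and it assumes the archive bucket has a lifecycle rule that transitions objects to Glacier.

```python
import boto3

dp = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # placeholder: ID returned by create_pipeline

dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "Default", "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "ondemand"},
                {"key": "role", "stringValue": "DataPipelineDefaultRole"},
                {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
                {"key": "pipelineLogUri", "stringValue": "s3://my-archive-staging/logs/"},
            ],
        },
        {
            # Small worker instance that runs the zip command, then terminates.
            "id": "Worker", "name": "Worker",
            "fields": [
                {"key": "type", "stringValue": "Ec2Resource"},
                {"key": "instanceType", "stringValue": "t2.micro"},
                {"key": "terminateAfter", "stringValue": "1 Hour"},
            ],
        },
        {
            # The staged S3 data collated in Step 2.
            "id": "StagedData", "name": "StagedData",
            "fields": [
                {"key": "type", "stringValue": "S3DataNode"},
                {"key": "directoryPath", "stringValue": "s3://my-archive-staging/data/"},
            ],
        },
        {
            # Bundle the staged files (level 0, i.e. store only) and copy the
            # archive to a bucket that transitions objects to Glacier.
            "id": "ZipAndArchive", "name": "ZipAndArchive",
            "fields": [
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "runsOn", "refValue": "Worker"},
                {"key": "input", "refValue": "StagedData"},
                {"key": "stage", "stringValue": "true"},
                {"key": "command", "stringValue":
                    "cd ${INPUT1_STAGING_DIR} && zip -r -0 /tmp/archive.zip . && "
                    "aws s3 cp /tmp/archive.zip s3://my-archive-bucket/"},
            ],
        },
    ],
)
```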


Step 6: Pipeline Activation
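Activating the pipeline tells Data Pipeline to start executing the definition from the previous step; as an API call it is essentially a one-liner (the pipeline ID is a placeholder):

```python
import boto3

dp = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # placeholder: ID returned by create_pipeline

# Start running the activities defined in Step 5.
dp.activate_pipeline(pipelineId=pipeline_id)
```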


Step 7: Delay

A 600-second delay is set to allow the data transfer to happen before the next node is activated.


Step 8: Delete DataPipeline

This action node deletes the data pipeline once the compression and archiving are done, so you aren’t left with an idle pipeline.
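Again, the underlying call is straightforward; the pipeline ID below is a placeholder:

```python
import boto3

dp = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # placeholder: ID returned by create_pipeline

# Optionally inspect state first with dp.describe_pipelines(pipelineIds=[pipeline_id]).
dp.delete_pipeline(pipelineId=pipeline_id)
```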

Conclusion

We have many other use cases similar to this one that we solve with our workflows. Our goal with custom workflows is to give you the freedom to find your own solutions to the many complications AWS puts forth. If you have a particular use case you want to work on, you can sign up with us and try your hand at it.

