Auto Remediation: Increase DB size when Disk Space Utilization Crosses 90%

A common risk in instance management is overutilization of disk space. Several factors can push disk utilization over 90%: for example, user-initiated heavy workloads, analytic queries, prolonged deadlocks and lock waits, multiple concurrent transactions, long-running transactions, or other resource-intensive processes.


Over-utilized instances can cause several performance issues that later affect your budget. Having a simple, automated means of scaling the volume of your instances when necessary takes the management overhead off your hands.


In 2017, AWS announced Elastic Volumes, which allows changes to EBS volumes without scheduling any downtime. Elastic Volumes became a standard for handling overutilization with its promise of automation. In response to increased utilization, you can increase your volumes if the current workload is mandatory. Using CloudWatch alarms and Lambda functions, you can use Elastic Volumes to modify your EBS volumes.
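To give a sense of what the scripted approach involves, here is a minimal Python sketch of the resize logic a Lambda function might run. It assumes `ec2_client` is a boto3 EC2 client (for example, `boto3.client("ec2")`); the function name `grow_volume` and the 20 GiB default are illustrative choices, not part of any AWS or TotalCloud API. Note that the EC2 API expresses volume sizes in GiB.

```python
DEFAULT_INCREMENT_GIB = 20  # illustrative default, matching the increment used in this post


def grow_volume(ec2_client, volume_id, increment_gib=DEFAULT_INCREMENT_GIB):
    """Increase the size of one EBS volume and return the new size in GiB.

    ec2_client is assumed to be a boto3 EC2 client.
    """
    # Look up the current size so the new size is relative to it.
    response = ec2_client.describe_volumes(VolumeIds=[volume_id])
    current_gib = response["Volumes"][0]["Size"]
    new_gib = current_gib + increment_gib

    # Elastic Volumes lets this call run while the volume stays attached.
    ec2_client.modify_volume(VolumeId=volume_id, Size=new_gib)
    return new_gib
```

Even this short function needs IAM permissions for `ec2:DescribeVolumes` and `ec2:ModifyVolume`, plus error handling and a follow-up step on the instance to extend the file system, which is the overhead the rest of this post aims to avoid.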


In most cases, companies employ multiple DevOps engineers to handle incident response. Even with Elastic Volumes, you’re required to write and deploy the function that expands your volume, so code errors and delayed responses are a real possibility. Most organizations follow a strict access control and authentication policy for actions that alter infrastructure. EBS volumes incur their own separate charges, so the ability to modify them should be restricted to the right teams. Writing out specific permissions and roles takes time, and they will eventually need to be changed.


But such event-based responses can be automated without writing a single line of code. Workflows essentially mimic human actions and enable you to chart out your flow exactly the way a DevOps engineer would remediate an error. 


In this use case, you instruct the workflow to increase DB size by 20GB when disk space utilization crosses 90% (and triggers a CloudWatch alarm). Even if your CloudWatch alarm alerts you to overutilization in the middle of the night, the workflow will have handled it before you even think about responding. Since it’s automated, the fix is executed immediately, eliminating any response delay.
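For reference, the 90% alarm itself could be defined along these lines. This is a sketch under stated assumptions: it presumes disk metrics are published by the CloudWatch agent (which reports `disk_used_percent` in the `CWAgent` namespace by default), and the helper name `disk_alarm_params`, the 5-minute period, and the single evaluation period are illustrative choices. The dimensions you pass must match exactly the ones your agent publishes.

```python
def disk_alarm_params(alarm_name, dimensions, sns_topic_arn, threshold_percent=90.0):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    dimensions is a dict such as {"InstanceId": "i-...", "path": "/"}; it must
    match the dimensions your CloudWatch agent actually publishes (assumption).
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "CWAgent",             # default CloudWatch agent namespace
        "MetricName": "disk_used_percent",  # disk usage metric from the agent
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": "Average",
        "Period": 300,                      # seconds; illustrative value
        "EvaluationPeriods": 1,             # illustrative value
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],    # SNS topic notified on ALARM
    }
```

With boto3, the resulting dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.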

The workflows are completely customizable. Say you have large, cross-border DevOps teams and need approval before increasing EBS disk space: you can add a user approval condition. The setup requires no scripts, and activation, authorization, and functionality can all be configured in the same panel.





Process

 

By default, the workflow increases your EBS volume by 20GB; you can change this value to suit your workload. When a CloudWatch alarm goes off and sends an SNS alert for high disk space utilization, the workflow is triggered automatically and executes the action. You can also switch the trigger from an alarm to another external system, platform, or ticketing system such as JIRA. After the workflow matches the instances to be modified, it requests user approval. Once approved, it increases the EBS volume and sends an SSM command that applies the increased volume to the instance and informs the OS.
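To illustrate what the trigger carries, the following Python sketch extracts the affected instance IDs from an SNS notification of a CloudWatch alarm. It assumes the usual shape of such notifications (a JSON string under `Sns.Message` containing `NewStateValue` and a `Trigger.Dimensions` list with `name` and `value` keys) and that the alarm has an `InstanceId` dimension. The function name `instances_in_alarm` is hypothetical; the workflow does this matching for you without code.

```python
import json


def instances_in_alarm(sns_event):
    """Return the InstanceId dimensions of alarms that are in the ALARM state."""
    instance_ids = []
    for record in sns_event.get("Records", []):
        # SNS delivers the CloudWatch alarm as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA notifications
        for dimension in message.get("Trigger", {}).get("Dimensions", []):
            if dimension.get("name") == "InstanceId":
                instance_ids.append(dimension["value"])
    return instance_ids
```

Notifications for alarms returning to the OK state are skipped, so only instances that are currently over the threshold are passed on.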

The workflow accomplishes two primary steps with a total of 8 nodes. The first is to filter out the right instance(s) using simple conditional operations. The second is to modify the volume and apply the change to your instance.


Steps


Step 1: Trigger- The workflow is triggered by an instance whose disk utilization exceeds 90%.


Step 2: Resource- All instances are loaded.


Step 3: Filter- The instances that caused the workflow to trigger are loaded.


Step 4: Filter- The two previous sets of instances are compared to determine which instances should have their EBS volumes increased.


Step 5: Custom Code- The custom code sets the 20GB increase and composes the message to be sent as a notification.


Step 6: User Approval- The pending workflow action is sent as a notification to the designated email address for approval.


Step 7: Action- The EBS volume is increased.


Step 8: Action- The SSM command is sent so that the increased volume size is applied on the instance.
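For readers who want to see the logic behind the nodes, the steps above can be sketched in Python roughly as follows. This is an illustration, not TotalCloud's implementation: the function names, the dictionary keys (`InstanceId`, `VolumeId`, `VolumeSize`), the device name `/dev/xvda`, and the assumption of an ext4 file system on Linux are all hypothetical.

```python
INCREMENT_GB = 20  # default increase used by the workflow in this post


def match_instances(all_instances, triggered_ids):
    """Steps 2-4: keep only the loaded instances that raised the alarm."""
    triggered = set(triggered_ids)
    return [inst for inst in all_instances if inst["InstanceId"] in triggered]


def build_plan(instance, increment_gb=INCREMENT_GB):
    """Step 5: compute the new size and the text of the approval notification."""
    new_size = instance["VolumeSize"] + increment_gb
    message = (
        f"Approve resizing {instance['VolumeId']} on {instance['InstanceId']} "
        f"from {instance['VolumeSize']} GB to {new_size} GB?"
    )
    return {"VolumeId": instance["VolumeId"], "NewSize": new_size, "Message": message}


def resize_commands(device="/dev/xvda", partition=1):
    """Step 8: shell commands that let the OS use the added space.

    Assumes a Linux instance with an ext4 file system on the given partition.
    """
    return [
        f"growpart {device} {partition}",   # extend the partition to fill the volume
        f"resize2fs {device}{partition}",   # grow the ext4 file system to match
    ]
```

In a scripted setup, the command list could be sent with SSM's `send_command` using the `AWS-RunShellScript` document and `Parameters={"commands": resize_commands()}`; in the workflow, the Action nodes in steps 7 and 8 take care of this once approval is given.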


Conclusion

Cloud management gets much easier when you use workflows instead of scripting. Use TotalCloud to find creative solutions to common, complicated use cases.
