Auto Remediation: Increase DB size when Disk Space Utilization Crosses 90%

A common risk in instance management is overutilization of disk space. Several factors can push disk utilization over 90%: for example, user-initiated heavy workloads, analytic queries, prolonged deadlocks and lock waits, multiple concurrent transactions, long-running transactions, or other resource-intensive processes.


Over-utilized instances can suffer several performance issues that eventually affect your budget. A simple, automated means of scaling your instances’ volumes when necessary takes that management overhead off your hands.


In 2017, AWS announced Elastic Volumes, which allows changes to EBS volumes without scheduling any downtime. With its promise of automation, Elastic Volumes became a standard way of handling overutilization. If the current workload is mandatory, you can respond to increased utilization by increasing your volume size. Using CloudWatch alarms and Lambda scripts, you can invoke Elastic Volumes and modify the EBS volumes.
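For comparison, the scripted route might look roughly like the sketch below. It assumes a client that behaves like boto3's EC2 client, using its `describe_volumes` and `modify_volume` calls. The function name `grow_volume`, the 20 GiB default, and the `FakeEc2Client` stand-in are illustrative only and not part of any AWS or TotalCloud API.

```python
def grow_volume(ec2_client, volume_id, increment_gib=20):
    """Grow one EBS volume by increment_gib and return the new size in GiB.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    # Look up the current size so the increase is relative, not absolute.
    response = ec2_client.describe_volumes(VolumeIds=[volume_id])
    current_size = response["Volumes"][0]["Size"]
    new_size = current_size + increment_gib
    # Elastic Volumes lets the size change while the volume stays attached.
    ec2_client.modify_volume(VolumeId=volume_id, Size=new_size)
    return new_size


class FakeEc2Client:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def __init__(self, size_gib):
        self.size_gib = size_gib

    def describe_volumes(self, VolumeIds):
        return {"Volumes": [{"VolumeId": VolumeIds[0], "Size": self.size_gib}]}

    def modify_volume(self, VolumeId, Size):
        self.size_gib = Size
        return {"VolumeModification": {"VolumeId": VolumeId, "TargetSize": Size}}
```

In a real Lambda function you would pass `boto3.client("ec2")` instead of the stand-in, and you would still need error handling, IAM permissions, and a way to tell the OS about the new size.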


In most cases, companies employ multiple DevOps engineers to handle incident response. Even with Elastic Volumes, you’re required to write and maintain the function that expands your volume, which leaves room for code errors and delayed responses. Most organizations also have strict access control and authentication policies for actions that alter infrastructure. EBS volumes incur their own separate charges, so modifying them should be restricted to the right teams. Writing out specific permissions and roles takes time, and they will eventually need to change.


But such event-based responses can be automated without writing a single line of code. Workflows essentially mimic human actions and enable you to chart out your flow exactly the way a DevOps engineer would remediate an error. 


In this use case, you’re instructing the workflow to increase DB size by 20GB when disk space utilization crosses 90% (and triggers a CloudWatch alarm). Even if the CloudWatch alarm alerts you to overutilization in the middle of the night, the workflow will have handled it before you even think about responding. Since it’s automated, the fix is executed immediately, eliminating any response delay.
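As a rough illustration of the 90% condition, the sketch below builds the keyword arguments for CloudWatch's `put_metric_alarm` call. The namespace and metric name assume the metric is published by the CloudWatch agent with its default names. The helper name, alarm name, period, and single `InstanceId` dimension are assumptions and must match how your metric is actually published.

```python
def disk_alarm_params(instance_id, sns_topic_arn, threshold_percent=90.0):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"disk-utilization-high-{instance_id}",
        # Assumption: disk metrics come from the CloudWatch agent, which
        # uses the CWAgent namespace and disk_used_percent by default.
        "Namespace": "CWAgent",
        "MetricName": "disk_used_percent",
        # Assumption: the metric is aggregated to the InstanceId dimension.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # evaluate five-minute averages
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic is what hands the alert on to the workflow.
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.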

The workflows are completely customizable. Let’s say you have large, cross-border DevOps teams and need approval before increasing EBS disk space: you can add a user approval condition. The setup requires no scripts, and activation, authorization, and functionality can all be configured in the same panel.





Process

 

By default, the workflow increases your EBS volume by 20GB. This value can be altered depending on your workload demands. When a CloudWatch alarm goes off and sends an SNS alert for high disk space utilization, the workflow is triggered automatically and executes the action. You can also switch the trigger from the alarm to any other external system, platform, or ticketing system such as JIRA. After the workflow matches the instances to be modified, it requests user approval. On receiving approval, it increases the EBS volume and sends an SSM command so that the instance’s OS recognizes the added space.
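To make the trigger concrete, here is a minimal sketch of how the instance behind an SNS-delivered alarm could be identified. The field names follow the shape CloudWatch alarm notifications commonly have when delivered through SNS, but treat them as an assumption to verify against a real message. The function name and the trimmed sample payload are hypothetical.

```python
import json


def instance_ids_from_sns_event(event):
    """Return the instance IDs named in ALARM notifications inside an SNS event."""
    instance_ids = []
    for record in event.get("Records", []):
        # SNS delivers the CloudWatch alarm as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        for dimension in message.get("Trigger", {}).get("Dimensions", []):
            if dimension.get("name") == "InstanceId":
                instance_ids.append(dimension["value"])
    return instance_ids


# Trimmed example payload; real notifications carry more fields.
sample_event = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps(
                    {
                        "AlarmName": "disk-utilization-high",
                        "NewStateValue": "ALARM",
                        "Trigger": {
                            "Dimensions": [
                                {"name": "InstanceId", "value": "i-0123456789abcdef0"}
                            ]
                        },
                    }
                )
            }
        }
    ]
}

print(instance_ids_from_sns_event(sample_event))
```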

The workflow accomplishes two primary tasks with a total of 8 nodes. The first is to filter out the right instance(s) using simple conditional operations. The second is to modify the volume and apply the change to your instance.
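The filtering part amounts to matching the loaded instances against the ones named in the alarm. A minimal sketch, assuming each instance is represented as a flat dictionary with `InstanceId` and `VolumeId` keys (a simplified, hypothetical shape rather than the workflow's internal format):

```python
def select_triggering_instances(all_instances, triggered_ids):
    """Keep only the loaded instances whose IDs appear in the alarm."""
    wanted = set(triggered_ids)
    return [inst for inst in all_instances if inst["InstanceId"] in wanted]


# Hypothetical inventory: every instance in the account or region.
inventory = [
    {"InstanceId": "i-0aaa", "VolumeId": "vol-0aaa"},
    {"InstanceId": "i-0bbb", "VolumeId": "vol-0bbb"},
    {"InstanceId": "i-0ccc", "VolumeId": "vol-0ccc"},
]

# Only the instance that raised the alarm should have its volume modified.
to_modify = select_triggering_instances(inventory, ["i-0bbb"])
```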


Steps


Step 1: Trigger- The workflow is triggered by the instance that crosses 90% DiskUtilization.


Step 2: Resource- All instances are loaded.


Step 3: Filter- The instances that caused the workflow to trigger are loaded.


Step 4: Filter- The two previous sets of instances are compared to determine which instances should have their EBS volumes boosted.


Step 5: Custom Code- The custom code sets the 20GB increase and composes the message to be sent as a notification.


Step 6: User Approval- A notification describing the workflow action is sent to the designated email for approval.


Step 7: Action- The EBS volume is increased.


Step 8: Action- An SSM command is sent so that the instance’s OS makes use of the increased volume.
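Steps 5 and 8 can be sketched in code as follows. This is not the workflow's actual custom code. The function names, message wording, device name, and ext4 filesystem are all assumptions. The second function only builds arguments in the shape accepted by the SSM `send_command` call with the `AWS-RunShellScript` document.

```python
def build_resize_plan(instance_id, volume_id, current_gib, increment_gib=20):
    """Step 5 sketch: compute the target size and the notification text."""
    target_gib = current_gib + increment_gib
    message = (
        f"Disk utilization on {instance_id} crossed 90%. "
        f"Requesting approval to grow {volume_id} "
        f"from {current_gib} GiB to {target_gib} GiB."
    )
    return {"VolumeId": volume_id, "Size": target_gib, "Message": message}


def build_ssm_resize_request(instance_id, device="/dev/xvda", partition=1):
    """Step 8 sketch: arguments for SSM send_command that extend the filesystem."""
    commands = [
        # Grow the partition to fill the enlarged volume.
        f"growpart {device} {partition}",
        # Assumption: an ext4 filesystem; XFS would use xfs_growfs instead.
        # Assumption: Xen-style device names; NVMe partitions are named differently.
        f"resize2fs {device}{partition}",
    ]
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": commands},
    }


plan = build_resize_plan("i-0abc", "vol-0123", current_gib=100)
request = build_ssm_resize_request("i-0abc")
```

Keeping both functions free of AWS calls makes the size arithmetic and the shell commands easy to review before anything touches real infrastructure.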


Conclusion

Cloud management can get exponentially easier when you harness the potential of workflows over scripting. Use Totalcloud to find creative solutions to common complicated use cases.
