A common risk in instance management is running out of disk space. Several factors can push DiskUtilization above 90%: user-initiated heavy workloads, analytic queries, prolonged deadlocks and lock waits, multiple concurrent transactions, long-running transactions, or other resource-intensive processes.
Over-utilized instances can develop performance issues that later affect your budget. A simple, automated way to scale the volumes of your instances when necessary takes the management overhead off your hands.
In 2017, AWS announced Elastic Volumes, which lets you change EBS volumes without scheduling any downtime. Elastic Volumes became a standard way to handle overutilization because it can be automated. If the current workload is mandatory, you can respond to increased utilization by increasing your volumes. Using CloudWatch alarms and Lambda scripts, you can invoke Elastic Volumes and modify the EBS volumes.
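To make the scripted approach concrete, here is a minimal sketch of the resize step a Lambda function might perform. It assumes `ec2` is a boto3 EC2 client (or anything exposing the same `describe_volumes` and `modify_volume` calls); the function name `grow_volume` and the 20 GiB default are illustrative, not taken from any particular setup.

```python
def grow_volume(ec2, volume_id, increment_gib=20):
    """Grow an EBS volume by increment_gib and return the new size in GiB.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # describe_volumes reports the current size of the volume in GiB.
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    new_size = volume["Size"] + increment_gib

    # modify_volume requests the resize; with Elastic Volumes this can be
    # done while the volume is attached and in use.
    ec2.modify_volume(VolumeId=volume_id, Size=new_size)
    return new_size
```

Passing the client in as an argument keeps the function easy to test with a stand-in object. Note that the modification only changes the block device; the file system on the instance still has to be extended separately.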
In most cases, companies employ multiple DevOps engineers to handle incident response. Even with Elastic Volumes, you still have to write and maintain the function that expands your volume, so code errors and delayed responses are always a possibility. Most organizations also follow a strict access control and authentication policy for actions that alter infrastructure. EBS volumes incur their own separate charges, so modifying them should be restricted to the right teams. Writing out specific permissions and roles takes time, and eventually they will need to change.
But such event-based responses can be automated without writing a single line of code. Workflows essentially mimic human actions and enable you to chart out your flow exactly the way a DevOps engineer would remediate an error.
In this use case, you’re instructing the workflow to increase the EBS volume size by 20GB when disk space utilization crosses 90% (and triggers a CloudWatch alarm). Even if your CloudWatch Alarm alerts you of overutilization outside working hours or in the middle of the night, the workflow will have handled it before you even think of responding. Since it’s automated, the fix is executed immediately, eliminating any response time delays.
The workflows are completely customizable. Let’s say you have large, cross-border DevOps teams and need approval before increasing EBS disk space: you can add a user approval condition. The setup requires no scripts, and activation, authorization, and functionality can all be configured one after another in the same panel.
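For reference, the 90% trigger condition can be sketched as a set of CloudWatch alarm parameters. This is only an illustration: the namespace `Custom/System`, the alarm name, and the five-minute period are assumptions, and the metric name `DiskUtilization` simply follows the wording used here. Disk metrics are not published by EC2 by default, so the real namespace and metric name depend on the agent or script that reports them.

```python
def disk_alarm_params(instance_id, topic_arn, threshold=90.0):
    """Build keyword arguments for a CloudWatch alarm on disk utilization."""
    return {
        "AlarmName": f"disk-utilization-high-{instance_id}",  # hypothetical name
        "Namespace": "Custom/System",      # assumption: depends on your metric source
        "MetricName": "DiskUtilization",   # assumption: name as used in this article
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                     # evaluate five-minute averages
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],       # SNS topic notified when the alarm fires
    }
```

With boto3, these parameters could be passed along as `boto3.client("cloudwatch").put_metric_alarm(**disk_alarm_params(instance_id, topic_arn))`.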
The workflow increases your EBS volume by 20GB by default. This value can be altered depending on your workload demands. When a CloudWatch Alarm goes off and sends an SNS alert for high disk space utilization, the workflow is automatically triggered and executes the action. You can also switch the trigger from the Alarm to any other external system, platform, or ticketing system such as JIRA. After the Workflow matches the instances to be modified, it requests user approval. Once approved, it increases the EBS volume and sends an SSM command so that the instance's OS picks up the new volume size.
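If you were handling the SNS alert in a script instead, the first job would be to work out which instance raised the alarm. The sketch below assumes the event shape a Lambda function receives from an SNS subscription, with the CloudWatch alarm delivered as a JSON string in the message; the function name is hypothetical.

```python
import json


def instance_ids_from_sns_event(event):
    """Return the instance IDs named in CloudWatch alarm notifications."""
    ids = []
    for record in event.get("Records", []):
        # The alarm details arrive as a JSON string inside the SNS message.
        alarm = json.loads(record["Sns"]["Message"])

        # Ignore notifications for alarms returning to OK or lacking data.
        if alarm.get("NewStateValue") != "ALARM":
            continue

        # The alarm's metric dimensions identify the affected instance.
        for dim in alarm.get("Trigger", {}).get("Dimensions", []):
            if dim.get("name") == "InstanceId":
                ids.append(dim["value"])
    return ids
```

Only notifications in the `ALARM` state produce instance IDs, so a recovery notification does not trigger a second resize.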
The workflow accomplishes two primary tasks using a total of 8 nodes. The first task is to filter out the right instance(s) using simple conditional operations. The second is to modify the volume and apply the change to your instance.
Step 1: Trigger- The workflow is triggered by an instance whose DiskUtilization exceeds 90%.
Step 2: Resource- All instances are loaded.
Step 3: Filter- The instances that caused the workflow to trigger are loaded.
Step 4: Filter- The two previous sets of instances are compared to determine which instances should have their EBS volumes boosted.
Step 5: Custom Code- The custom code determines the 20GB increase and the message to be sent as a notification.
Step 6: User Approval- The Workflow action will be sent as a notification to the designated email.
Step 7: Action- The EBS volume is increased.
Step 8: Action- The SSM command is sent so that the instance recognizes the increased volume.
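For readers who think in code, the eight nodes above can be approximated by the following sketch. It is not how the workflow is implemented internally; it only mirrors the order of the steps. The instance dictionary shape (`InstanceId`, `VolumeId`, `VolumeSizeGiB`) and the `approve`, `resize`, and `notify_os` callbacks are hypothetical stand-ins for the approval, EBS, and SSM nodes.

```python
def select_instances(all_instances, triggered_ids):
    """Steps 2-4: keep only the loaded instances that triggered the alarm."""
    triggered = set(triggered_ids)
    return [inst for inst in all_instances if inst["InstanceId"] in triggered]


def plan_resize(instance, increment_gib=20):
    """Step 5: compute the new size and the approval message."""
    new_size = instance["VolumeSizeGiB"] + increment_gib
    message = (
        f"Approve resizing {instance['VolumeId']} on {instance['InstanceId']} "
        f"from {instance['VolumeSizeGiB']} GiB to {new_size} GiB?"
    )
    return {
        "instance_id": instance["InstanceId"],
        "volume_id": instance["VolumeId"],
        "new_size_gib": new_size,
        "message": message,
    }


def run_workflow(all_instances, triggered_ids, approve, resize, notify_os):
    """Run steps 2-8 and return the plans that were approved and applied."""
    applied = []
    for instance in select_instances(all_instances, triggered_ids):
        plan = plan_resize(instance)
        if not approve(plan["message"]):                  # Step 6: user approval
            continue
        resize(plan["volume_id"], plan["new_size_gib"])   # Step 7: grow the volume
        notify_os(plan["instance_id"])                    # Step 8: SSM command
        applied.append(plan)
    return applied
```

In a scripted version, `resize` could wrap the EC2 `modify_volume` call and `notify_os` could wrap an SSM `send_command` call using the `AWS-RunShellScript` document with commands such as `growpart` and `resize2fs`; the exact commands and device names depend on the instance's OS and file system.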
Cloud management can get considerably easier when you harness the potential of workflows over scripting. Use Totalcloud to find creative solutions to common but complicated use cases.