The cloud industry is growing at a rapid pace, with AWS announcing a steady stream of new services and features at re:Invent every year. Cloud teams with complex workloads (>20 resources) currently use multiple tools with charts (2D interfaces) to monitor their AWS environment, and getting visibility across the entire deployment is difficult. This increases reaction time to key events which impact uptime, performance, security, and cost. This, in turn, adds to the TCO and brings down the ROI of moving to the cloud. As per Gartner, the global average cloud management and monitoring software spend is 5% of the total cloud spend itself. The total market size for monitoring and management is expected to reach USD 14B by 2020. While the containerization and microservices movements are an attempt to simplify this, the pace of development within organizations has itself evolved. In this article, let’s observe some key trends and look at ways in which their impact can be mitigated.
Due to the increase in the number of cloud vendors with comparable offerings, IT is increasingly becoming a buyer’s market. End-users of IT infrastructure, especially mid-market and enterprise players, are increasingly adopting a multi-cloud approach. Their goal is to increase the use of existing infrastructure, deploy new capabilities at scale, reduce costs, streamline resources, and avoid cloud vendor lock-in, aka “sticky services”. The cloud industry is witnessing a transformation similar to what we saw with Android smartphones: get the best parts from the best vendors, and may the best utility win.
IMPACT: DevOps to DataOps — Hybrid systems increasingly require a globally aware traffic management strategy that can monitor infrastructure health across data centers and end-user experiences globally, while responding to control changes and system specifications at an unprecedented speed.
With key technologies (cloud, AI/ML, IoT) becoming commoditized as APIs, building new tech is no longer the differentiator. IT and cloud were for the longest time considered a necessary evil, and decision-making was relegated to the CIO and CFO, who had to figure out the best features-per-dollar configuration as part of a 2–5 year plan. However, a recent study established that the strength of IT is not only a big indicator of organizational efficiency, but also an indicator of an organization’s ability to innovate by reacting to feedback from customers, and consequently of its success. The culture of DevOps has moved entire tech teams from waterfall-style product management on the mainframe to a mode of Continuous Integration and Continuous Delivery (CI/CD). With the rise of containers and microservices, companies are thinking of ways of re-organizing development teams, and the entire organization, around it. As per a report in HBR, cloud-native startups have proved to incumbents across traditional sectors and the enterprise that the key to successful product development is constant feedback from end-users and the ability to rapidly deploy changes to the product.
IMPACT: CI/CD needs CM (Continuous Monitoring): CI/CD relies heavily on automation scripts and workflows, which are prone to multiple issues in the absence of Continuous Monitoring (CM) of the metrics and health of the system.
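To make the CI/CD-needs-CM point concrete, here is a minimal sketch of a monitoring gate that a delivery pipeline could run right after a deployment. Everything in it is an assumption for illustration: the metric names, the limits, and the `evaluate_release` helper are hypothetical, and the in-memory dictionary stands in for whatever monitoring API a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HealthCheck:
    """One metric to watch after a deployment (names are illustrative)."""
    metric: str
    max_allowed: float


def evaluate_release(
    read_metric: Callable[[str], float],
    checks: List[HealthCheck],
) -> Dict[str, object]:
    """Compare each live metric with its limit and decide the next pipeline step."""
    violations = []
    for check in checks:
        value = read_metric(check.metric)
        if value > check.max_allowed:
            violations.append((check.metric, value, check.max_allowed))
    return {
        "action": "rollback" if violations else "promote",
        "violations": violations,
    }


# Stand-in for a real metrics source (assumed sample values).
sample_metrics = {"error_rate_pct": 4.2, "p95_latency_ms": 310.0}

checks = [
    HealthCheck(metric="error_rate_pct", max_allowed=1.0),
    HealthCheck(metric="p95_latency_ms", max_allowed=500.0),
]

decision = evaluate_release(sample_metrics.__getitem__, checks)
print(decision["action"])  # rollback, because the error rate exceeds its limit
```

The design choice here is that the pipeline never promotes a release on the strength of the deploy script alone; the decision depends on what the monitored system reports afterwards.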
3D video games have been around for more than a decade now, but with VR and AR gaining momentum, average consumers are increasingly getting conditioned to 3D interfaces or isometric pseudo-3D interfaces. This helps them better understand relationships and metrics in the context of entities and objects, and diagnose anomalies. This is especially true for the tech sector, where end-users are more likely to have experienced video games before and are more than comfortable operating in a video game environment.
IMPACT: Multiple 2D dashboards = One 3D visualization: Several research papers have concluded that 3D visualization does a much better job of capturing huge quantities of data and simplifying it for the user. This has been tested in applications for sectors such as finance, where the volume and velocity of information are very high, much like DevOps.
“Given the resources of cloud computing,” Microsoft Research principal researcher, Richard Harper notes, “a two-dimensional desktop layout is no longer sufficient to capture or convey rich, real-time relationships between data, people, schedules, or places. Cloud computing calls for new interaction metaphors, and these metaphors necessitate new input-output technologies.”
An interesting validation of the thesis of 3D > 2D came recently when Rabobank built a physical 3D model of its own organisation and supporting IT systems to help visualise improvements that can be made as it embarks on its digital transformation programme.
This is the first time an organisation has invested in visualising its own IT landscape in this way, says Hans Tesselaar, BIAN executive director. “Everyone who sees the model immediately recognises a lot of things. It provides great insight in the problems and issues of IT, in a way that a Board of Directors can also see that IT-related problems cannot be resolved in a day or so.”
Most IT software purchasing has moved out of the CIO’s office and into the hands of the business-unit heads. This has increased the burden on vendors across the stack to be user-friendly, and has given more power to the actual end-users of the application.
“The best interface is no interface”: in 2015, when Golden Krishna used this phrase in his new book, this approach, popular amongst human-computer interaction professionals, became mainstream. The internet revolution, which was founded on the PC and then the smartphone touchscreen, met a new paradigm: no screens. I was a consumer internet startup founder in 2015, during the rise of smartphones, when every feature-based idea got its own mobile app. As we all eventually realized, consumers don’t need another interface for every use-case, leading to 1) the consolidation of multiple utility and e-commerce apps, and 2) the rise of chatbots.
The chatbot era was supported by the rise of commodity chatbot APIs like Api.ai, Chatfuel, and Lex. However, while these reduced the cost to companies of providing customer service, they weren’t particularly good for end users, for two reasons: 1) poor intent recognition leading to frequent misclassifications, and 2) it is easier to search, point, and click than to type out everything you need. The latter was mitigated to some extent by improvements in speech recognition, leading to the voice interface, aka Alexa and Siri. The battle of interfaces in the consumer world is not limited to voice. With the rise of vision, aka the camera as the platform, we have yet to see how ARKit will evolve our world of mixed reality.
While enterprise tech generally lags behind consumer tech in the adoption of new interfaces, we saw the consumerization of the enterprise platform through the likes of Slack and its marketplace. Today we have almost every major product team using Slack in one way or another, and a number of Slack bots which are awesome at integrating all types of updates in a single platform. However, how will this interface evolve over the next 10 years? Let’s have a look at the sheer number of DevOps tools in the market:
The above trends give rise to the following key debates on the future of cloud interfaces, across 1) monitoring, 2) interaction, and 3) collaboration.
In the world of DevOps, Slack is being prophesied as the future interface for CI/CD as well as monitoring and management. Slack currently has hundreds of the top developer tools and security and compliance apps to automate significant portions of your workflow. Slack’s stated mission was to eliminate the need for email and for switching between other productivity and collaboration tools. However, this didn’t reduce the amount of cognitive effort required from human beings. Emails are still around, and now there are team groups on WhatsApp. We have to process an increased volume (#channels) and variety (#gifs) of information being generated in a similar-looking interface with no indicator of priority level or importance. Group chat apps like Slack are built for a specific kind of communication: one-line-at-a-time, real-time conversations. This form of communication is sometimes useful (e.g. Slack bots giving alerts and accepting commands); however, it cannot scale as the only interface for your tech stack.
For cloud monitoring and management, there are multiple layers of the stack which need to be taken care of, but these broadly include infrastructure monitoring, application performance, and log analysis. Almost all of the tools rely on smart APIs, are presented in 2D interfaces, and include multiple charts and graphs. Below is an image of a cloud monitoring interface in Grafana.
In the world of tech, the command line has been the preferred interface for a majority of developers, with only a few of them occasionally using the AWS console and other 2D monitoring tools like Grafana, Prometheus, or Zabbix. Some would attribute the love for the black terminal to sci-fi movies like The Matrix, which glorify these arcane interfaces that require someone who knows the specific “secret words” to get things done. Similarly, dashboards like the one in the image above offer no way to take action from the same interface. However, there is one thing our users love more than the CLI: Real-Time Strategy (RTS) video games! RTS games like Dota, AoE, and many others have conditioned tech users to an immersive world where you can detect and react in the same interface.
No-Interface aka Cloud Automation: The idea of the public cloud was to reduce the amount of effort needed to deploy something, but given the pace at which AWS adds new features and functionality, there is a need to undertake basic automation. Most developers prefer not to monitor continuously and have instead set up CloudWatch events for alerts and threats. Tools like Ansible, Chef, and Puppet are the current leaders in this space, but have had their own share of problems (see this Hacker News thread). Containers, Kubernetes, and functions, while avoiding the need for much of the detailed monitoring above, are not scalable for 70% of cloud IT use-cases in their present form, and have their own set of monitoring problems.
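As an illustration of the kind of alert developers set up instead of watching dashboards, here is a rough sketch of a CloudWatch CPU alarm definition. The helper name `build_cpu_alarm`, the instance ID, the SNS topic ARN, and the threshold values are all made up for the example. The dictionary keys follow the parameters of CloudWatch's `put_metric_alarm` call, but the snippet only builds the definition and does not contact AWS.

```python
from typing import Any, Dict


def build_cpu_alarm(
    instance_id: str,
    sns_topic_arn: str,
    threshold_pct: float = 80.0,
) -> Dict[str, Any]:
    """Build keyword arguments for a CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per evaluation window
        "EvaluationPeriods": 2,     # two consecutive breaches trigger the alarm
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # where the notification goes
    }


# Hypothetical identifiers, for illustration only.
alarm = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
print(alarm["AlarmName"])
```

Note that an alarm like this only fires for a failure mode someone anticipated and encoded as a threshold, which is exactly the limitation discussed next.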
The problem with Cloud Automation is primarily due to the underlying complexity of most applications and IT systems. To borrow from the famous 2×2 matrix:
The Known-Knowns are your 2D graphs and charts; the Known-Unknowns are the ones for which you set up Chef, Puppet, or CloudWatch alarms and alerts. But what about the Unknown-Unknowns?
In systems which are increasingly complex, and constantly changing their state thanks to CI/CD, you need a single interface which augments your ability to diagnose a situation and take action. Below are examples of the types of actions you can take in the TotalCloud environment:
We had one level of evolution when both the data needed to make a decision and the actual tools for taking action came together. The second shift, however, is harmonising data, actions, and people in the same interface, in real-time. Collaboration in the cloud is needed at multiple levels: during development, debugging, downtime, and security threats, as well as during audits, reporting, and compliance. Currently, there is no tool which can give you a single view into all the moving parts in a single interface, give you insights on compliance, and allow third parties to inspect the same. CIOs and CISOs are increasingly facing strict audits and compliance checks, yet there is no way to ensure audit and compliance on a real-time basis. Most of this is done by filling in a physical form, and most verifications are done by taking a snapshot of the state during the inspection. This methodology is non-transparent, at risk of fraud, and highly inefficient. Below are some examples of where collaboration between teams is needed:
Last month I joined TotalCloud (TC), an AWS Technology Partner building a 3D visualization platform to augment cloud intelligence. The 3D view allows you to visualize the relationship between resources, metrics, and costs, in real-time. Currently, this information is spread across multiple 2D graphs and charts (e.g. CloudWatch or Cost Explorer) and 1D log files (e.g. CloudTrail). These are further spread across your cloud console, multiple API calls, and third-party monitoring tools.
TotalCloud is helping cloud engineers make faster decisions about their AWS cloud infrastructure using real-time 3D visualization. We help cloud engineers avoid analyzing multiple interfaces containing complex logs and charts by combining all these data points in a single 3D view. We make monitoring and management intuitive and interactive, like playing a real-time strategy video game (think Age of Empires). We envision creating a marketplace based on our graph-based visual platform, which allows any monitoring and management tool to let its users visualize any cloud metric or resource in a single 3D view, and take action.
The core engine powering TotalCloud is our proprietary CloudGraph, built on top of a microservices architecture, using Lambda. This engine polls for any changes in a user’s AWS environment, parsing multiple APIs, including CloudWatch, and updates the view in real-time for the user. The CloudGraph scales across new AWS services or features without the need for any hard-coding. The technical vision is to allow our customers to integrate any cloud monitoring tool into TotalCloud’s 3D environment by integrating its APIs as a node in the CloudGraph, giving them the ability to visualize any metric or resource and take action on it.
Conservative IT buyers living in the data center era with legacy software have successfully started adopting the cloud, largely thanks to AWS having convinced buyers such as Goldman Sachs and GE to start their journey to the cloud. However, the adoption of the cloud will slow down unless we make it user-friendly by solving the above issues.
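CloudGraph itself is proprietary, so the following is not its implementation. It is only a generic sketch of the poll-and-diff idea described above: take two snapshots of an environment, work out what changed, and derive graph edges from resource relationships. The snapshot layout, the `attached_to` attribute, and both helper functions are assumptions made up for this example.

```python
from typing import Dict, Set, Tuple

# A snapshot maps a resource ID to its attributes (hypothetical layout).
Snapshot = Dict[str, Dict[str, str]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, Set[str]]:
    """Return the resource IDs added, removed, or changed between two polls."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": curr_ids - prev_ids,
        "removed": prev_ids - curr_ids,
        "changed": {
            rid for rid in prev_ids & curr_ids if previous[rid] != current[rid]
        },
    }


def build_edges(snapshot: Snapshot) -> Set[Tuple[str, str]]:
    """Derive graph edges from an 'attached_to' attribute, when its target exists."""
    return {
        (rid, attrs["attached_to"])
        for rid, attrs in snapshot.items()
        if attrs.get("attached_to") in snapshot
    }


# Two illustrative polls of the same small environment.
before: Snapshot = {
    "i-web": {"type": "ec2", "state": "running"},
    "vol-1": {"type": "ebs", "attached_to": "i-web"},
}
after: Snapshot = {
    "i-web": {"type": "ec2", "state": "stopped"},
    "vol-1": {"type": "ebs", "attached_to": "i-web"},
    "lb-1": {"type": "elb", "attached_to": "i-web"},
}

changes = diff_snapshots(before, after)
print(sorted(changes["added"]), sorted(changes["changed"]))
print(sorted(build_edges(after)))
```

Because resources are treated as generic nodes with attributes, nothing in this sketch is specific to one service type, which is one plausible way to avoid hard-coding each new service.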
Editorial Note: This post is written by Aayush Srivastava, ex-Amazon, two-time startup founder.