The move to cloud-native observability requires technical as well as cultural changes

News Desk - 15/12/2022

By Gregg Ostrowski, executive CTO, Cisco AppDynamics

Organisations across all sectors are continuing to accelerate their move towards modern application development, built on cloud-native microservices. A cloud-first strategy is now widely accepted as the catalyst for rapid digital transformation and the key enabler for businesses to respond to constantly evolving customer and employee needs.

By re-imagining their applications in a hybrid or multi-cloud environment, organisations can embed greater flexibility and freedom in their application development processes and, ultimately, unleash innovation at a scale and speed not seen before.

However, as anyone who has worked in an IT department over the last 18-24 months will testify, managing availability and performance across cloud-native applications and technology stacks is proving to be a huge challenge for IT teams.

Traditional approaches to availability and performance were often based on physical or virtualised infrastructures. Flash back a decade, and IT departments operated a fixed number of servers and network wires; they were dealing with ‘known-knowns’ and fixed dashboards for each layer of the IT stack. The introduction of cloud computing added a new level of complexity and, as a consequence, organisations found themselves continually scaling their use of IT up and down, based on real-time business needs.

While traditional monitoring solutions have adapted to accommodate rising deployments of cloud alongside traditional on-premises environments, the reality is that most were not designed to efficiently handle the dynamic and highly volatile cloud-native environments that we increasingly see today.

Scale is key: these highly distributed systems rely on thousands of containers and spawn a massive volume of metrics, events, logs and traces (MELT) telemetry every second. Currently, most technologists simply don’t have a way to cut through this crippling data volume and noise when troubleshooting application availability and performance problems caused by infrastructure-related issues that span hybrid environments.

The need for cloud-native observability solutions

In response to this spiralling complexity, technologists need visibility across the application level, down into the supporting digital services (such as Kubernetes), and into the underlying infrastructure-as-a-service (IaaS) resources (such as compute, server, database, network) that they’re leveraging from their cloud providers. They also need visibility into user and business impact to prioritise their actions. This is critical for IT teams to truly understand how their applications are performing and where they need to focus their time.

Technologists are increasingly recognising the need for full-stack insights, as well as the ability to map relationships and dependencies across siloed domains and teams. Unsurprisingly, an AppDynamics report, The Journey to Observability, reveals that 55% of businesses in the United Arab Emirates (UAE) have now started the transition to full-stack observability, and a further 38% plan to do so during 2022.

From a technology perspective, there are several key criteria that IT leaders should consider when looking at cloud-native observability solutions to ensure they are future-proofed for the next decade and beyond. They should seek out a new-generation solution that can observe distributed and dynamic cloud-native applications at scale; that embraces open standards, particularly OpenTelemetry; and that leverages AIOps and business intelligence to speed up identification and resolution of issues and enable technologists to prioritise actions based on business outcomes.

A new approach to management is required with cloud-native observability

But alongside best-in-class technology, the shift to cloud-native observability also requires significant cultural change within the IT department. It’s vital that business and IT leaders recognise this and take action to upskill their existing employees, as well as attract and retain the skills and talent they need to achieve their goals.

The move towards a cloud-first strategy has seen the emergence of new teams within the IT department, such as Site Reliability Engineering (SRE), DevOps and CloudOps. Not only do these technologists have new and highly specialised skill sets, they also have very different mindsets and ways of working.

Traditionally, ITOps teams have been focussed on minimising the risks brought about by change. Their mission has been to maximise uptime and unify technology choices, and they tend to take a rigid, centralised approach to digital transformation.

But when it comes to SREs and DevOps teams, it’s a very different story. These new teams value agility over control, and they give each team the autonomy to choose the best approach. They accept that there will always be massive complexity with cloud-native applications, but are happy to trade away some level of control in return for speed and innovation. They’re able to find peace in the chaos by adopting new solutions which allow them to cut through complexity and data noise and pinpoint what really matters.

Similarly, when considering digital transformation initiatives, these teams aren’t fazed by the scale and complexity involved in these programmes. They aren’t encumbered by legacy technology or scarred by previous attempts to innovate. They embrace change, rather than resisting it, and they see transformation as an exciting and inevitable part of business as usual.

These new cloud-native technologists refuse to accept vendor lock-in; they believe that they can deliver the most value within dynamic technology ecosystems, with all teams having the freedom to select and work with best-in-class solutions for each project.

Finally, cloud-native technologists (be they SREs, DevOps or CloudOps) will evolve to have a very business-focused mindset. They will increasingly strive to view IT performance and availability through a business lens, and to understand how their actions and decisions can have the biggest impact on the business.

It is critical for business leaders to recognise the new mindsets and drivers of their cloud-native teams and to empower these technologists with the culture, support and solutions they need to deliver value. This requires a strategy which enables these teams to operate in completely new ways, while also ensuring that existing teams can continue their vital work of monitoring large parts of the IT infrastructure.

Most importantly, IT leaders should take these cultural factors into consideration when selecting a cloud-native observability solution. This will ensure their SRE, DevOps and CloudOps teams have a solution which offers them the scalability, flexibility and business metrics they need to perform to their full potential.

By taking a 360-degree approach that considers both the technical and cultural needs of their IT teams, organisations can empower their technologists to cut through the complexity of cloud-native environments and deliver on the promise of this exciting new approach to application development.