By Joachim Herschmann, Senior Director Analyst at Gartner
A digital immune system (DIS) combines practices and technologies for software design, development, operations and analytics to mitigate business risks. Here’s what to know.
A robust digital immune system protects applications and services from anomalies, such as the effects of software bugs or security issues, by making applications more resilient so that they recover quickly from failures. It can reduce the business continuity risks created when critical applications and services are severely compromised or stop working altogether.
Enterprises face unprecedented challenges in ensuring resilient operating environments, accelerated digital delivery and a reliable end-user experience. The business expects to be able to react quickly to market changes and to innovate at a fast rate. End users expect more than sound functionality: they want high performance, secure transactions and data, and satisfactory interactions.
A digital immune system combines a range of practices and technologies from software design, development, automation, operations and analytics to create a superior user experience (UX) and reduce system failures that impact business performance. A DIS protects applications and services to make them more resilient so that they recover quickly from failures.
In a recent Gartner survey about overcoming the barriers to digital execution, nearly half of the respondents (48%) stated that the primary objective of their digital investments is to improve the customer experience (CX). A DIS will be critical to ensuring that CX isn’t compromised by defects, system failures or anomalies, such as software bugs or security issues.
Gartner expects that by 2025, organizations that invest in building digital immunity will increase customer satisfaction by decreasing downtime by 80%.
Six prerequisites for a strong digital immune system
When building digital immunity, start with a strong vision statement that helps to align the organization and smooth implementation. Then take into account the following six practices and technologies:
• Observability enables software and systems to be “seen.” Building observability into applications provides the necessary information to mitigate issues with reliability and resilience and — by observing user behaviour — improves UX.
• AI-augmented testing enables organizations to make software testing activities increasingly independent from human intervention. It complements and extends conventional test automation and includes fully automated planning, creation, maintenance and analysis of tests.
• Chaos engineering uses experimental testing to uncover vulnerabilities and weaknesses within a complex system. When it is used in preproduction environments, teams can safely master the practice in a nonintrusive and test-first manner, and then apply the lessons learned to normal operations and production hardening.
• Autoremediation focuses on building context-sensitive monitoring capabilities and automated remediation functions directly into an application. The application monitors itself, corrects issues automatically when it detects them and returns to a normal working state without requiring the involvement of operations staff. Autoremediation can also prevent issues by using observability in combination with chaos engineering to remediate a failing UX.
• Site reliability engineering (SRE) is a set of engineering principles and practices that focuses on improving CX and retention by leveraging service-level objectives to govern service management. It balances the need for velocity against stability and risk, and reduces the effort development teams spend on remediation and tech debt, which allows more focus on creating a compelling UX.
• Software supply chain security addresses the risk of software supply chain attacks. Software bills of materials improve the visibility, transparency, security and integrity of proprietary and open-source code in software supply chains. Strong version-control policies, the use of artefact repositories for trusted content and managing vendor risk throughout the delivery life cycle protect the integrity of internal and external code.
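To make the SRE practice above more concrete, here is a minimal Python sketch of an error budget, one common way to use a service-level objective to balance velocity against stability. It is an illustration under stated assumptions, not a method prescribed by Gartner: the class name `ErrorBudget`, its fields, the request-based success metric and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one service and one period."""

    slo_target: float     # assumed objective, e.g. 0.999 = 99.9% of requests succeed
    total_requests: int   # requests served in the period
    failed_requests: int  # requests that failed in the period

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / allowed

    def can_release(self, min_remaining: float = 0.0) -> bool:
        # Illustrative policy: ship new changes only while budget remains.
        return self.remaining_fraction > min_remaining


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    exhausted = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_200)
    print(f"healthy: {healthy.remaining_fraction:.0%} left, release={healthy.can_release()}")
    print(f"exhausted: {exhausted.remaining_fraction:.0%} left, release={exhausted.can_release()}")
```

In this sketch a 99.9% objective over one million requests permits about 1,000 failures. A service with 400 failures still has roughly 60% of its budget and can keep shipping, while one with 1,200 failures has overspent and would shift effort toward stability. The threshold policy is a design choice; teams may prefer a softer rule, such as slowing releases as the budget shrinks.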
Focus on optimizing the CX while mitigating potential risks. Implementing a DIS requires an innovative mindset, but combining these practices and technologies ensures you can keep complex digital systems running even when they are compromised.
In short: