IT infrastructure broadly refers to the ‘foundational’ elements on which an IT service is built: the hardware components of compute, network and storage, and software components such as application platforms distributed in a multi-tiered model.
Although IT infrastructure is often thought of as invisible, the evolution of IT operations over the last few decades has involved a deliberate effort to make infrastructure more visible. The reason lies in the kind of IT services this infrastructure was used to build.
In the mainframe and client-server eras, when ITIL emerged and Web 1.0 was taking shape, the ability to ascertain the ‘health’ of infrastructure components became critical. Operating systems such as Unix and Microsoft Windows therefore introduced performance monitors with event-generation capabilities, and vendors such as HP, BMC and CA built monitoring tools and platforms. These platforms consumed those events and correlated them with the intelligence held in so-called ‘service management’ platforms and their configuration management database (CMDB) and configuration items (CIs).
As the evolution continued, data centers grew into large facilities, and the infrastructure landscape exploded with the adoption of Web 2.0, which brought access from anywhere, dynamic content, commerce and collaboration. IT operational data grew just as quickly, and the operational design had to introduce controls over what must be monitored ‘at minimum’ and which event-suppression techniques needed to be applied.
Current IT operations design has thus become reactive because of its dependency on infrastructure monitoring, and ITIL processes support that reactive design. Monitoring relies on alerts and thresholds; events are created and then correlated using static CI data from the CMDB. This has become the template around which support staff and their skills are developed.
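The reactive pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: the metric names, threshold values and the CMDB extract (`STATIC_THRESHOLDS`, `CMDB_CI_TO_SERVICE`) are hypothetical, and correlation is reduced to a static lookup from configuration item to business service.

```python
from dataclasses import dataclass

# Hypothetical static thresholds per infrastructure metric (percent utilisation).
STATIC_THRESHOLDS = {"cpu_util": 90.0, "disk_util": 85.0, "mem_util": 95.0}

# Hypothetical static CMDB extract: configuration item (CI) -> business service.
CMDB_CI_TO_SERVICE = {
    "db-server-01": "Order Management",
    "app-server-07": "Order Management",
    "web-server-03": "Customer Portal",
}

@dataclass
class Event:
    ci: str
    metric: str
    value: float
    threshold: float

def generate_events(samples):
    """Create an event for every sample that breaches its static threshold."""
    events = []
    for ci, metric, value in samples:
        limit = STATIC_THRESHOLDS.get(metric)
        if limit is not None and value > limit:
            events.append(Event(ci, metric, value, limit))
    return events

def correlate_by_service(events):
    """Group events by the business service recorded for each CI in the CMDB."""
    grouped = {}
    for event in events:
        service = CMDB_CI_TO_SERVICE.get(event.ci, "UNMAPPED")
        grouped.setdefault(service, []).append(event)
    return grouped

samples = [
    ("db-server-01", "cpu_util", 97.0),
    ("app-server-07", "mem_util", 96.5),
    ("web-server-03", "disk_util", 40.0),
    ("k8s-pod-x9f2", "cpu_util", 99.0),  # short-lived CI the CMDB never recorded
]
for service, evts in correlate_by_service(generate_events(samples)).items():
    print(service, [e.ci for e in evts])
```

Running it groups the breaches on `db-server-01` and `app-server-07` under Order Management, while the breach on the short-lived container `k8s-pod-x9f2` lands in UNMAPPED because the static CMDB extract has no record of it. That gap is exactly what a dynamic foundation exposes.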
In the last decade, this foundation, the IT infrastructure itself, has changed. It became dynamic with the advent of load balancers across all tiers of the application fabric, public and private cloud, hyper-converged and software-defined infrastructure (spanning compute, storage and network, both LAN and WAN) and, more recently, containerization.
A dynamic foundation designed for change cannot be monitored and managed with a methodology based on a static design. This is where the need to make infrastructure invisible arises.
To make infrastructure invisible, the first step is to move monitoring to where the impact (the effect) is first seen: in this case, the IT services. A simple starting point is to make the monitoring of the front end, middleware, message bus and database the source of events and correlation. A more complex but more effective approach is to monitor at the level of business metrics or KPIs, such as the number of sales orders, invoices or orders fulfilled.
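A KPI-level check can be expressed as a comparison between the current value of a business metric and a baseline built from comparable past periods. The sketch below assumes a hypothetical KPI (sales orders per hour), a simple mean-minus-k-standard-deviations floor as the baseline, an invented severity rule and a one-sided test that only flags drops; real deployments may use seasonal baselines or other anomaly detectors instead.

```python
from statistics import mean, stdev

def kpi_event(kpi_name, history, current, k=3.0):
    """Return a service-level event if `current` drops below the baseline floor.

    history: KPI values for comparable past periods (e.g. the same hour on
    previous weekdays); stdev() needs at least two points.
    The floor mean(history) - k * stdev(history) is an assumed baseline rule.
    """
    floor = mean(history) - k * stdev(history)
    if current >= floor:
        return None
    return {
        "service_kpi": kpi_name,
        "observed": current,
        "expected_floor": round(floor, 1),
        # Assumed severity rule: below half the floor counts as major.
        "severity": "major" if current < floor / 2 else "minor",
    }

# Hypothetical history of sales orders for the same hour on past days.
orders_history = [120, 131, 118, 125, 127, 122]
for observed in (42, 100, 115):
    print(observed, kpi_event("sales_orders_per_hour", orders_history, observed))
```

With this history the floor sits a little above 109 orders, so an hour with 42 orders raises a major event, 100 orders raises a minor one and 115 orders raises none. No infrastructure metric is consulted at this stage; the event describes the business service directly.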
Making infrastructure invisible enables operations to catch incidents as they happen in the application or business service. Detection can then be fine-tuned over time by adjusting the KPI thresholds, and the resulting incident data can be mined to build prediction models.
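One simple way to fine-tune a KPI threshold from incident data is to replay past KPI readings, labelled with whether they turned out to be real incidents, and pick the candidate threshold that best balances missed incidents against false alerts. The sketch below uses the F1 score for that balance; the scoring choice, the candidate list and the sample history are assumptions for illustration, and a full prediction model would go well beyond this.

```python
def tune_threshold(labelled, candidates):
    """Return (threshold, f1) for the candidate with the best F1 score.

    An alert fires when the KPI value is below the threshold.
    labelled: (kpi_value, was_real_incident) pairs replayed from history.
    """
    best_threshold, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for v, real in labelled if v < t and real)      # caught incidents
        fp = sum(1 for v, real in labelled if v < t and not real)  # false alerts
        fn = sum(1 for v, real in labelled if v >= t and real)     # missed incidents
        if tp == 0:
            f1 = 0.0
        else:
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:  # strict '>' keeps the earliest candidate on ties
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1

# Hypothetical history: hourly order counts and whether an incident was confirmed.
history = [(40, True), (55, True), (70, True), (75, False),
           (85, False), (90, False), (95, False), (100, False)]
threshold, score = tune_threshold(history, candidates=[50, 60, 80, 100])
print(threshold, round(score, 3))
```

On this sample history a threshold of 80 scores best: it catches all three real incidents at the cost of one false alert, whereas 60 misses one incident and 100 raises four false alerts.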
One of the simplest and most effective ways to monitor the upstream environment is to set up synthetic transaction monitoring and generate events from the performance metrics of those synthetic transactions. When these metrics are used for event generation and correlated with the underlying factors, they can help catch an issue early and predict failures.
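A synthetic transaction is a scripted user journey replayed on a schedule, with each step timed and checked. The sketch below assumes hypothetical steps (`login`, `search`, `checkout`) stubbed with `time.sleep` and a raised exception in place of real calls against the front end, per-step latency objectives in seconds, and a hypothetical mapping from step to configuration item that is used to attach infrastructure signals to each event.

```python
import time

def run_synthetic(steps, slo_seconds):
    """Run a scripted synthetic transaction step by step, timing each step.

    Returns (timings, events). An event is raised for a step that throws an
    exception or whose latency exceeds its objective in slo_seconds.
    """
    timings, events = {}, []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok = True
        except Exception as exc:  # a failed step is itself a service-level event
            ok = False
            events.append({"step": name, "type": "failure", "detail": str(exc)})
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        limit = slo_seconds.get(name)
        if ok and limit is not None and elapsed > limit:
            events.append({"step": name, "type": "slow", "seconds": round(elapsed, 3)})
    return timings, events

def correlate(events, infra_signals, step_to_ci):
    """Attach underlying infrastructure signals to each synthetic event, using a
    hypothetical step -> configuration item mapping."""
    for event in events:
        ci = step_to_ci.get(event["step"])
        event["probable_cause"] = infra_signals.get(ci, [])
    return events

# Hypothetical stubs standing in for real calls against the front end.
def checkout():
    raise TimeoutError("payment gateway did not respond")

steps = [
    ("login", lambda: time.sleep(0.01)),
    ("search", lambda: time.sleep(0.2)),
    ("checkout", checkout),
]
slo = {"login": 0.5, "search": 0.1, "checkout": 1.0}

timings, events = run_synthetic(steps, slo)
events = correlate(
    events,
    infra_signals={"search-db": ["disk latency high"]},
    step_to_ci={"search": "search-db", "checkout": "payment-gw"},
)
for event in events:
    print(event["step"], event["type"], event["probable_cause"])
```

Here the `search` step breaches its 0.1 s objective and picks up the disk-latency signal from its underlying CI, while the failed `checkout` step is raised with no infrastructure context because none was reported for `payment-gw`. Infrastructure data thus appears only as supporting evidence attached to a service-level event.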
This means we need to change our existing support workflows, which we call the operational method, along with the corresponding support teams and their skills, which we call the operational model. The new method and model will represent and support ‘services’ rather than ‘towers’, starting with a service desk aligned to business processes, and the entire back-end operation supporting that service desk will be realigned accordingly. When operational teams and processes are designed around events generated and correlated at the ‘service level’, infrastructure events become invisible, surfacing only during root cause identification. The result is invisible infrastructure-led IT operations.
By Murthy Malapaka | CTO, Cloud and Infrastructure Services, North America, Wipro