Artificial Intelligence for IT Operations

Summary

Artificial Intelligence for IT Operations (AIOps) is a term coined by Gartner in 2016 as an industry category for machine learning analytics technology that enhances IT operations analytics.[1] AIOps[2][3] is short for "Artificial Intelligence for IT Operations".[4][5][6] The operational tasks it covers include automation, performance monitoring and event correlation, among others.[7][8]

An AIOps platform has two main aspects: machine learning and big data. Observational data and engagement data are collected into a big data platform, which requires a shift away from IT data that is segregated by section. A holistic machine learning and analytics strategy is then applied to the combined IT data.[9]
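As a minimal illustration of analyzing combined rather than segregated IT data, the Python sketch below merges per-silo event streams into a single chronological timeline. The `Event` record and the `combine_silos` helper are hypothetical names chosen for this example, and the sketch assumes each silo's events are already sorted by timestamp. A real AIOps platform would ingest such data into a big data store rather than an in-memory list.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event record shared by all silos.
    timestamp: float  # seconds since an arbitrary epoch
    source: str       # which team or tool produced the event
    message: str


def combine_silos(*silos):
    """Merge per-silo event lists (each assumed sorted by timestamp)
    into one chronologically ordered stream for joint analysis."""
    return list(heapq.merge(*silos, key=lambda event: event.timestamp))


network = [Event(1.0, "network", "link flap on sw-3"), Event(4.0, "network", "packet loss rising")]
app = [Event(2.0, "app", "HTTP 500 rate up"), Event(3.0, "app", "p99 latency 2 s")]
infra = [Event(0.5, "infra", "disk 91% on db-1")]

combined = combine_silos(network, app, infra)
```

Once merged, a disk warning, a network flap and an application error appear side by side in one timeline, which is the precondition for correlating them.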

The goal is to enable IT transformation[10] and to obtain continuous insights that drive continuous fixes and improvements through automation. For this reason, AIOps can be viewed as continuous integration and continuous delivery (CI/CD) for core IT functions.[11]

Because IT operations are inherently tied to cloud deployment and the management of distributed applications, AIOps has increasingly led to the coalescence of machine learning and cloud research.[12][13]

Process

The normalized data can then be processed by machine learning algorithms to automatically reduce noise and identify the probable root cause of incidents. The main output of this stage is the detection of abnormal behavior by users, devices or applications.

Noise reduction can be achieved by various methods, but most research in the field points to the following steps:

  1. Analyze all incoming alerts;
  2. Remove duplicates;
  3. Identify false positives;
  4. Detect and analyze anomalies, faults and failures (AFF) early.[14]
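The first three noise-reduction steps can be sketched in Python as a single pass over incoming alerts. This is a minimal illustration under stated assumptions, not a method prescribed by the sources: the `Alert` fields, the `reduce_noise` helper, and both thresholds are hypothetical. Duplicates are taken to be repeats of the same (host, check) pair within a time window, and alerts that clear on their own very quickly are treated as likely false positives. Early AFF detection (step 4) is not covered here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape used only for this sketch.
    host: str
    check: str
    timestamp: float      # seconds since an arbitrary epoch
    cleared_after: float  # seconds until the alert self-cleared; math.inf if still open


def reduce_noise(alerts, dedup_window=300.0, min_duration=60.0):
    """Analyze every alert in time order, drop duplicates, and drop
    short-lived alerts assumed to be false positives."""
    last_seen = {}  # (host, check) -> timestamp of the most recent occurrence
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):  # step 1: analyze all alerts
        key = (alert.host, alert.check)
        previous = last_seen.get(key)
        last_seen[key] = alert.timestamp
        if previous is not None and alert.timestamp - previous < dedup_window:
            continue  # step 2: repeat of a recent alert for the same host and check
        if alert.cleared_after < min_duration:
            continue  # step 3: blip that cleared quickly, assumed false positive
        kept.append(alert)
    return kept


alerts = [
    Alert("db-1", "cpu", 0, math.inf),
    Alert("db-1", "cpu", 120, math.inf),   # duplicate within the window
    Alert("web-2", "disk", 200, 15.0),     # cleared after 15 s
    Alert("db-1", "cpu", 1000, math.inf),  # far enough from the last repeat
]
kept = reduce_noise(alerts)
```

Because `last_seen` is updated on every occurrence, a continuous stream of repeats keeps being suppressed until it pauses for longer than `dedup_window`; a production system would more likely learn these thresholds than hard-code them.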

Anomaly detection, another step in any AIOps process, is based on analyzing the past behavior of users, equipment and applications. Anything that strays from that behavioral baseline is considered unusual and flagged as abnormal.
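One simple way to express a behavioral baseline is a z-score: learn the mean and standard deviation of past measurements, then flag new values that lie too many standard deviations away. The Python sketch below uses this technique purely for illustration. The sources do not say which algorithm AIOps platforms use, and the `flag_anomalies` helper and its default threshold of 3 are assumptions.

```python
import statistics


def flag_anomalies(history, new_values, threshold=3.0):
    """Flag values that stray more than `threshold` standard deviations
    from the baseline learned from past behavior (`history`)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    flagged = []
    for value in new_values:
        if stdev == 0:
            # A perfectly flat baseline: any change at all is unusual.
            is_abnormal = value != mean
        else:
            is_abnormal = abs(value - mean) / stdev > threshold
        flagged.append((value, is_abnormal))
    return flagged


# Past response times (ms) form the baseline; 130 ms strays far from it.
result = flag_anomalies([100, 102, 98, 101, 99], [101, 130])
```

Real platforms typically use baselines that account for seasonality, such as daily or weekly cycles, which a single global mean cannot capture.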

Root cause determination is usually done by passing incoming alerts through algorithms that take into account both correlated events and topology dependencies. The algorithms on which the AI bases its decisions can be influenced directly, essentially by "training" them.[15]
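The role of topology dependencies can be illustrated with a small Python heuristic. Suppose a set of alerting components has already been correlated into one incident. Any alerting component that depends, directly or transitively, on another alerting component is then treated as a downstream symptom, and the remaining alerting components are the probable root causes. This is a hypothetical, rule-based sketch rather than a trained model. The `depends_on` mapping, the component names, and both helper functions are assumptions made for illustration.

```python
def upstream(component, depends_on):
    """Return every component that `component` depends on, directly or transitively."""
    seen = set()
    stack = list(depends_on.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen set also prevents infinite loops on cycles
            seen.add(dep)
            stack.extend(depends_on.get(dep, []))
    return seen


def probable_root_causes(depends_on, alerting):
    """Keep alerting components none of whose upstream dependencies are alerting.

    depends_on: mapping component -> list of components it depends on
    alerting:   set of components in one correlated incident
    """
    return {
        component
        for component in alerting
        if not ((upstream(component, depends_on) - {component}) & alerting)
    }


topology = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": ["storage"],
    "cache": [],
}
causes = probable_root_causes(topology, {"web", "api", "db"})
```

Here the web and API alerts are explained by the alerting database they depend on, so only `db` remains as a candidate. If alerting components form a dependency cycle, this heuristic returns none of them, which is one reason real systems also weigh event timing and learned correlations.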

Use

An important use of AIOps platforms is the analysis of large, unconnected datasets, such as the Johns Hopkins COVID-19 data published on GitHub.[16] The data in this example is pulled from a large number of non-normalized sources: aggregated data (10 sources), US regional data (113 sources) and non-US data (37 sources). Given the response time an emergency demands, traditional analysis models cannot make use of such data.
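Before sources like these can be analyzed together, they must be normalized to one schema. The Python sketch below shows the idea by mapping column-name variants from different CSV sources onto common field names. The column names in `COLUMN_ALIASES` are an assumption about how such sources might differ, and the sample rows use placeholder values rather than real case counts.

```python
import csv
import io

# Assumed mapping from source-specific column names to one common schema.
COLUMN_ALIASES = {
    "Province/State": "province_state",
    "Province_State": "province_state",
    "Country/Region": "country_region",
    "Country_Region": "country_region",
    "Last Update": "last_update",
    "Last_Update": "last_update",
    "Confirmed": "confirmed",
    "Deaths": "deaths",
}


def normalize_rows(csv_text):
    """Parse one CSV source, rename known columns to the common schema,
    drop unknown columns, and convert counts to integers."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {COLUMN_ALIASES[k]: v for k, v in raw.items() if k in COLUMN_ALIASES}
        for field in ("confirmed", "deaths"):
            if row.get(field):  # leave missing or empty counts untouched
                row[field] = int(row[field])
        rows.append(row)
    return rows


# Two hypothetical sources describing the same region with different schemas.
older = "Province/State,Country/Region,Last Update,Confirmed,Deaths\nRegion A,Country X,2020-03-01,10,1\n"
newer = "FIPS,Admin2,Province_State,Country_Region,Last_Update,Confirmed,Deaths\n,,Region A,Country X,2020-06-01,25,2\n"

a = normalize_rows(older)
b = normalize_rows(newer)
```

After normalization, rows from both sources share the same keys and can be concatenated into one dataset for the machine learning stages.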

Generally, the main areas of use for AIOps platforms and principles are[17]

References

  1. Bowles, Jerry (January 28, 2020). "AIOps and service assurance in the age of digital transformation". Diginomica.
  2. "Best practices for taking a hybrid approach to AIOps". June 7, 2021. Retrieved November 11, 2022.
  3. "Algorithmic IT Operations Drives Digital Business: Gartner - CXOtoday.com". Cxotoday.com. Archived from the original on January 28, 2018. Retrieved January 28, 2018.
  4. "Market Guide for AIOps Platforms". Gartner. Retrieved January 28, 2018.
  5. "Improve IT systems management productivity, application performance and operational resiliency with AIOps". IBM. Retrieved November 11, 2022.
  6. "ITOA to AIOps: The next generation of network analytics". TechTarget. Retrieved January 28, 2018.
  7. "An Introduction to AIOps". The Register. Retrieved January 28, 2018.
  8. "AIOps - The Type of 'AI' with Nothing Artificial About It - Dataconomy". Dataconomy.com. March 31, 2017. Retrieved January 28, 2018.
  9. "AIOps: Managing the Second Law of IT Ops - DevOps.com". devops.com. September 22, 2017. Retrieved January 24, 2018.
  10. "What is AIOps or Artificial Intelligence for IT Operations. Top 10 Common AIOps Use Cases". Archived from the original on February 12, 2021.
  11. Harris, Richard. "Explaining what AIOps is and why it matters to developers". appdevelopermagazine.com. Retrieved January 24, 2018.
  12. Masood, Adnan; Hashmi, Adnan (2019). "AIOps: Predictive Analytics & Machine Learning in Operations". Cognitive Computing Recipes: Artificial Intelligence Solutions Using Microsoft Cognitive Services and TensorFlow. Apress. pp. 359–382. doi:10.1007/978-1-4842-4106-6_7. ISBN 978-1-4842-4106-6. S2CID 108316737.
  13. Duc, Thang Le; Leiva, Rafael García; Casari, Paolo; Östberg, Per-Olov (September 2019). "Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing: A Survey". ACM Computing Surveys. 52 (5): 94:1–94:39. doi:10.1145/3341145. hdl:11572/253114. ISSN 0360-0300.
  14. WISC.edu - International Conference on Service Oriented Computing.
  15. Machine Learning.
  16. Importing COVID-19 data into Elasticsearch.
  17. UPC.edu - Top 10 Artificial Intelligence Trends in 2019.
  18. "Call For Papers". cloudintelligenceworkshop.org. Retrieved December 31, 2022.