High Significant Fault Detection in Azure Core Workload Insights

Authors

  • Pranay Lohia, Microsoft
  • Laurent Boué, Microsoft
  • Sharath Ranganath, Microsoft
  • Vijay Agneeswaran, Microsoft

DOI:

https://doi.org/10.1609/aaai.v38i21.30312

Keywords:

Distributed AI, Machine Learning, Real-Time Systems, Temporal and Geo/Spatial Reasoning, Track: Deployed Applications

Abstract

Azure Core workload insights consist of time-series data reported in different metric units. Faults or anomalies in these time series arise from faults associated with the metric name, resource region, dimensions, and dimension values of the data. For Azure Core, an important task is to highlight faults or anomalies on a dashboard in a form that users can perceive easily. The reported anomalies should be highly significant and limited in number, e.g., 5-20 anomalies per hour. Such anomalies should be clearly perceptible to users and should show high reconstruction error under any time-series forecasting model. Hence, our task is to automatically identify 'high significant anomalies' and their associated information for user perception.
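The selection criterion described in the abstract (few reported anomalies, each with a high reconstruction error) can be illustrated with a minimal sketch. This is not the authors' method: the trailing-median forecast, the median-absolute-deviation scaling, the function name `top_anomalies`, and the parameters `window`, `min_score`, and `max_reported` are all assumptions chosen for illustration.

```python
from statistics import median


def top_anomalies(values, window=5, min_score=3.0, max_reported=20):
    """Return (index, score) pairs for the most significant points,
    highest score first. All names and defaults are hypothetical."""
    # Reconstruction error: absolute gap between each point and a
    # trailing-median forecast built from the previous `window` points.
    # (Assumption: a stand-in for whatever forecasting model is used.)
    errors = []
    for i in range(window, len(values)):
        forecast = median(values[i - window:i])
        errors.append((i, abs(values[i] - forecast)))
    if not errors:
        return []

    # Scale errors by their median absolute deviation so that scores are
    # unit-free and comparable across metrics with different units.
    err_vals = [e for _, e in errors]
    med = median(err_vals)
    mad = median(abs(e - med) for e in err_vals) or 1e-9  # avoid divide-by-zero

    scored = [(i, (e - med) / mad) for i, e in errors]

    # Keep only clearly significant points, then cap how many are reported.
    significant = [(i, s) for i, s in scored if s >= min_score]
    significant.sort(key=lambda pair: pair[1], reverse=True)
    return significant[:max_reported]
```

Calling `top_anomalies(hourly_values, max_reported=20)` on one hour of samples would cap the report at 20 points, matching the upper end of the 5-20 range mentioned above; the lower end is not enforced in this sketch, since an hour with no significant deviation should arguably report nothing.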

Published

2024-03-24

How to Cite

Lohia, P., Boué, L., Ranganath, S., & Agneeswaran, V. (2024). High Significant Fault Detection in Azure Core Workload Insights. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 22779-22787. https://doi.org/10.1609/aaai.v38i21.30312

Issue

Section

IAAI Technical Track on Deployed Highly Innovative Applications of AI