Doc: Adding How-to Guide for Incident Manager (#16674)

* Doc: Adding Docs for Incident Manager

* Doc: Adding Docs for Incident Manager

---------

Co-authored-by: Prajwal Pandit <prajwalpandit@Prajwals-MacBook-Air.local>
Co-authored-by: Shilpa Vernekar <94032785+ShilpaVernekar@users.noreply.github.com>
Prajwal214 2024-06-17 21:58:50 +05:30 committed by GitHub
parent 95d2d0f82f
commit 1f27bb7feb
10 changed files with 164 additions and 0 deletions


@ -0,0 +1,76 @@
---
title: How to work with Incident Manager
slug: /how-to-guides/data-observability/incident-manager/workflow
---
# How to Work with the Incident Manager Workflow
## 1. Incident Dashboard
The Incident Dashboard is the central hub where all incidents are displayed. Users can filter incidents by various criteria to manage and prioritize them effectively.
### Filters Available:
- **Assignee:** View incidents assigned to specific team members.
- **Status:** Filter incidents based on their current status (e.g., New, ACK, Assigned, Resolved).
- **Test Cases:** Filter incidents associated with specific test cases.
- **Time:** Sort incidents by the time they were reported or last updated.
{% image
src="/images/v1.4/how-to-guides/observability/incident-manager-1.png"
alt="Incident Manager Dashboard"
caption="Incident Manager Dashboard"
/%}
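The filters above combine so that only incidents matching every selected criterion remain. The following is a minimal sketch of that behavior in plain Python. The `Incident` record, its field names, and `filter_incidents` are hypothetical and only illustrate the idea; they are not the OpenMetadata schema or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative, not OpenMetadata's schema.
    test_case: str
    status: str  # one of "New", "ACK", "Assigned", "Resolved"
    assignee: Optional[str]
    updated_at: datetime


def filter_incidents(
    incidents: Iterable[Incident],
    assignee: Optional[str] = None,
    status: Optional[str] = None,
    test_case: Optional[str] = None,
    since: Optional[datetime] = None,
) -> List[Incident]:
    """Keep incidents that match every filter that is set, newest first."""
    matches = [
        i
        for i in incidents
        if (assignee is None or i.assignee == assignee)
        and (status is None or i.status == status)
        and (test_case is None or i.test_case == test_case)
        and (since is None or i.updated_at >= since)
    ]
    return sorted(matches, key=lambda i: i.updated_at, reverse=True)
```

Leaving a filter unset (`None`) means it does not restrict the result, which mirrors leaving a dashboard filter empty.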
## 2. Incident Status Change
Update an incident's status to reflect its current stage in the resolution process. The owner of the incident can assign it to an appropriate assignee for further action.
### Steps to Change Incident Status:
1. Navigate to the Incident Dashboard.
2. Select the incident that needs a status update.
3. Choose the new status from the dropdown menu.
4. Assign the incident to the appropriate team member.
5. Optionally, review the Test Case Details.
{% image
src="/images/v1.4/how-to-guides/observability/incident-manager-2.png"
alt="Incident Test Case Details"
caption="Incident Test Case Details"
/%}
{% image
src="/images/v1.4/how-to-guides/observability/incident-manager-3.png"
alt="Incident Status Change"
caption="Incident Status Change"
/%}
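One way to reason about status changes is as a small state machine over the statuses named earlier (New, ACK, Assigned, Resolved). The sketch below assumes, purely for illustration, that incidents only move forward through those stages and that assigning requires an assignee. The guide does not specify these rules, and `Status`, `can_transition`, and `change_status` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    NEW = "New"
    ACK = "ACK"
    ASSIGNED = "Assigned"
    RESOLVED = "Resolved"


# Assumed ordering, for illustration only: incidents move forward through
# these stages, and a resolved incident is final.
_ORDER = [Status.NEW, Status.ACK, Status.ASSIGNED, Status.RESOLVED]


def can_transition(current: Status, new: Status) -> bool:
    """Allow only forward moves under the assumed ordering."""
    return _ORDER.index(new) > _ORDER.index(current)


def change_status(
    current: Status, new: Status, assignee: Optional[str] = None
) -> Status:
    """Return the new status, or raise ValueError if the change is not allowed."""
    if not can_transition(current, new):
        raise ValueError(f"Cannot move from {current.value} to {new.value}")
    if new is Status.ASSIGNED and not assignee:
        raise ValueError("Assigning an incident requires an assignee")
    return new
```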
## 3. Incident Resolution
Once an incident has been resolved, it can be officially closed. Be sure to describe the Root Cause Analysis (RCA) in the comments so that others can understand how the incident was resolved.
### Steps to Resolve and Close an Incident:
1. Verify that all necessary steps to resolve the incident have been completed.
2. Describe the RCA in the resolution comments.
3. Change the status of the incident to 'Resolved'.
4. Confirm the closure to update the incident in the dashboard.
{% image
src="/images/v1.4/how-to-guides/observability/incident-manager-4.png"
alt="Incident Resolution"
caption="Incident Resolution"
/%}
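A team that wants to enforce the RCA step could model it as a guard on resolution, as in the sketch below. Treating the RCA as mandatory is an assumption made for this example, and the `Incident` class and `resolve` function are hypothetical, not part of OpenMetadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    # Hypothetical record used only for this example.
    test_case: str
    status: str = "New"
    comments: List[str] = field(default_factory=list)


def resolve(incident: Incident, rca: str) -> Incident:
    """Mark the incident as resolved, recording the RCA as a comment first."""
    if not rca.strip():
        # Assumed policy: refuse to resolve without a root cause description.
        raise ValueError("An RCA comment is required before resolving")
    incident.comments.append(f"RCA: {rca.strip()}")
    incident.status = "Resolved"
    return incident
```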
## 4. Incident Activities
Each incident includes a detailed timeline where all relevant information is consolidated. This timeline provides a comprehensive view of the incident's lifecycle, including key events, RCA documentation, and closure updates.
### How to View Incident Activities:
1. Open the incident from the Incident Dashboard.
2. Navigate to the 'Incident' tab within the incident details.
3. Review the chronological events, RCA, and closure updates associated with the incident.
{% image
src="/images/v1.4/how-to-guides/observability/incident-manager-5.png"
alt="Incident Activities"
caption="Incident Activities"
/%}
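Conceptually, the timeline is a list of events ordered by time. The sketch below shows that ordering with a hypothetical `Activity` record and `render_timeline` helper; the event kinds are illustrative labels rather than values defined by OpenMetadata.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Activity:
    # Hypothetical event record; "kind" values are illustrative labels.
    at: datetime
    kind: str  # e.g. "status_change", "rca", "closure"
    detail: str


def render_timeline(activities: List[Activity]) -> List[str]:
    """Return one formatted line per activity, oldest first."""
    ordered = sorted(activities, key=lambda a: a.at)
    return [f"{a.at:%Y-%m-%d %H:%M} [{a.kind}] {a.detail}" for a in ordered]
```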


@ -0,0 +1,54 @@
---
title: Incident Manager
slug: /how-to-guides/data-observability/incident-manager
---
# Incident Manager
Incident Manager streamlines the handling of data quality issues. By centralizing the resolution process, assigning tasks, and logging root causes, your team can quickly address and resolve failures. The historical record of past incidents serves as a guide for troubleshooting similar issues, and all the necessary context is readily available, making it easier to maintain high data quality standards.
## Overview of the Incident Manager
The Incident Manager serves as a centralized hub to handle the resolution flow of failed Data Quality Tests. When a test fails, users can:
- **Acknowledge the Issue:** Recognize and confirm that there is a problem that needs attention.
- **Assign Responsibility:** Designate a specific person or team to address the errors.
- **Log the Root Cause:** Document the underlying cause of the failure for future reference and analysis.
## Using the Test Resolution Flow
The Test Resolution flow is a critical feature of the Incident Manager. Here's how it works:
1. **Failure Notification:** When a Data Quality Test fails, the system generates a notification.
2. **Acknowledge the Failure:** The designated user acknowledges the issue within the Incident Manager.
3. **Assignment:** The issue is then assigned to a knowledgeable user or team responsible for resolving it.
4. **Status Updates:** The assigned user can update the status of the issue, keeping the organization informed about progress and any developments.
5. **Sharing Updates:** All impacted users receive updates, ensuring everyone stays informed about the resolution process.
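The five steps above can be sketched as a small class in which every change is pushed to a notification callback. This is only a conceptual model: `ResolutionFlow`, its methods, and the message format are hypothetical and do not correspond to an OpenMetadata API.

```python
from typing import Callable, Optional


class ResolutionFlow:
    """Hypothetical model of the five steps; not an OpenMetadata API."""

    def __init__(self, test_case: str, notify: Callable[[str], None]) -> None:
        self.test_case = test_case
        self.notify = notify
        self.status = "New"
        self.assignee: Optional[str] = None
        # Step 1: a failed test produces a notification.
        self._announce("test failed, incident opened")

    def _announce(self, message: str) -> None:
        # Step 5: every change is shared with impacted users.
        self.notify(f"[{self.test_case}] {self.status}: {message}")

    def acknowledge(self, user: str) -> None:
        # Step 2: a designated user acknowledges the failure.
        self.status = "ACK"
        self._announce(f"acknowledged by {user}")

    def assign(self, assignee: str) -> None:
        # Step 3: the issue is assigned to a user or team.
        self.status = "Assigned"
        self.assignee = assignee
        self._announce(f"assigned to {assignee}")

    def update(self, note: str) -> None:
        # Step 4: the assignee reports progress.
        self._announce(note)
```

Passing `messages.append` for some list `messages` as the `notify` callback is a simple way to collect the updates when trying this out.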
## Building a Troubleshooting Handbook
One of the powerful features of the Incident Manager is its ability to store all past failures. This historical data becomes a valuable troubleshooting handbook for your team. Here's how you can leverage it:
- **Explore Similar Scenarios:** Review previous incidents to understand how similar issues were resolved.
- **Contextual Information:** Access all necessary context directly within OpenMetadata, including previous resolutions, root causes, and responsible teams.
- **Continuous Improvement:** Use historical data to improve data quality tests and prevent future failures.
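As an illustration of exploring similar scenarios, the sketch below counts the root causes logged for one test case across past incidents. It assumes the history has been exported as a list of dictionaries with `test_case` and `root_cause` keys; that shape and the `common_root_causes` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def common_root_causes(
    history: List[Dict[str, str]], test_case: str, top: int = 3
) -> List[Tuple[str, int]]:
    """Return the most frequent root causes logged for one test case."""
    causes = Counter(
        record["root_cause"]
        for record in history
        if record["test_case"] == test_case
    )
    return causes.most_common(top)
```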
## Steps to Get Started
1. **Access the Incident Manager:** Navigate to the Incident Manager within the OpenMetadata platform.
2. **Monitor Data Quality Tests:** Keep an eye on your data quality tests to quickly identify any failures.
3. **Acknowledge and Assign:** Acknowledge any issues promptly and assign them to the appropriate team members.
4. **Log and Learn:** Document the root cause of each failure and use the stored information to learn and improve.
By following these steps, you'll ensure that your organization effectively manages data quality issues, maintains high standards, and continuously improves its data quality processes.
{%inlineCalloutContainer%}
{%inlineCallout
color="violet-70"
bold="How to work with Incident Manager"
icon="MdMenuBook"
href="/how-to-guides/data-observability/incident-manager/workflow"%}
Incident Manager Workflow
{%/inlineCallout%}
{%/inlineCalloutContainer%}


@ -0,0 +1,19 @@
---
title: Data Observability
slug: /how-to-guides/data-observability
---
# Data Observability
OpenMetadata ensures the health and performance of your data systems by providing comprehensive data observability features. These features offer insights into the state of test cases, helping to detect, diagnose, and resolve data issues quickly. By monitoring data flows and data quality in real time, data teams can ensure that data remains reliable and trustworthy. OpenMetadata supports [observability alerts and notifications](/how-to-guides/admin-guide/alerts) to help you maintain the integrity and performance of your data systems.
{%inlineCalloutContainer%}
{%inlineCallout
color="violet-70"
bold="Incident Manager"
icon="MdMenuBook"
href="/how-to-guides/data-observability/incident-manager"%}
Set up incident management in OpenMetadata.
{%/inlineCallout%}
{%/inlineCalloutContainer%}


@ -48,6 +48,12 @@ OpenMetadata is a complete package for data teams to break down team silos, shar
link="/how-to-guides/data-governance"
icon="governance"
/%}
{% tile
title="Data Observability"
description="Ensure the health and performance of your data systems with OpenMetadata."
link="/how-to-guides/data-observability"
icon="observability"
/%}
{% /tilesContainer %}
## Quick Start Guides
@ -91,4 +97,6 @@ OpenMetadata is a complete package for data teams to break down team silos, shar
- Implement **[Data Governance](/how-to-guides/data-governance)** to maintain data integrity, security, and compliance.
- Implement **[Data Observability](/how-to-guides/data-observability)** to ensure the health and performance of your data systems.
{% /note %}


@ -380,6 +380,13 @@ site_menu:
- category: How-to Guides / Data Governance / Classification / Best Practices for Classification
url: /how-to-guides/data-governance/classification/best-practices
- category: How-to Guides / Data Observability
url: /how-to-guides/data-observability
- category: How-to Guides / Data Observability / Incident Manager
url: /how-to-guides/data-observability/incident-manager
- category: How-to Guides / Data Observability / Incident Manager / How to work with Incident Manager
url: /how-to-guides/data-observability/incident-manager/workflow
- category: Releases
url: /releases
color: violet-70

Five binary image files added (not shown). Sizes: 302 KiB, 483 KiB, 457 KiB, 326 KiB, 374 KiB.