Docs: Anomaly Detection Updation (#18484)

RounakDhillon 2024-11-12 17:23:01 +05:30 committed by GitHub
parent ce43c975af
commit 75ccb1adc3
14 changed files with 260 additions and 0 deletions

@@ -671,6 +671,12 @@ site_menu:
- category: How-to Guides / Data Quality and Observability / Incident Manager / Root Cause Analysis
  url: /how-to-guides/data-quality-observability/incident-manager/root-cause-analysis
  isCollateOnly: true
- category: How-to Guides / Data Quality and Observability / Anomaly Detection
  url: /how-to-guides/data-quality-observability/anomaly-detection
  isCollateOnly: true
- category: How-to Guides / Data Quality and Observability / Anomaly Detection / Steps to Set Up Anomaly Detection
  url: /how-to-guides/data-quality-observability/anomaly-detection/setting-up
  isCollateOnly: true
- category: How-to Guides / Data Lineage
  url: /how-to-guides/data-lineage

@@ -0,0 +1,72 @@
---
title: Anomaly Detection in Collate | Automated Data Quality Alerts
slug: /how-to-guides/data-quality-observability/anomaly-detection
---
# Overview
The **Anomaly Detection** feature in Collate helps ensure data quality by automatically detecting unexpected changes, such as spikes or drops in data trends. Instead of requiring users to manually define rigid boundaries for data validation, Collate dynamically learns from your data patterns through regular profiling. This allows for more accurate and flexible anomaly detection, alerting you only when there are significant deviations that might indicate underlying issues.
## Key Benefits of Anomaly Detection
- **Automated Detection of Unexpected Data Changes**: Collate can detect unexpected data behaviors, such as spikes or drops, that deviate from normal trends. This is crucial for identifying potential issues with data pipelines, backend systems, or infrastructure.
- **Dynamic Learning**: The system continuously profiles your data over time, learning its natural variations, including seasonal fluctuations. For example, if sales data varies throughout the year due to holidays, Collate's dynamic assertions can detect this seasonality and avoid raising unnecessary alerts. This allows the system to automatically adjust to your data's evolving patterns without requiring manual configuration.
- **Flexible Configuration**: For more controlled scenarios, users can still manually define specific boundaries or thresholds to monitor data, such as ensuring values stay within a certain range. This offers both manual and automatic methods for managing data quality.
## Use Cases
### 1. Static Assertions for Simple Tests
- **Problem**: In many cases, users want to perform straightforward data tests, such as ensuring that values are not null or that there are no repeated values.
- **Solution**: Collate enables users to configure simple assertions directly from the UI. For example, users can create tests to ensure:
  - Data should not be null.
  - There should be no duplicate values.
  - Data should not be older than a specific time frame (e.g., one day).
  - Values should be greater than zero.
- **Example**: If you want to ensure that your sales data contains no null values or duplicates, you can easily configure these assertions via the UI.
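The four checks listed above can be sketched in plain Python. This is only an illustration of what each assertion means, not Collate's implementation or API: the function name `check_static_assertions`, the row fields (`id`, `amount`, `updated_at`), and the check names are all hypothetical, and freshness is assumed to mean that the newest row is at most one day old.

```python
from datetime import datetime, timedelta, timezone


def check_static_assertions(rows, now, max_age=timedelta(days=1)):
    """Return the names of the failed checks for a list of row dicts.

    Each row is assumed to look like
    {"id": ..., "amount": ..., "updated_at": datetime}.
    """
    failures = []
    if not rows:
        return failures  # nothing to validate

    amounts = [row["amount"] for row in rows]
    ids = [row["id"] for row in rows]

    # Data should not be null.
    if any(amount is None for amount in amounts):
        failures.append("not_null")

    # There should be no duplicate values.
    if len(set(ids)) != len(ids):
        failures.append("unique")

    # Data should not be older than a specific time frame.
    newest = max(row["updated_at"] for row in rows)
    if now - newest > max_age:
        failures.append("freshness")

    # Values should be greater than zero (nulls are reported by the first check).
    if any(amount is not None and amount <= 0 for amount in amounts):
        failures.append("positive")

    return failures
```

An empty result means every check passed; otherwise the list names the checks that failed, in the order they were evaluated.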
### 2. Dynamic Assertions for Evolving Data
- **Problem**: Some data, such as sales figures, naturally evolves over time. For example, sales data might fluctuate daily or weekly, and manual bounds may not accurately capture these variations.
- **Solution**: Collate uses **dynamic assertions**, which automatically learn from the data by profiling it regularly. Over time, the system establishes a pattern for how the data behaves, allowing it to detect when values significantly deviate from this expected behavior.
- **Example**: If sales suddenly spike or drop beyond what is typical for your historical data, Collate will alert you to this anomaly.
## How Anomaly Detection Works
### 1. Manual Configuration of Tests
Users can manually configure tests for specific data points if they want to maintain tight control over their data quality checks. For instance, you can specify that a value must stay between 10 and 100. This method is useful for data that has well-understood constraints or when precise validation rules are required.
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-2.png"
alt="Manual Configuration of Tests"
caption="Manual Configuration of Tests"
/%}
### 2. Dynamic Assertions
For more complex or evolving datasets, Collate offers **dynamic assertions**. These assertions automatically adapt to your data by learning its natural patterns over time. The profiling process typically takes around five weeks, during which the system builds an understanding of normal data fluctuations.
- **Data Profiling**: Collate continuously scans the data and trains its models based on the profiled data. Once this learning phase is complete, the system can detect significant deviations from expected patterns, alerting users to anomalies.
- **Advantages of Dynamic Assertions**:
  - **Adaptability**: No need to set manual thresholds for evolving datasets.
  - **Efficiency**: Focus on genuine anomalies instead of managing static tests that may quickly become outdated as data evolves.
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-3.png"
alt="Dynamic Assertions"
caption="Dynamic Assertions"
/%}
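As a rough intuition for how learned bounds differ from fixed ones, the sketch below derives a band from past profiled values and flags anything outside it. The mean ± k standard deviations rule, the function names, and the default `k=3.0` are assumptions chosen for illustration. The source does not describe the model Collate actually trains, and this simple band does not account for seasonality.

```python
from statistics import mean, stdev


def learned_bounds(history, k=3.0):
    """Derive a (lower, upper) band from past values as mean +/- k * stdev."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation; needs at least 2 points
    return center - k * spread, center + k * spread


def is_anomaly(value, history, k=3.0):
    """Return True when value falls outside the band learned from history."""
    lower, upper = learned_bounds(history, k)
    return value < lower or value > upper
```

Because the band is recomputed from history, it widens or narrows as the data changes, which is the property that a hand-set threshold lacks.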
### 3. Incidents and Notifications
When an anomaly is detected, Collate automatically generates an incident. Incidents are generated for rule-based test cases as well as for dynamic assertions. These notifications help users quickly see when and where their data may be behaving unexpectedly.
- **Example**: If sales data suddenly shows an abnormal spike or drop, Collate will notify you, allowing you to investigate potential causes such as system malfunctions or external influences.
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-4.png"
alt="Incidents and Notifications"
caption="Incidents and Notifications"
/%}

@@ -0,0 +1,52 @@
---
title: Set Up Anomaly Detection in Collate for Data Quality
slug: /how-to-guides/data-quality-observability/anomaly-detection/setting-up
---
# Steps to Set Up Anomaly Detection
### 1. Create a Test from the UI
- First, select the dataset and navigate to the **Tests** section in the Collate UI.
- Define your test parameters. You can either create a **static test** (e.g., "no null values" or "values should stay within a certain range") or configure **dynamic assertions** to let the system learn from the data.
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-1.png"
alt="Manual Configuration of Tests"
caption="Manual Configuration of Tests"
/%}
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-2.png"
alt="Manual Configuration of Tests"
caption="Manual Configuration of Tests"
/%}
### 2. Configure Manual Tests
- For more controlled monitoring, set up **manual thresholds** (e.g., sales should not exceed a maximum value of 100). This provides specific control over data validation criteria.
### 3. Enable Dynamic Assertions
- For data that naturally fluctuates or evolves, enable **dynamic assertions**. Collate will start profiling your data regularly to learn its normal behavior.
- Over time (e.g., five weeks), the system will establish expected value ranges and detect any deviations from these patterns.
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-3.png"
alt="Dynamic Assertions"
caption="Dynamic Assertions"
/%}
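One way to picture the learning period is as a gate in front of the learned ranges: until enough profiling history has accumulated, the assertion is still learning and should not be expected to flag anything. The sketch below is hypothetical. The state names, the function `assertion_state`, and the fixed five-week cutoff (taken from the example above) are assumptions, not documented Collate behavior.

```python
from datetime import date, timedelta

# Assumption: mirrors the "five weeks" example mentioned in the step above.
LEARNING_PERIOD = timedelta(weeks=5)


def assertion_state(first_profiled, today, learning_period=LEARNING_PERIOD):
    """Return "learning" until enough profiling history exists, then "active"."""
    if today - first_profiled >= learning_period:
        return "active"
    return "learning"
```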
### 4. Monitor Incidents
- After configuring tests, monitor for any **incidents** triggered by anomalies detected in the system.
- Investigate significant spikes, drops, or unusual behaviors in the data, which may indicate system errors, backend failures, or unexpected external factors.
{% image
src="/images/v1.5/how-to-guides/anomaly-detection/set-up-anomaly-detection-4.png"
alt="Incidents and Notifications"
caption="Incidents and Notifications"
/%}
## Best Practices
- **Use Static Assertions for Simple Rules**: For basic data validation, such as preventing null values or enforcing a minimum threshold, static assertions are effective and straightforward to configure.
- **Leverage Dynamic Assertions for Evolving Data**: When dealing with datasets that naturally fluctuate (e.g., sales or user activity), dynamic assertions can save time and ensure incidents are only triggered when significant anomalies occur.
- **Regularly Review Incidents**: Stay on top of incidents generated by anomaly detection to promptly identify and address data quality issues.
- **Combine Manual and Dynamic Methods**: For datasets with well-defined boundaries and evolving characteristics, combining manual thresholds and dynamic assertions provides comprehensive anomaly detection coverage.
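The last practice, combining manual thresholds with dynamic assertions, can be sketched as a two-stage check: a fixed range that must always hold, followed by a band learned from history. Everything here is illustrative. The function name `evaluate`, the result labels, and the mean ± k standard deviations rule are assumptions, not Collate's API or model.

```python
from statistics import mean, stdev


def evaluate(value, history, hard_min, hard_max, k=3.0):
    """Classify a value against a fixed range first, then a learned band."""
    # Stage 1: manual threshold, independent of history.
    if not hard_min <= value <= hard_max:
        return "static_violation"

    # Stage 2: learned band from past values (needs at least 2 points).
    center, spread = mean(history), stdev(history)
    if abs(value - center) > k * spread:
        return "dynamic_anomaly"

    return "ok"
```

Checking the fixed range first means a value that is impossible by definition is reported as such, even if recent history happens to be noisy enough that the learned band would tolerate it.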

@@ -698,6 +698,12 @@ site_menu:
- category: How-to Guides / Data Quality and Observability / Incident Manager / Root Cause Analysis
  url: /how-to-guides/data-quality-observability/incident-manager/root-cause-analysis
  isCollateOnly: true
- category: How-to Guides / Data Quality and Observability / Anomaly Detection
  url: /how-to-guides/data-quality-observability/anomaly-detection
  isCollateOnly: true
- category: How-to Guides / Data Quality and Observability / Anomaly Detection / Steps to Set Up Anomaly Detection
  url: /how-to-guides/data-quality-observability/anomaly-detection/setting-up
  isCollateOnly: true
- category: How-to Guides / Data Lineage
  url: /how-to-guides/data-lineage

@@ -0,0 +1,72 @@
---
title: Anomaly Detection in Collate | Automated Data Quality Alerts
slug: /how-to-guides/data-quality-observability/anomaly-detection
---
# Overview
The **Anomaly Detection** feature in Collate helps ensure data quality by automatically detecting unexpected changes, such as spikes or drops in data trends. Instead of requiring users to manually define rigid boundaries for data validation, Collate dynamically learns from your data patterns through regular profiling. This allows for more accurate and flexible anomaly detection, alerting you only when there are significant deviations that might indicate underlying issues.
## Key Benefits of Anomaly Detection
- **Automated Detection of Unexpected Data Changes**: Collate can detect unexpected data behaviors, such as spikes or drops, that deviate from normal trends. This is crucial for identifying potential issues with data pipelines, backend systems, or infrastructure.
- **Dynamic Learning**: The system continuously profiles your data over time, learning its natural variations, including seasonal fluctuations. For example, if sales data varies throughout the year due to holidays, Collate's dynamic assertions can detect this seasonality and avoid raising unnecessary alerts. This allows the system to automatically adjust to your data's evolving patterns without requiring manual configuration.
- **Flexible Configuration**: For more controlled scenarios, users can still manually define specific boundaries or thresholds to monitor data, such as ensuring values stay within a certain range. This offers both manual and automatic methods for managing data quality.
## Use Cases
### 1. Static Assertions for Simple Tests
- **Problem**: In many cases, users want to perform straightforward data tests, such as ensuring that values are not null or that there are no repeated values.
- **Solution**: Collate enables users to configure simple assertions directly from the UI. For example, users can create tests to ensure:
  - Data should not be null.
  - There should be no duplicate values.
  - Data should not be older than a specific time frame (e.g., one day).
  - Values should be greater than zero.
- **Example**: If you want to ensure that your sales data contains no null values or duplicates, you can easily configure these assertions via the UI.
### 2. Dynamic Assertions for Evolving Data
- **Problem**: Some data, such as sales figures, naturally evolves over time. For example, sales data might fluctuate daily or weekly, and manual bounds may not accurately capture these variations.
- **Solution**: Collate uses **dynamic assertions**, which automatically learn from the data by profiling it regularly. Over time, the system establishes a pattern for how the data behaves, allowing it to detect when values significantly deviate from this expected behavior.
- **Example**: If sales suddenly spike or drop beyond what is typical for your historical data, Collate will alert you to this anomaly.
## How Anomaly Detection Works
### 1. Manual Configuration of Tests
Users can manually configure tests for specific data points if they want to maintain tight control over their data quality checks. For instance, you can specify that a value must stay between 10 and 100. This method is useful for data that has well-understood constraints or when precise validation rules are required.
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-2.png"
alt="Manual Configuration of Tests"
caption="Manual Configuration of Tests"
/%}
### 2. Dynamic Assertions
For more complex or evolving datasets, Collate offers **dynamic assertions**. These assertions automatically adapt to your data by learning its natural patterns over time. The profiling process typically takes around five weeks, during which the system builds an understanding of normal data fluctuations.
- **Data Profiling**: Collate continuously scans the data and trains its models based on the profiled data. Once this learning phase is complete, the system can detect significant deviations from expected patterns, alerting users to anomalies.
- **Advantages of Dynamic Assertions**:
  - **Adaptability**: No need to set manual thresholds for evolving datasets.
  - **Efficiency**: Focus on genuine anomalies instead of managing static tests that may quickly become outdated as data evolves.
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-3.png"
alt="Dynamic Assertions"
caption="Dynamic Assertions"
/%}
### 3. Incidents and Notifications
When an anomaly is detected, Collate automatically generates an incident. Incidents are generated for rule-based test cases as well as for dynamic assertions. These notifications help users quickly see when and where their data may be behaving unexpectedly.
- **Example**: If sales data suddenly shows an abnormal spike or drop, Collate will notify you, allowing you to investigate potential causes such as system malfunctions or external influences.
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-4.png"
alt="Incidents and Notifications"
caption="Incidents and Notifications"
/%}

@@ -0,0 +1,52 @@
---
title: Set Up Anomaly Detection in Collate for Data Quality
slug: /how-to-guides/data-quality-observability/anomaly-detection/setting-up
---
# Steps to Set Up Anomaly Detection
### 1. Create a Test from the UI
- First, select the dataset and navigate to the **Tests** section in the Collate UI.
- Define your test parameters. You can either create a **static test** (e.g., "no null values" or "values should stay within a certain range") or configure **dynamic assertions** to let the system learn from the data.
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-1.png"
alt="Manual Configuration of Tests"
caption="Manual Configuration of Tests"
/%}
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-2.png"
alt="Manual Configuration of Tests"
caption="Manual Configuration of Tests"
/%}
### 2. Configure Manual Tests
- For more controlled monitoring, set up **manual thresholds** (e.g., sales should not exceed a maximum value of 100). This provides specific control over data validation criteria.
### 3. Enable Dynamic Assertions
- For data that naturally fluctuates or evolves, enable **dynamic assertions**. Collate will start profiling your data regularly to learn its normal behavior.
- Over time (e.g., five weeks), the system will establish expected value ranges and detect any deviations from these patterns.
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-3.png"
alt="Dynamic Assertions"
caption="Dynamic Assertions"
/%}
### 4. Monitor Incidents
- After configuring tests, monitor for any **incidents** triggered by anomalies detected in the system.
- Investigate significant spikes, drops, or unusual behaviors in the data, which may indicate system errors, backend failures, or unexpected external factors.
{% image
src="/images/v1.6/how-to-guides/anomaly-detection/set-up-anomaly-detection-4.png"
alt="Incidents and Notifications"
caption="Incidents and Notifications"
/%}
## Best Practices
- **Use Static Assertions for Simple Rules**: For basic data validation, such as preventing null values or enforcing a minimum threshold, static assertions are effective and straightforward to configure.
- **Leverage Dynamic Assertions for Evolving Data**: When dealing with datasets that naturally fluctuate (e.g., sales or user activity), dynamic assertions can save time and ensure incidents are only triggered when significant anomalies occur.
- **Regularly Review Incidents**: Stay on top of incidents generated by anomaly detection to promptly identify and address data quality issues.
- **Combine Manual and Dynamic Methods**: For datasets with well-defined boundaries and evolving characteristics, combining manual thresholds and dynamic assertions provides comprehensive anomaly detection coverage.

Binary files not shown (8 images added): 455 KiB, 585 KiB, 565 KiB, 525 KiB, 455 KiB, 585 KiB, 565 KiB, 525 KiB