---
title: Set Up Anomaly Detection in Collate for Data Quality
slug: /how-to-guides/data-quality-observability/anomaly-detection/setting-up
---
# Steps to Set Up Anomaly Detection
## 1. Create a Test from the UI
- First, select the dataset and navigate to the Tests section in the Collate UI.
- Define your test parameters. You can either create a static test (e.g., "no null values" or "data should not exceed a certain range") or configure dynamic assertions, which let the system learn expected values from the data.
{% image src="/images/v1.8/how-to-guides/anomaly-detection/set-up-anomaly-detection-1.png" alt="Manual Configuration of Tests" caption="Manual Configuration of Tests" /%}
{% image src="/images/v1.8/how-to-guides/anomaly-detection/set-up-anomaly-detection-2.png" alt="Manual Configuration of Tests" caption="Manual Configuration of Tests" /%}
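Teams automating test creation can define the same static test against the OpenMetadata REST API instead of the UI. Below is a minimal sketch in Python that builds the request payload for a "no null values" test, assuming the `columnValuesToBeNotNull` test definition and the `entityLink` format used by recent OpenMetadata versions; the table name is a placeholder, and field names should be verified against your instance's API documentation.

```python
import json

def build_not_null_test(table_fqn: str, column: str) -> dict:
    """Build a test-case payload asserting a column contains no null values."""
    return {
        "name": f"{column}_not_null",
        # entityLink points the test at a specific column of the table.
        "entityLink": f"<#E::table::{table_fqn}::columns::{column}>",
        "testDefinition": "columnValuesToBeNotNull",
        # Static tests like this one take no extra parameters.
        "parameterValues": [],
    }

# Hypothetical table fully qualified name, for illustration only.
payload = build_not_null_test("prod.sales_db.public.daily_sales", "customer_id")
print(json.dumps(payload, indent=2))
```

The resulting JSON would be POSTed to the test-case endpoint of your OpenMetadata server (check the Swagger UI of your deployment for the exact path and required authentication).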
## 2. Configure Manual Tests
- For more controlled monitoring, set up manual thresholds (e.g., sales values must not exceed 100). This gives you explicit control over the data validation criteria.
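A manual threshold maps naturally onto a range test. The sketch below builds a payload for the sales example above, assuming the `columnValuesToBeBetween` test definition and its `minValue`/`maxValue` parameters; the table and column names are placeholders.

```python
import json

def build_range_test(table_fqn: str, column: str,
                     min_value: float, max_value: float) -> dict:
    """Build a test-case payload enforcing a fixed value range on a column."""
    return {
        "name": f"{column}_within_range",
        "entityLink": f"<#E::table::{table_fqn}::columns::{column}>",
        "testDefinition": "columnValuesToBeBetween",
        # Parameter values are passed as strings in the OpenMetadata API.
        "parameterValues": [
            {"name": "minValue", "value": str(min_value)},
            {"name": "maxValue", "value": str(max_value)},
        ],
    }

# Sales must stay between 0 and 100, matching the example in the step above.
payload = build_range_test("prod.sales_db.public.daily_sales", "sales", 0, 100)
print(json.dumps(payload, indent=2))
```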
## 3. Enable Dynamic Assertions
- For data that naturally fluctuates or evolves, enable dynamic assertions. Collate will start profiling your data regularly to learn its normal behavior.
- Over time (e.g., five weeks), the system will establish expected value ranges and detect any deviations from these patterns.
{% image src="/images/v1.8/how-to-guides/anomaly-detection/set-up-anomaly-detection-3.png" alt="Enabling Dynamic Assertions" caption="Enabling Dynamic Assertions" /%}
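Dynamic assertions can also be set programmatically. In recent API versions this is exposed as a flag on the test case (shown here as `useDynamicAssertion`; confirm the exact field name in your deployment). A hedged sketch for a table row-count test whose bounds Collate learns from profiler history:

```python
import json

def build_dynamic_row_count_test(table_fqn: str) -> dict:
    """Build a table-level test whose bounds are learned, not hard-coded."""
    return {
        "name": "row_count_dynamic",
        # Table-level entityLink: no column segment needed.
        "entityLink": f"<#E::table::{table_fqn}>",
        "testDefinition": "tableRowCountToBeBetween",
        # Assumed flag name: with dynamic assertions on, no static
        # min/max parameters are supplied -- Collate derives them.
        "useDynamicAssertion": True,
        "parameterValues": [],
    }

payload = build_dynamic_row_count_test("prod.sales_db.public.daily_sales")
print(json.dumps(payload, indent=2))
```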
## 4. Monitor Incidents
- After configuring tests, monitor for any incidents triggered by anomalies detected in the system.
- Investigate significant spikes, drops, or unusual behaviors in the data, which may indicate system errors, backend failures, or unexpected external factors.
{% image src="/images/v1.8/how-to-guides/anomaly-detection/set-up-anomaly-detection-4.png" alt="Monitoring Incidents" caption="Monitoring Incidents" /%}
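To build intuition for what "significant deviation" means here, the toy check below flags a metric value that falls outside a band of k standard deviations around its historical mean. This is an illustration only, not Collate's actual model: Collate learns expected ranges from profiler history using its own algorithm.

```python
import statistics

def is_anomalous(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Flag latest if it falls outside mean +/- k * stdev of history.

    Illustrative only -- Collate's learned ranges come from its own model.
    """
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return abs(latest - mean) > k * stdev

# Hypothetical weekly row counts after ~five weeks of profiling.
weeks = [100, 104, 98, 102, 101]
print(is_anomalous(weeks, 103))  # False: within the learned band
print(is_anomalous(weeks, 250))  # True: a spike worth investigating
```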
## Best Practices
- Use Static Assertions for Simple Rules: For basic data validation, such as preventing null values or enforcing a minimum threshold, static assertions are effective and straightforward to configure.
- Leverage Dynamic Assertions for Evolving Data: When dealing with datasets that naturally fluctuate (e.g., sales or user activity), dynamic assertions can save time and ensure incidents are only triggered when significant anomalies occur.
- Regularly Review Incidents: Stay on top of incidents generated by anomaly detection to promptly identify and address data quality issues.
- Combine Manual and Dynamic Methods: For datasets that have both well-defined boundaries and evolving characteristics, combining manual thresholds with dynamic assertions provides comprehensive anomaly detection coverage.