feat(docs): 3.13 Observe docs (#14265)

Co-authored-by: John Joyce <john@acryl.io>
This commit is contained in:
Jay 2025-07-31 09:45:24 -04:00 committed by GitHub
parent e7ae66c50b
commit 66f36c273d
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
8 changed files with 83 additions and 12 deletions

View File

@ -48,6 +48,11 @@ module.exports = {
type: "category",
link: { type: "doc", id: "docs/managed-datahub/observe/assertions" },
items: [
{
label: "Overview",
type: "doc",
id: "docs/managed-datahub/observe/assertions",
},
{
label: "Column Assertions",
type: "doc",
@ -90,6 +95,12 @@ module.exports = {
id: "docs/managed-datahub/observe/data-health-dashboard",
className: "saasOnly",
},
{
label: "Assertion Notes (Troubleshooting & Documentation)",
type: "doc",
id: "docs/managed-datahub/observe/assertion-notes",
className: "saasOnly",
},
{
label: "Open Assertions Specification",
type: "category",

View File

@ -0,0 +1,33 @@
---
description: This page provides an overview of using Assertion Notes
---
import FeatureAvailability from '@site/src/components/FeatureAvailability';
# Assertion Notes
<FeatureAvailability saasOnly />
> The **Assertion Notes** feature is available as part of the **DataHub Cloud Observe** module of DataHub Cloud.
> If you are interested in learning more about **DataHub Cloud Observe** or trying it out, please [visit our website](https://datahub.com/products/data-observability/).
## Introduction
The Assertion Notes feature aims to solve two key use cases:
1. Surfacing useful tips for engineers to troubleshoot and resolve data quality failures
2. Documenting the purpose of a given check and the implications of its failure; for instance, some checks may circuit-break pipelines.
### For Troubleshooting
As you scale your data quality coverage across a large data landscape, you will often find that the engineers who are troubleshooting and resolving an assertion failure are not the same people who created the check.
Oftentimes, it's useful to provide troubleshooting instructions or notes with context about how to resolve the problem when a check fails.
- If the check was manually set up, it may be worthwhile for the creator to add notes for future on-call engineers
- If it was an AI check, whoever is first to investigate the failure may want to document what they did to fix it.
### For Documenting
Adding notes is also useful for documenting your Assertions. This is particularly relevant for Custom SQL checks, where it can be difficult to understand the intent behind a query statement. By adding documentation in the notes tab, others can understand exactly what is being monitored and how to resolve issues in the event of a failure.
<iframe width="516" height="342" src="https://www.loom.com/embed/a6cb07d33e8440acafacea381912f904?sid=32918cd5-9ebf-4aa0-90bc-37fae84d1841" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
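To make the idea concrete, here is a minimal, hypothetical sketch of a Custom SQL check paired with the kind of note you might record in the Notes tab. The table name, query, and note text are all illustrative assumptions, not DataHub's implementation; the point is that the note travels with the check so whoever investigates a failure sees the remediation context.

```python
import sqlite3

# Hypothetical custom SQL check: flag orders with a negative total.
# ASSERTION_NOTE mirrors what you might write in the assertion's Notes tab.
ASSERTION_NOTE = (
    "Checks that no order has a negative total. A failure usually means the "
    "refunds job wrote raw deltas instead of absolute values; re-run it and "
    "re-validate before unblocking downstream pipelines."
)

CHECK_SQL = "SELECT COUNT(*) FROM orders WHERE total < 0"

def run_check(conn: sqlite3.Connection) -> bool:
    """Return True if the assertion passes (no violating rows)."""
    (violations,) = conn.execute(CHECK_SQL).fetchone()
    if violations:
        # Surface the troubleshooting note alongside the failure.
        print(f"FAILED ({violations} rows): {ASSERTION_NOTE}")
    return violations == 0

# Demo data: one valid order, one violating order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, -3.5)])
print(run_check(conn))
```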

View File

@ -1,7 +1,9 @@
# Assertions
:::note Contract Monitoring Support
Currently we support Snowflake, Redshift, BigQuery, and Databricks for out-of-the-box contract monitoring as part of DataHub Cloud Observe.
:::note Supported Data Platforms
Currently we support monitoring data on Snowflake, Redshift, BigQuery, and Databricks as part of DataHub Cloud Observe.
For other data platforms, DataHub Cloud Observe can monitor assertions against dataset metrics (such as volume or column nullness) and dataset freshness by using the ingested statistics for each asset.
Column Value and Custom SQL Assertions are not currently supported for other data platforms.
:::
An assertion is **a data quality test that finds data that violates a specified rule.**
@ -11,7 +13,7 @@ Assertions serve as the building blocks of [Data Contracts](/docs/managed-datahu
Data quality tests (a.k.a. assertions) can be created and run by DataHub Cloud or ingested from a 3rd party tool.
### DataHub Cloud Observe
### DataHub Cloud Assertions
For DataHub-provided assertion runners, we can deploy an agent in your environment that connects to your sources and to DataHub. DataHub Cloud Observe offers out-of-the-box evaluation of the following kinds of assertions:
@ -19,8 +21,17 @@ For DataHub-provided assertion runners, we can deploy an agent in your environme
- [Volume](/docs/managed-datahub/observe/volume-assertions.md)
- [Custom SQL](/docs/managed-datahub/observe/custom-sql-assertions.md)
- [Column](/docs/managed-datahub/observe/column-assertions.md)
- [Schema](/docs/managed-datahub/observe/schema-assertions.md)
### Anomaly detection
#### Bulk Creating Assertions
You can bulk create Freshness and Volume [Smart Assertions](/docs/managed-datahub/observe/smart-assertions.md) (AI Anomaly Monitors) across several tables at once via the [Data Health Dashboard](/docs/managed-datahub/observe/data-health-dashboard.md):
<div align="center"><iframe width="560" height="315" src="https://www.loom.com/embed/f6720541914645aab6b28cdff8695d9f?sid=58dff84d-bb88-4f02-b814-17fb4986ad1f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></div>
To bulk create column metric assertions on a given dataset, follow the steps under the **Anomaly Detection** section of [Column Assertion](https://docs.datahub.com/docs/managed-datahub/observe/column-assertions#anomaly-detection-with-smart-assertions-).
### AI Anomaly Detection (Smart Assertions)
In many cases, you either do not have the time to figure out a good rule for an assertion, or strict rules simply do not suffice for your data validation needs. Traditional rule-based assertions can become inadequate when dealing with complex data patterns or large-scale operations.
@ -28,13 +39,13 @@ There are many cases where either you do not have the time to figure out what a
Here are some typical situations where manual assertion rules fall short:
**Seasonal data patterns** - A table whose row count changes exhibit weekly seasonality may need a different set of assertions for each day of the week, making static rules impractical to maintain.
- **Seasonal data patterns** - A table whose row counts exhibit weekly seasonality may need a different set of assertions for each day of the week, making static rules impractical to maintain.
**Statistical complexity across large datasets** - Figuring out what the expected standard deviation is for each column can be incredibly time consuming and not feasible across hundreds of tables, especially when each table has unique characteristics.
- **Statistical complexity across large datasets** - Figuring out what the expected standard deviation is for each column can be incredibly time consuming and not feasible across hundreds of tables, especially when each table has unique characteristics.
**Dynamic data environments** - When data patterns evolve over time, manually updating assertion rules becomes a maintenance burden that can lead to false positives or missed anomalies.
- **Dynamic data environments** - When data patterns evolve over time, manually updating assertion rules becomes a maintenance burden that can lead to false positives or missed anomalies.
## The Smart Assertion Solution
### The Smart Assertion Solution
In these scenarios, you may want to consider creating a [Smart Assertion](./smart-assertions.md) to let machine learning automatically detect the normal patterns in your data and alert you when anomalies occur. This approach allows for more flexible and adaptive data quality monitoring without the overhead of manual rule maintenance.
@ -77,8 +88,8 @@ There are a few ways DataHub Cloud assertions can be executed:
a. `Information Schema` tables are used by default to power cheap, fast checks on a table's freshness or row count.
b. `Audit log` or `Operation log` tables can be used to granularly monitor table operations.
c. The table itself can also be queried directly. This is useful for freshness checks referencing `last_updated` columns, row count checks targeting a subset of the data, and column value checks. We offer several optimizations to reduce query costs for these checks.
2. Reference DataHub profiling information
a. `Operation`s that are reported via ingestion or our SDKs can power monitoring table freshness.
2. Reference DataHub metadata
a. [Operations](/docs/api/tutorials/operations.md) that are reported via ingestion or our SDKs can power monitoring table freshness.
b. `DatasetProfile` and `SchemaFieldProfile` ingested or reported via SDKs can power monitoring table metrics and column metrics.
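The direct-query option (1c) for freshness can be sketched roughly as follows. This is an illustrative assumption, not DataHub's actual implementation: the table name, `last_updated` column, and 6-hour SLA are all made up for the example, and a real runner would push the comparison into the warehouse rather than into Python.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed freshness threshold for this sketch only.
FRESHNESS_SLA = timedelta(hours=6)

def is_fresh(conn, table, now=None):
    """Pass if the newest row landed within the freshness SLA."""
    now = now or datetime.now(timezone.utc)
    # Query the table directly via its `last_updated` column (option 1c).
    (latest,) = conn.execute(f"SELECT MAX(last_updated) FROM {table}").fetchone()
    if latest is None:
        return False  # empty table: treat as stale
    age = now - datetime.fromisoformat(latest)
    return age <= FRESHNESS_SLA

# Demo: a table whose newest row is one hour old passes the check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, last_updated TEXT)")
recent = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
conn.execute("INSERT INTO events VALUES (1, ?)", (recent,))
print(is_fresh(conn, "events"))  # recent row -> passes
```

In practice the cheaper metadata-based options (1a, 2a) avoid scanning the table at all; the direct query is the fallback when no reliable metadata exists.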
### Privacy: Execute In-Network, avoid exposing data externally

View File

@ -237,7 +237,11 @@ You can create smart assertions by simply selecting the column and the metric yo
<img width="40%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/column/column-smart-assertion.png"/>
</p>
_Coming soon: we're making it easier to create Smart Assertions for multiple fields on a table, across multiple metrics, all in one go. If you're interested in this today, please let your DataHub representative know._
**Bulk Creating for Multiple Columns**
To select several columns on a table to monitor at once, you can use the **Bulk-Create Smart Assertions** button below the column selector in the Column Metric Assertion authoring UI.
<iframe width="560" height="343" src="https://www.loom.com/embed/e71598c4394c4d8dba0770b8fc67ff06?sid=25326338-8a72-4382-98b5-026486233ef9" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
## Stopping a Column Assertion

View File

@ -67,4 +67,8 @@ In addition, both the `By Tables` tab and the `Incidents` tab will apply your gl
<img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/data-health/view-applied.png"/>
</p>
**Coming soon:** in upcoming releases we will include the filters in the URL parameters. This will make it incredibly easy for you to bookmark your specific filtered views.
## Bulk Create Smart Assertions
[Smart Assertions](./smart-assertions.md) are AI Anomaly Checks that can be used to quickly 'strap a seatbelt' across your data landscape. You can hit the 'Bulk Create' button in the top right corner of the data health dashboard to quickly set up anomaly detection across your most important assets:
<div align="center"><iframe width="560" height="315" src="https://www.loom.com/embed/f6720541914645aab6b28cdff8695d9f?sid=58dff84d-bb88-4f02-b814-17fb4986ad1f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></div>

View File

@ -145,6 +145,8 @@ Freshness Assertions also have an off switch: they can be started or stopped at
Once these are in place, you're ready to create your Freshness Assertions!
You can also **Bulk Create Smart Assertions** via the [Data Health Page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions).
### Steps
1. Navigate to the Table that you want to monitor for freshness

View File

@ -24,6 +24,10 @@ Today, you can create Smart Assertions for 3 types of assertions. To learn more
2. [Freshness](./freshness-assertions.md#anomaly-detection-with-smart-assertions-)
3. [Column Metrics](./column-assertions.md#anomaly-detection-with-smart-assertions-)
You can also create Freshness & Volume Smart Assertions in bulk on the [Data Health page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions):
<div align="center"><iframe width="560" height="315" src="https://www.loom.com/embed/f6720541914645aab6b28cdff8695d9f?sid=58dff84d-bb88-4f02-b814-17fb4986ad1f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></div>
## Improving Smart Assertion quality
You can improve predictions through two key levers:

View File

@ -139,6 +139,8 @@ Volume Assertions also have an off switch: they can be started or stopped at any
Once these are in place, you're ready to create your Volume Assertions!
You can also **Bulk Create Smart Assertions** via the [Data Health Page](https://docs.datahub.com/docs/managed-datahub/observe/data-health-dashboard#bulk-create-smart-assertions).
### Steps
1. Navigate to the Table that you want to monitor for volume