---
description: This page provides an overview of working with DataHub Schema Assertions
---
import FeatureAvailability from '@site/src/components/FeatureAvailability';
# Schema Assertions
<FeatureAvailability saasOnly />
> The **Schema Assertions** feature is available as part of the **DataHub Cloud Observe** module of DataHub Cloud.
> If you are interested in learning more about **DataHub Cloud Observe** or trying it out, please [visit our website](https://datahub.com/products/data-observability/).
## Introduction
Can you remember a time when columns were unexpectedly added, removed, or altered for a key Table in your Data Warehouse?
Perhaps this caused downstream tables, views, dashboards, data pipelines, or AI models to break.

There are many reasons why the schema of an important Table on Snowflake, Redshift, or BigQuery may change, breaking the expectations
of downstream consumers of the table.

What if you could reduce the time to detect these incidents, so that the people responsible for the data were made aware of data
issues _before_ anyone else? With DataHub Cloud **Schema Assertions**, you can.

DataHub Cloud allows users to define expectations about a table's columns and their data types, and will monitor and validate these expectations over
time, notifying you when a breaking change occurs.

In this article, we'll cover the basics of monitoring Schema Assertions - what they are, how to configure them, and more - so that you and your team can
start building trust in your most important data assets.
Let's get started!
## Support
Schema Assertions are currently supported for all data sources that provide a schema via the normal ingestion process.
## What is a Schema Assertion?
A **Schema Assertion** is a Data Quality rule used to monitor the columns in a particular table and their data types.
It allows you to define a set of "required" columns for the table along with their expected types, and then be notified
via a failing assertion if anything changes.

This type of assertion is particularly useful for monitoring the structure of a table that is outside of your
direct control, for example the result of an ETL process from an upstream application or tables provided by a third-party data vendor. It
lets you get ahead of potentially breaking schema changes by alerting you as soon as they occur, before
they have a chance to negatively impact downstream assets.
### Anatomy of a Schema Assertion
At the most basic level, **Schema Assertions** consist of a few important parts:
1. A **Condition Type**
2. A set of **Expected Columns**
In this section, we'll give an overview of each.
#### 1. Condition Type
The **Condition Type** defines the conditions under which the Assertion will **fail**. More concretely, it determines
how the _expected_ columns should be compared to the _actual_ columns found in the schema to determine a passing or failing
state for the data quality check.

The list of supported condition types:

- **Contains**: The assertion will fail if the actual schema does not contain all expected columns and their types.
- **Exact Match**: The assertion will fail if the actual schema does not EXACTLY match the expected columns and their types. No
  additional columns will be permitted.

Schema Assertions will be evaluated whenever a change in the schema of the underlying table is detected.
They can also be started or stopped at any time by pressing the start (play) or stop (pause) buttons.
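
The difference between the two condition types can be sketched in a few lines of Python. This is only an illustration of the comparison semantics described above, not DataHub's implementation: the names `Condition`, `normalize`, and `evaluate` are hypothetical, and columns are modeled as a simple mapping from column path to high-level type.

```python
from enum import Enum


class Condition(Enum):
    """Hypothetical stand-ins for the two condition types."""
    CONTAINS = "CONTAINS"
    EXACT_MATCH = "EXACT_MATCH"


def normalize(columns: dict) -> dict:
    """Lower-case column paths so that names are compared case-insensitively."""
    return {path.lower(): col_type.upper() for path, col_type in columns.items()}


def evaluate(condition: Condition, expected: dict, actual: dict) -> bool:
    """Return True if the assertion would pass, False if it would fail."""
    exp, act = normalize(expected), normalize(actual)
    if condition is Condition.CONTAINS:
        # Every expected column must be present with the expected type;
        # additional columns in the actual schema are allowed.
        return all(act.get(path) == col_type for path, col_type in exp.items())
    # EXACT_MATCH: same columns, same types, nothing extra.
    return exp == act


expected = {"id": "STRING", "count": "NUMBER"}
actual = {"id": "STRING", "count": "NUMBER", "created_at": "TIMESTAMP"}

print(evaluate(Condition.CONTAINS, expected, actual))     # extra column is tolerated
print(evaluate(Condition.EXACT_MATCH, expected, actual))  # extra column causes a failure
```

In this sketch an added column only breaks **Exact Match**, while a removed or retyped expected column breaks both condition types.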
#### 2. Expected Columns
The **Expected Columns** are a set of column **names** along with their high-level **data
types** that should be used to compare against the _actual_ columns found in the table. By default, the expected column
set will be derived from the current set of columns found in the table. This conveniently allows you to "freeze" or "lock"
the current schema of a table in just a few clicks.

Each "expected column" is composed of two parts:

1. **Name**: The name of the column that should be present in the table. Nested columns are supported in a flattened
   fashion by simply providing a dot-separated path to the nested column. For example, `user.id` refers to the column `id` nested inside `user`.
   In the case of a complex array or map, each field in the elements of the array or map will be treated as a dot-delimited column.
   Note that verifying the specific type of object in primitive arrays or maps is not currently supported. Also note that the comparison performed
   is currently not case-sensitive.
2. **Type**: The high-level data type of the column in the table. This type is intentionally "high level" to allow for normal column widening practices
   without the risk of failing the assertion unnecessarily. For example, a `varchar(64)` and a `varchar(256)` will both resolve to the same high-level
   "STRING" type. The currently supported set of data types includes the following:
- String
- Number
- Boolean
- Date
- Timestamp
- Struct
- Array
- Map
- Union
- Bytes
- Enum
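
To make the flattened names and high-level types more concrete, here is a minimal Python sketch. It is not how DataHub resolves types internally: the `NATIVE_TO_HIGH_LEVEL` table and the `high_level_type` and `flatten` helpers are hypothetical, and only the `varchar` to `STRING` mapping comes from the text above. The other mappings are illustrative assumptions.

```python
import re

# Hypothetical mapping from native type names to high-level types.
# Only "varchar" -> "STRING" is stated in the text; the rest are assumptions.
NATIVE_TO_HIGH_LEVEL = {
    "varchar": "STRING",
    "text": "STRING",
    "int": "NUMBER",
    "bigint": "NUMBER",
    "decimal": "NUMBER",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "timestamp": "TIMESTAMP",
}


def high_level_type(native_type: str) -> str:
    """Drop length/precision arguments (e.g. 'varchar(64)' -> 'varchar') and look up the type."""
    base = re.sub(r"\(.*\)$", "", native_type.strip().lower())
    return NATIVE_TO_HIGH_LEVEL[base]


def flatten(schema: dict, prefix: str = "") -> dict:
    """Flatten a nested schema into {dot.separated.path: HIGH_LEVEL_TYPE}.

    A nested dict stands for a struct: the struct itself is emitted as STRUCT
    and each of its children gets a dot-separated path.
    """
    columns = {}
    for name, value in schema.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            columns[path] = "STRUCT"
            columns.update(flatten(value, prefix=f"{path}."))
        else:
            columns[path] = high_level_type(value)
    return columns


table = {"id": "varchar(64)", "user": {"id": "bigint", "active": "boolean"}}
for path, col_type in flatten(table).items():
    print(path, col_type)
```

Because length and precision are stripped before the lookup, widening `varchar(64)` to `varchar(256)` leaves the resulting expected column unchanged in this sketch.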
## Creating a Schema Assertion
### Prerequisites
- **Permissions**: To create or delete Schema Assertions for a specific entity on DataHub, you'll need to be granted the
  `Edit Assertions` and `Edit Monitors` privileges for the entity. These are granted to Entity owners by default as part of the
  `Asset Owners - Metadata Policy`.

Once these are in place, you're ready to create your Schema Assertions!
### Steps
1. Navigate to the Table you want to monitor
2. Click the **Validations** tab
<p align="left">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/freshness/profile-validation-tab.png"/>
</p>

3. Click **+ Create Assertion**
<p align="left">
  <img width="45%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/schema/assertion-builder-choose-type.png"/>
</p>

4. Choose **Schema**
5. Select the **condition type**.
6. Define the **expected columns** that will be continually compared against the actual column set. This defaults to the current columns for the table.
<p align="left">
  <img width="40%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/schema/assertion-builder-config.png"/>
</p>

7. Configure actions that should be taken when the assertion passes or fails

<p align="left">
  <img width="40%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/shared/assertion-builder-actions.png"/>
</p>

- **Raise incident**: Automatically raise a new DataHub Incident for the Table whenever the Schema Assertion is failing. This
  may indicate that the Table is unfit for consumption. Configure Slack Notifications under **Settings** to be notified when
  an incident is created due to an Assertion failure.
- **Resolve incident**: Automatically resolve any incidents that were raised due to failures in this Schema Assertion. Note that
  any other incidents will not be impacted.

Then click **Next**.

8. (Optional) Add a **description** for the assertion. This is a human-readable description of the assertion. If you do not provide one, a description will be generated for you.

<p align="left">
  <img width="40%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/shared/assertion-builder-description.png"/>
</p>

9. Click **Save**.

And that's it! DataHub will now begin to monitor your Schema Assertion for the table.
Once your assertion has run, you will begin to see its Success or Failure status:

<p align="left">
  <img width="45%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/schema/assertion-results.png"/>
</p>
## Stopping a Schema Assertion
In order to temporarily stop the evaluation of the assertion:
1. Navigate to the **Validations** tab of the Table with the assertion
2. Click **Schema** to open the Schema Assertion
3. Click the **Stop** button.

<p align="left">
  <img width="25%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/shared/stop-assertion.png"/>
</p>

To resume the assertion, simply click **Start**.

<p align="left">
  <img width="25%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/observe/shared/start-assertion.png"/>
</p>
## Creating Schema Assertions via API
Note that to create or delete Schema Assertions and their Monitors for a specific entity via the API, you'll need the
`Edit Assertions` and `Edit Monitors` privileges for that entity.
#### GraphQL
To create a Schema Assertion, you can use the `upsertDatasetSchemaAssertionMonitor` mutation.
##### Examples
To create a Schema Assertion that checks for the presence of a specific set of columns:
```graphql
mutation upsertDatasetSchemaAssertionMonitor {
  upsertDatasetSchemaAssertionMonitor(
    input: {
      entityUrn: "<urn of the table to be monitored>"
      assertion: {
        compatibility: SUPERSET # How the actual columns will be compared against the expected fields (provided next)
        fields: [
          { path: "id", type: STRING }
          { path: "count", type: NUMBER }
          { path: "struct", type: STRUCT }
          { path: "struct.nestedBooleanField", type: BOOLEAN }
        ]
      }
      description: "<description of the schema assertion>"
      mode: ACTIVE
    }
  ) {
    urn
  }
}
```
The supported compatibility types are `EXACT_MATCH` and `SUPERSET` (Contains).
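
If you would rather call the mutation from a script, the following Python sketch builds the same mutation text and wraps it in an HTTP request using only the standard library. Several things here are assumptions rather than documented behavior: the helper names `build_mutation` and `build_request` are hypothetical, the `/api/graphql` path in the usage comment is assumed, and the `{ urn }` selection set assumes the mutation returns an object with a `urn` field.

```python
import json
import urllib.request


def build_mutation(entity_urn: str, fields: dict, compatibility: str = "SUPERSET",
                   description: str = "") -> str:
    """Render the upsert mutation; `fields` maps column path -> high-level type."""
    field_lines = "\n".join(
        # json.dumps quotes the path; the type is a GraphQL enum and stays unquoted.
        f"          {{ path: {json.dumps(path)}, type: {col_type} }}"
        for path, col_type in fields.items()
    )
    return f"""mutation upsertDatasetSchemaAssertionMonitor {{
  upsertDatasetSchemaAssertionMonitor(
    input: {{
      entityUrn: {json.dumps(entity_urn)}
      assertion: {{
        compatibility: {compatibility}
        fields: [
{field_lines}
        ]
      }}
      description: {json.dumps(description)}
      mode: ACTIVE
    }}
  ) {{
    urn
  }}
}}"""


def build_request(graphql_url: str, token: str, query: str) -> urllib.request.Request:
    """Wrap the query in a POST request carrying a Personal Access Token."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        graphql_url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


query = build_mutation(
    "<urn of the table to be monitored>",
    {"id": "STRING", "count": "NUMBER"},
    description="<description of the schema assertion>",
)
print(query)

# To actually send it (URL path is an assumption, token is a placeholder):
# request = build_request("https://your-account-id.acryl.io/api/graphql", "<personal-access-token>", query)
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the sketch easy to inspect before anything is changed on the server.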
To update an existing Schema Assertion, use the same mutation and add the `assertionUrn` field:
```graphql
mutation upsertDatasetSchemaAssertionMonitor {
  upsertDatasetSchemaAssertionMonitor(
    assertionUrn: "urn:li:assertion:existing-assertion-id"
    input: {
      entityUrn: "<urn of the table to be monitored>"
      assertion: {
        compatibility: EXACT_MATCH
        fields: [
          { path: "id", type: STRING }
          { path: "count", type: NUMBER }
          { path: "struct", type: STRUCT }
          { path: "struct.nestedBooleanField", type: BOOLEAN }
        ]
      }
      description: "<description of the schema assertion>"
      mode: ACTIVE
    }
  ) {
    urn
  }
}
```
You can delete assertions along with their monitors using the GraphQL mutations `deleteAssertion` and `deleteMonitor`.
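
As a rough sketch, the text of these delete mutations could be generated as shown below. This assumes that both mutations take a single `urn` argument and return a scalar (so no selection set is needed); that signature is not stated on this page, so check it against your instance's GraphQL schema before relying on it. The helper name `build_delete_mutation` is hypothetical.

```python
import json

# The two mutation names come from the text above; their argument shape is an assumption.
DELETE_OPERATIONS = ("deleteAssertion", "deleteMonitor")


def build_delete_mutation(operation: str, urn: str) -> str:
    """Render a delete mutation for the given assertion or monitor urn."""
    if operation not in DELETE_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    # json.dumps produces a double-quoted, escaped string literal for the urn.
    return f"mutation {{ {operation}(urn: {json.dumps(urn)}) }}"


print(build_delete_mutation("deleteAssertion", "urn:li:assertion:existing-assertion-id"))
```

The resulting string can be sent to the GraphQL API in the same way as any other mutation, with the `Authorization` header set.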
### Tips
:::info
**Authorization**
Remember to always provide a DataHub Personal Access Token when calling the GraphQL API. To do so, just add the 'Authorization' header as follows:
```
Authorization: Bearer <personal-access-token>
```
**Exploring GraphQL API**
Also, remember that you can play with an interactive version of the DataHub Cloud GraphQL API at `https://your-account-id.acryl.io/api/graphiql`.
:::