The assertion entity represents a data quality rule that can be applied to one or more datasets. Assertions are the foundation of DataHub's data quality framework, enabling organizations to define, monitor, and enforce expectations about their data. They encompass various types of checks including field-level validation, volume monitoring, freshness tracking, schema validation, and custom SQL-based rules.
Assertions can originate from multiple sources: they can be defined natively within DataHub, ingested from external data quality tools (such as Great Expectations, dbt tests, or Snowflake Data Quality), or inferred by ML-based systems. Each assertion tracks its evaluation history over time, maintaining a complete audit trail of passes, failures, and errors.
An **Assertion** is uniquely identified by an `assertionId`, which is a globally unique identifier that remains constant across runs of the assertion. The URN format is:
```
urn:li:assertion:<assertionId>
```
The `assertionId` is typically a generated GUID that uniquely identifies the assertion definition. For example:
```
urn:li:assertion:432475190cc846f2894b5b3aa4d55af2
```
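Building and parsing this URN is straightforward string handling. The helpers below are an illustrative sketch, not part of the DataHub SDK:

```python
import uuid

def make_assertion_urn(assertion_id: str) -> str:
    """Build an assertion URN from its stable ID (illustrative helper)."""
    return f"urn:li:assertion:{assertion_id}"

def parse_assertion_urn(urn: str) -> str:
    """Extract the assertionId from a URN, validating the prefix."""
    prefix = "urn:li:assertion:"
    if not urn.startswith(prefix):
        raise ValueError(f"not an assertion URN: {urn}")
    return urn[len(prefix):]

# A freshly generated ID, as used for native assertions:
new_urn = make_assertion_urn(uuid.uuid4().hex)

# Round-trip the example above:
urn = parse_assertion_urn("urn:li:assertion:432475190cc846f2894b5b3aa4d55af2")
```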
### Generating Stable Assertion IDs
The logic for generating stable assertion IDs differs based on the source of the assertion:
- **Native Assertions**: When an assertion is created in DataHub Cloud's UI or API, the platform generates a UUID
- **External Assertions**: Each integration tool generates IDs based on its own conventions:
- **Great Expectations**: Combines expectation suite name, expectation type, and parameters
- **dbt Tests**: Uses the test's unique_id from the manifest
- **Snowflake Data Quality**: Uses the native DMF rule ID
- **Inferred Assertions**: ML-based systems generate IDs based on the inference model and target
The key requirement is that the same assertion definition should always produce the same `assertionId`, enabling DataHub to track the assertion's history over time even as it's re-evaluated.
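One common way to satisfy this requirement is to hash the fields that define the assertion. The sketch below mirrors the Great Expectations convention described above (suite name, expectation type, and parameters); the helper name and digest scheme are illustrative, since each connector chooses its own fields:

```python
import hashlib
import json

def stable_assertion_id(suite: str, expectation_type: str, params: dict) -> str:
    """Derive a deterministic assertionId by hashing the fields that define
    the assertion. json.dumps with sort_keys=True keeps the digest stable
    regardless of parameter ordering. (Illustrative sketch; real connectors
    pick their own identifying fields.)"""
    key = json.dumps([suite, expectation_type, params], sort_keys=True)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

a = stable_assertion_id("orders_suite", "expect_column_values_to_not_be_null",
                        {"column": "order_id"})
b = stable_assertion_id("orders_suite", "expect_column_values_to_not_be_null",
                        {"column": "order_id"})
# Same definition always yields the same ID, so history accrues to one URN.
```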
SQL assertions allow custom validation logic using arbitrary SQL queries. Two types are supported:
- **METRIC**: Execute SQL and assert the returned metric meets expectations
- **METRIC_CHANGE**: Assert the change in a SQL metric over time
SQL assertions provide maximum flexibility for complex validation scenarios that don't fit other assertion types, such as cross-table referential integrity checks or business rule validation.
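The evaluation model for a METRIC assertion can be sketched in a few lines: run a query that returns a single scalar, then compare it to the expected value with an operator. This is a simplified illustration (using an in-memory SQLite table; DataHub's real operator set is larger):

```python
import sqlite3

def evaluate_metric_assertion(conn, sql: str, operator: str, value: float) -> bool:
    """Run a single-value SQL metric and compare it to the expectation.
    Only three illustrative operators are shown."""
    (metric,) = conn.execute(sql).fetchone()
    ops = {
        "EQUAL_TO": metric == value,
        "GREATER_THAN": metric > value,
        "LESS_THAN": metric < value,
    }
    return ops[operator]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, -5.0)])

# METRIC-style business rule: no order may have a negative amount.
passed = evaluate_metric_assertion(
    conn, "SELECT COUNT(*) FROM orders WHERE amount < 0", "EQUAL_TO", 0)
# One negative row exists, so this evaluation fails.
```

A METRIC_CHANGE assertion follows the same pattern but compares the metric against its value from a previous run rather than a fixed threshold.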
## Integration Points
### Relationship to Datasets
Assertions are linked to datasets through the `Asserts` relationship:
- Field assertions target specific dataset columns
- Volume assertions monitor dataset row counts
- Freshness assertions track dataset update times
- Schema assertions validate dataset structure
- SQL assertions query dataset contents
Datasets maintain a reverse relationship, showing all assertions that validate them. This enables users to understand the quality checks applied to any dataset.
### Relationship to Data Jobs
Freshness assertions can target data jobs (pipelines) to ensure they execute on schedule. When a `FreshnessAssertionInfo` has `type=DATA_JOB_RUN`, the `entity` field references a dataJob URN rather than a dataset.
### Relationship to Data Platforms
External assertions maintain a relationship to their source platform through the `dataPlatformInstance` aspect. This enables:
- Filtering assertions by source tool
- Deep-linking back to the source platform
- Understanding the assertion's external context
### GraphQL API
Assertions are fully accessible via DataHub's GraphQL API:
- Query assertions and their run history
- Create and update native assertions
- Delete assertions
- Retrieve assertions for a specific dataset
Key GraphQL types:
- `Assertion`: The main assertion entity
- `AssertionInfo`: Assertion definition and type
- `AssertionRunEvent`: Evaluation results
- `AssertionSource`: Origin metadata
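A query for the assertions attached to a dataset can be issued against DataHub's `/api/graphql` endpoint. The sketch below only builds the request payload; the field names follow the types listed above but should be verified against your DataHub version's GraphQL schema, and the dataset URN is a made-up example:

```python
import json

DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.orders,PROD)"

# GraphQL document; check field names against your server's schema.
QUERY = """
query datasetAssertions($urn: String!) {
  dataset(urn: $urn) {
    assertions(start: 0, count: 20) {
      total
      assertions { urn info { type description } }
    }
  }
}
"""

payload = json.dumps({"query": QUERY, "variables": {"urn": DATASET_URN}})
# POST `payload` to <your-datahub-host>/api/graphql with a bearer token, e.g.
# requests.post(f"{host}/api/graphql", data=payload,
#               headers={"Authorization": f"Bearer {token}",
#                        "Content-Type": "application/json"})
```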
### Integration with dbt
DataHub's dbt integration automatically converts dbt tests into assertions:
- **Schema Tests**: Mapped to field assertions (not_null, unique, accepted_values, relationships)
- **Data Tests**: Mapped to SQL assertions
- **Test Metadata**: Test severity, tags, and descriptions are preserved
### Integration with Great Expectations
The Great Expectations integration maps expectations to assertion types:
- Column expectations → Field assertions
- Table expectations → Volume or schema assertions
- Custom expectations → Custom assertions
Each expectation suite becomes a collection of assertions in DataHub.
### Integration with Snowflake Data Quality
Snowflake DMF (Data Metric Functions) rules are ingested as assertions:
- Row count rules → Volume assertions
- Uniqueness rules → Field metric assertions
- Freshness rules → Freshness assertions
- Custom metric rules → SQL assertions
## Notable Exceptions
### Legacy Dataset Assertion Type
The `DATASET` assertion type is a legacy format that predates the more specific field, volume, freshness, and schema assertion types. It uses `DatasetAssertionInfo` with a generic structure. New integrations should use the more specific assertion types (FIELD, VOLUME, FRESHNESS, DATA_SCHEMA, SQL) as they provide better type safety and UI rendering.
### Assertion Results vs. Assertion Metrics
While assertions track pass/fail status, DataHub also supports more detailed metrics through the `AssertionResult` object:
- `actualAggValue`: The actual value observed (for numeric assertions)
- `externalUrl`: Link to detailed results in the source system
- `nativeResults`: Platform-specific result details
This enables richer debugging and understanding of why assertions fail.
### Assertion Scheduling
DataHub tracks when assertions run through `assertionRunEvent` timeseries data, but does not directly schedule assertion evaluations. Scheduling is handled by:
- **External Assertions**: The source platform's scheduler (dbt, Airflow, etc.)
- **On-Demand**: Manual or API-triggered evaluations
DataHub provides monitoring and alerting based on the assertion run events, regardless of the scheduling mechanism.
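Because DataHub only records the run events, monitoring logic typically folds the event stream into a current status. A minimal sketch, using a simplified event shape (the real `assertionRunEvent` aspect carries more fields):

```python
from collections import Counter

# Simplified assertionRunEvent records: timestamp plus result type.
events = [
    {"timestampMillis": 1700000000000, "result": "SUCCESS"},
    {"timestampMillis": 1700086400000, "result": "SUCCESS"},
    {"timestampMillis": 1700172800000, "result": "FAILURE"},
]

def summarize_runs(events):
    """Fold a run-event stream into (latest status, overall pass rate)."""
    ordered = sorted(events, key=lambda e: e["timestampMillis"])
    counts = Counter(e["result"] for e in ordered)
    return ordered[-1]["result"], counts["SUCCESS"] / len(ordered)

latest, pass_rate = summarize_runs(events)
# The most recent run failed, so an alert would fire even though
# two of three historical runs passed.
```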
### Assertion vs. Test Results
DataHub has two related concepts:
- **Assertions**: First-class entities that define data quality rules
- **Test Results**: A simpler aspect that can be attached to datasets
Test results are lightweight pass/fail indicators without the full expressiveness of assertions. Use assertions for production data quality monitoring and test results for simple ingestion-time validation.