mirror of
https://github.com/datahub-project/datahub.git
synced 2025-12-27 18:07:57 +00:00
docs(): Updating docs for assertions to correct databricks assertions support (#9713)
Co-authored-by: John Joyce <john@Johns-MBP.attlocal.net>
parent a78c6899a2
commit caf6ebe3b7
@@ -117,7 +117,7 @@ The **Assertion Description**: This is a human-readable description of the Asser
 ### Prerequisites
 
 1. **Permissions**: To create or delete Custom SQL Assertions for a specific entity on DataHub, you'll need to be granted the
-`Edit Assertions` and `Edit Monitors` privileges for the entity. This is granted to Entity owners by default.
+`Edit Assertions`, `Edit Monitors`, **and the additional `Edit SQL Assertion Monitors`** privileges for the entity. This is granted to Entity owners by default.
 
 2. **Data Platform Connection**: In order to create a Custom SQL Assertion, you'll need to have an **Ingestion Source** configured to your
 Data Platform: Snowflake, BigQuery, or Redshift under the **Integrations** tab.
@@ -107,12 +107,14 @@ Change Source types vary by the platform, but generally fall into these categori
 
 - **Audit Log** (Default): A metadata API or Table that is exposed by the Data Warehouse which contains captures information about the
 operations that have been performed to each Table. It is usually efficient to check, but some useful operations are not
-fully supported across all major Warehouse platforms.
+fully supported across all major Warehouse platforms. Note that for Databricks, [this option](https://docs.databricks.com/en/delta/history.html)
+is only available for tables stored in Delta format.
 
 - **Information Schema**: A system Table that is exposed by the Data Warehouse which contains live information about the Databases
 and Tables stored inside the Data Warehouse. It is usually efficient to check, but lacks detailed information about the _type_
-of change that was last made to a specific table (e.g. the operation itself - INSERT, UPDATE, DELETE, number of impacted rows, etc)
-
+of change that was last made to a specific table (e.g. the operation itself - INSERT, UPDATE, DELETE, number of impacted rows, etc).
+Note that for Databricks, [this option](https://docs.databricks.com/en/delta/table-details.html) is only available for tables stored in Delta format.
+
 - **Last Modified Column**: A Date or Timestamp column that represents the last time that a specific _row_ was touched or updated.
 Adding a Last Modified Column to each warehouse Table is a pattern is often used for existing use cases around change management.
 If this change source is used, a query will be issued to the Table to search for rows that have been modified within a specific
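The **Last Modified Column** change source described in the hunk above amounts to asking the table whether any row was touched inside some window. The following is a minimal sketch of that idea, not DataHub's actual query: the function name `table_changed_since`, the `orders` table, the `updated_at` column, and the window lengths are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime, timedelta


def table_changed_since(conn, table, column, window_start):
    """Return True if any row's last-modified value is at or after window_start.

    Hypothetical helper for illustration only. Table and column names are
    interpolated directly, so they must come from a trusted source.
    """
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" >= ?'
    # Timestamps are stored as "YYYY-MM-DD HH:MM:SS" text, which sorts
    # chronologically under plain string comparison.
    (count,) = conn.execute(query, (window_start.isoformat(sep=" "),)).fetchone()
    return count > 0


# Stand-in warehouse table with a last-modified column (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
now = datetime(2024, 1, 10, 12, 0, 0)
rows = [
    (1, (now - timedelta(hours=30)).isoformat(sep=" ")),
    (2, (now - timedelta(hours=2)).isoformat(sep=" ")),
]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

changed_in_6h = table_changed_since(conn, "orders", "updated_at", now - timedelta(hours=6))
changed_in_1h = table_changed_since(conn, "orders", "updated_at", now - timedelta(hours=1))
```

Because the check reads only an ordinary column, the same query shape works on any platform that exposes the table over SQL, which is the portability point the documentation makes about the column value approaches.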
@@ -128,8 +130,11 @@ Change Source types vary by the platform, but generally fall into these categori
 This relies on Operations being reported to DataHub, either via ingestion or via use of the DataHub APIs (see [Report Operation via API](#reporting-operations-via-api)).
 Note if you have not configured an ingestion source through DataHub, then this may be the only option available. By default, any operation type found will be considered a valid change. Use the **Operation Types** dropdown when selecting this option to specify which operation types should be considered valid changes. You may choose from one of DataHub's standard Operation Types, or specify a "Custom" Operation Type by typing in the name of the Operation Type.
 
-Using either of the column value approaches (**Last Modified Column** or **High Watermark Column**) to determine whether a Table has changed can be useful because it can be customized to determine whether specific types of important changes have been made to a given Table.
-Because it does not involve system warehouse tables, it is also easily portable across Data Warehouse and Data Lake providers.
+- **File Metadata** (Databricks Only): A column that is exposed by Databricks for both Unity Catalog and Hive Metastore based tables
+which includes information about the last time that a file for the table was changed. Read more about it [here](https://docs.databricks.com/en/ingestion/file-metadata-column.html).
+
+Using either of the column value approaches (**Last Modified Column** or **High Watermark Column**) to determine whether a Table has changed can be useful because it can be customized to determine whether specific types of changes have been made to a given Table.
+And because this type of assertion does not involve system warehouse tables, they are easily portable across Data Warehouse and Data Lake providers.
 
 Freshness Assertions also have an off switch: they can be started or stopped at any time with the click of button.
 
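For orientation, the three Databricks pages linked in this commit document `DESCRIBE HISTORY`, `DESCRIBE DETAIL`, and the `_metadata` file column respectively. The sketch below maps each change source to a statement of that kind. It is an illustration under assumptions: the commit does not show the queries DataHub actually issues, and the function name, the dictionary keys, and the `main.sales.orders` table name are hypothetical.

```python
def databricks_change_source_sql(change_source: str, table: str) -> str:
    """Return an illustrative Databricks SQL statement for a change source.

    The keys are made-up labels for this sketch, not DataHub identifiers.
    """
    statements = {
        # Delta table history: most recent operation only (Delta tables only).
        "audit_log": f"DESCRIBE HISTORY {table} LIMIT 1",
        # Delta table details, which include a lastModified field (Delta tables only).
        "information_schema": f"DESCRIBE DETAIL {table}",
        # File metadata column: latest modification time of any underlying file.
        "file_metadata": (
            "SELECT MAX(_metadata.file_modification_time) AS last_modified "
            f"FROM {table}"
        ),
    }
    if change_source not in statements:
        raise ValueError(f"unknown change source: {change_source!r}")
    return statements[change_source]


audit_sql = databricks_change_source_sql("audit_log", "main.sales.orders")
file_sql = databricks_change_source_sql("file_metadata", "main.sales.orders")
```

The first two statements depend on Delta, which matches the notes this commit adds about the Audit Log and Information Schema options being available only for tables stored in Delta format.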