Updated documentation for Test Suite (#7167)
* Added part 1 of test suite doc * Added documentation for custom TestCase
@ -441,6 +441,10 @@ site_menu:
|
||||
url: /openmetadata/ingestion/workflows/profiler
|
||||
- category: OpenMetadata / Ingestion / Workflows / Profiler / Metrics
|
||||
url: /openmetadata/ingestion/workflows/profiler/metrics
|
||||
- category: OpenMetadata / Ingestion / Workflows / Data Quality
|
||||
url: /openmetadata/ingestion/workflows/data-quality
|
||||
- category: OpenMetadata / Ingestion / Workflows / Data Quality / Tests
|
||||
url: /openmetadata/ingestion/workflows/data-quality/tests
|
||||
- category: OpenMetadata / Ingestion / Lineage
|
||||
url: /openmetadata/ingestion/lineage
|
||||
- category: OpenMetadata / Ingestion / Lineage / Edit Data Lineage Manually
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Athena connector.
|
||||
Configure and schedule Athena metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -361,7 +361,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Athena connector.
|
||||
Configure and schedule Athena metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -314,7 +314,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Athena connector.
|
||||
Configure and schedule Athena metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -221,7 +221,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the AzureSQL connector.
|
||||
Configure and schedule AzureSQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -357,7 +357,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the AzureSQL connector.
|
||||
Configure and schedule AzureSQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -310,7 +310,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the AzureSQL connector.
|
||||
Configure and schedule AzureSQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -218,7 +218,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule BigQuery metadata and profiler workflows from the OpenMet
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -518,7 +518,7 @@ pip3 install --upgrade 'openmetadata-ingestion[bigquery-usage]'
|
||||
|
||||
For the usage workflow creation, the Airflow file will look the same as for the metadata ingestion. Updating the YAML configuration will be enough.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule BigQuery metadata and profiler workflows from the OpenMet
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -475,7 +475,7 @@ After saving the YAML config, we will run the command the same way we did for th
|
||||
metadata ingest -c <path-to-yaml>
|
||||
```
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule BigQuery metadata and profiler workflows from the OpenMet
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -268,7 +268,7 @@ text="Learn more about how to configure the Usage Workflow to ingest Query and L
|
||||
link="/openmetadata/ingestion/workflows/usage"
|
||||
/>
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Clickhouse metadata and profiler workflows from the OpenM
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -443,7 +443,7 @@ pip3 install --upgrade 'openmetadata-ingestion[clickhouse-usage]'
|
||||
|
||||
For the usage workflow creation, the Airflow file will look the same as for the metadata ingestion. Updating the YAML configuration will be enough.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Clickhouse metadata and profiler workflows from the OpenM
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -400,7 +400,7 @@ After saving the YAML config, we will run the command the same way we did for th
|
||||
metadata ingest -c <path-to-yaml>
|
||||
```
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Clickhouse metadata and profiler workflows from the OpenM
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -226,7 +226,7 @@ text="Learn more about how to configure the Usage Workflow to ingest Query and L
|
||||
link="/openmetadata/ingestion/workflows/usage"
|
||||
/>
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Databricks connecto
|
||||
Configure and schedule Databricks metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -353,7 +353,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Databricks connecto
|
||||
Configure and schedule Databricks metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -306,7 +306,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Databricks connecto
|
||||
Configure and schedule Databricks metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -217,7 +217,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the DB2 connector.
|
||||
Configure and schedule DB2 metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -354,7 +354,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the DB2 connector.
|
||||
Configure and schedule DB2 metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -307,7 +307,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the DB2 connector.
|
||||
Configure and schedule DB2 metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -216,7 +216,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Druid connector.
|
||||
Configure and schedule Druid metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -355,7 +355,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Druid connector.
|
||||
Configure and schedule Druid metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -308,7 +308,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Druid connector.
|
||||
Configure and schedule Druid metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -215,7 +215,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Hive connector.
|
||||
Configure and schedule Hive metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -354,7 +354,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Hive connector.
|
||||
Configure and schedule Hive metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -307,7 +307,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Hive connector.
|
||||
Configure and schedule Hive metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -217,7 +217,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the MariaDB connector.
|
||||
Configure and schedule MariaDB metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -353,7 +353,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the MariaDB connector.
|
||||
Configure and schedule MariaDB metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -306,7 +306,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the MariaDB connector.
|
||||
Configure and schedule MariaDB metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -216,7 +216,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule MSSQL metadata and profiler workflows from the OpenMetada
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -442,7 +442,7 @@ pip3 install --upgrade 'openmetadata-ingestion[mssql-usage]'
|
||||
|
||||
For the usage workflow creation, the Airflow file will look the same as for the metadata ingestion. Updating the YAML configuration will be enough.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule MSSQL metadata and profiler workflows from the OpenMetada
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -399,7 +399,7 @@ After saving the YAML config, we will run the command the same way we did for th
|
||||
metadata ingest -c <path-to-yaml>
|
||||
```
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule MSSQL metadata and profiler workflows from the OpenMetada
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -229,7 +229,7 @@ text="Learn more about how to configure the Usage Workflow to ingest Query and L
|
||||
link="/openmetadata/ingestion/workflows/usage"
|
||||
/>
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the MySQL connector.
|
||||
Configure and schedule MySQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -353,7 +353,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the MySQL connector.
|
||||
Configure and schedule MySQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -306,7 +306,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the MySQL connector.
|
||||
Configure and schedule MySQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -218,7 +218,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Oracle connector.
|
||||
Configure and schedule Oracle metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -357,7 +357,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Oracle connector.
|
||||
Configure and schedule Oracle metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -310,7 +310,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Oracle connector.
|
||||
Configure and schedule Oracle metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -217,7 +217,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Postgres connector.
|
||||
Configure and schedule Postgres metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -353,7 +353,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Postgres connector.
|
||||
Configure and schedule Postgres metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -306,7 +306,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the PostgreSQL connecto
|
||||
Configure and schedule PostgreSQL metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -216,7 +216,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Presto connector.
|
||||
Configure and schedule Presto metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -358,7 +358,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Presto connector.
|
||||
Configure and schedule Presto metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -311,7 +311,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Presto connector.
|
||||
Configure and schedule Presto metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -217,7 +217,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Redshift metadata and profiler workflows from the OpenMet
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -451,7 +451,7 @@ pip3 install --upgrade 'openmetadata-ingestion[redshift-usage]'
|
||||
|
||||
For the usage workflow creation, the Airflow file will look the same as for the metadata ingestion. Updating the YAML configuration will be enough.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Redshift metadata and profiler workflows from the OpenMet
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -408,7 +408,7 @@ After saving the YAML config, we will run the command the same way we did for th
|
||||
metadata ingest -c <path-to-yaml>
|
||||
```
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Redshift metadata and profiler workflows from the OpenMet
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -232,7 +232,7 @@ text="Learn more about how to configure the Usage Workflow to ingest Query and L
|
||||
link="/openmetadata/ingestion/workflows/usage"
|
||||
/>
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Salesforce connecto
|
||||
Configure and schedule Salesforce metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -356,7 +356,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Salesforce connecto
|
||||
Configure and schedule Salesforce metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -309,7 +309,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Salesforce connecto
|
||||
Configure and schedule Salesforce metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -218,7 +218,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Singlestore connect
|
||||
Configure and schedule Singlestore metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -353,7 +353,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Singlestore connect
|
||||
Configure and schedule Singlestore metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -306,7 +306,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Singlestore connect
|
||||
Configure and schedule Singlestore metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -215,7 +215,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Snowflake metadata and profiler workflows from the OpenMe
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -463,7 +463,7 @@ pip3 install --upgrade 'openmetadata-ingestion[snowflake-usage]'
|
||||
For the usage workflow creation, the Airflow file will look the same as for the metadata ingestion. Updating the YAML configuration will be enough.
|
||||
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Snowflake metadata and profiler workflows from the OpenMe
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -419,7 +419,7 @@ After saving the YAML config, we will run the command the same way we did for th
|
||||
metadata ingest -c <path-to-yaml>
|
||||
```
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -11,7 +11,7 @@ Configure and schedule Snowflake metadata and profiler workflows from the OpenMe
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Query Usage and Lineage Ingestion](#query-usage-and-lineage-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -239,7 +239,7 @@ text="Learn more about how to configure the Usage Workflow to ingest Query and L
|
||||
link="/openmetadata/ingestion/workflows/usage"
|
||||
/>
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Trino connector.
|
||||
Configure and schedule Trino metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -361,7 +361,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Trino connector.
|
||||
Configure and schedule Trino metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -314,7 +314,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Trino connector.
|
||||
Configure and schedule Trino metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -217,7 +217,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Vertica connector.
|
||||
Configure and schedule Vertica metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -353,7 +353,7 @@ with DAG(
|
||||
Note that from connector to connector, this recipe will always be the same.
|
||||
By updating the YAML configuration, you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Vertica connector.
|
||||
Configure and schedule Vertica metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
## Requirements
|
||||
@ -306,7 +306,7 @@ metadata ingest -c <path-to-yaml>
|
||||
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
|
||||
you will be able to extract metadata from different sources.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
The Data Profiler workflow will be using the `orm-profiler` processor.
|
||||
While the `serviceConnection` will still be the same to reach the source system, the `sourceConfig` will be
|
||||
|
||||
@ -10,7 +10,7 @@ In this section, we provide guides and references to use the Vertica connector.
|
||||
Configure and schedule Vertica metadata and profiler workflows from the OpenMetadata UI:
|
||||
- [Requirements](#requirements)
|
||||
- [Metadata Ingestion](#metadata-ingestion)
|
||||
- [Data Profiler and Quality Tests](#data-profiler-and-quality-tests)
|
||||
- [Data Profiler](#data-profiler)
|
||||
- [DBT Integration](#dbt-integration)
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||||
@ -216,7 +216,7 @@ caption="Edit and Deploy the Ingestion Pipeline"
|
||||
|
||||
From the Connection tab, you can also Edit the Service if needed.
|
||||
|
||||
## Data Profiler and Quality Tests
|
||||
## Data Profiler
|
||||
|
||||
<Tile
|
||||
icon="schema"
|
||||
|
||||
@ -1,237 +0,0 @@
|
||||
---
|
||||
title: Data Quality
|
||||
slug: /openmetadata/data-quality
|
||||
---
|
||||
|
||||
# Data Quality
|
||||
Learn how you can use OpenMetadata to define Data Quality tests and measure your data reliability.
|
||||
## Requirements
|
||||
|
||||
### OpenMetadata (version 0.10 or later)
|
||||
|
||||
You must have a running deployment of OpenMetadata to use this guide. OpenMetadata includes the following services:
|
||||
|
||||
* OpenMetadata server supporting the metadata APIs and user interface
|
||||
* Elasticsearch for metadata search and discovery
|
||||
* MySQL as the backing store for all metadata
|
||||
* Airflow for metadata ingestion workflows
|
||||
|
||||
To deploy OpenMetadata, check out the [deployment guide](/deployment)
|
||||
|
||||
### Python (version 3.8.0 or later)
|
||||
|
||||
Please use the following command to check the version of Python you have.
|
||||
|
||||
```
|
||||
python3 --version
|
||||
```
|
||||
|
||||
## Building Trust
|
||||
|
||||
OpenMetadata aims to be where all users share and collaborate around data. One of the main benefits of ingesting metadata into OpenMetadata is to make assets discoverable.
|
||||
|
||||
However, we need to ask ourselves: what happens after a user stumbles upon our assets? We can help other teams use the data by adding proper descriptions with up-to-date information, or even examples of how to extract information properly.
|
||||
|
||||
What is imperative to do, though, is to build **trust**. For example, users might find a Table that looks useful for their use case, but how can they be sure it correctly follows the SLAs? What issues has this source undergone in the past? Data Quality & tests play a significant role in making any asset trustworthy. Being able to show the Entity information together with its reliability will help our users think, "This is safe to use".
|
||||
|
||||
This section will show you how to configure and run Data Profiling and Quality pipelines with the supported tests.
|
||||
|
||||
## Data Profiling
|
||||
|
||||
### Workflows
|
||||
|
||||
The **Ingestion Framework** currently supports two types of workflows:
|
||||
|
||||
* **Ingestion:** Captures metadata from the sources and updates the Entities' instances. This is a lightweight process that can be scheduled to have fast feedback on metadata changes in our sources. This workflow handles both the metadata ingestion and the usage and lineage information from the sources, when available.
|
||||
* **Profiling:** Extracts metrics from SQL sources and sets up and runs Data Quality tests. It requires previous executions of the Ingestion Pipeline. This is a more time-consuming workflow that will run metrics and compare their result to the configured tests of both Tables and Columns.
|
||||
|
||||
<Note>
|
||||
|
||||
Note that you can configure the ingestion pipelines with `source.config.data_profiler_enabled` as `"true"` or `"false"` to run the profiler as well during the metadata ingestion. This, however, **does not support** Quality Tests.
|
||||
|
||||
</Note>
|
||||
|
||||
### Profiling Overview
|
||||
#### Requirements
|
||||
|
||||
The source layer of the Profiling workflow is the OpenMetadata API. Based on the source configuration, this process lists the tables to be executed.
|
||||
|
||||
#### Description
|
||||
|
||||
The steps of the **Profiling** pipeline are the following:
|
||||
|
||||
1. First, use the source configuration to create a connection.
|
||||
2. Next, iterate over the selected tables and schemas that the Ingestion has previously recorded in OpenMetadata.
|
||||
3. Run a default set of metrics on all the table's columns. (We will add more customization in future releases.)
|
||||
4. Finally, compare the metrics' results against the configured Data Quality tests.
|
||||
|
||||
<Note>
|
||||
|
||||
Note that all the results are published to the OpenMetadata API, both the Profiling and the tests executions. This will allow users to visit the evolution of the data and its reliability directly in the UI.
|
||||
|
||||
</Note>
|
||||
|
||||
You can take a look at the supported metrics and tests here:
|
||||
|
||||
<TileContainer>
|
||||
<Tile
|
||||
icon="manage_accounts"
|
||||
title="Metrics"
|
||||
text="Supported metrics"
|
||||
link={"/openmetadata/data-quality/metrics"}
|
||||
size="half"
|
||||
/>
|
||||
<Tile
|
||||
icon="manage_accounts"
|
||||
title="Tests"
|
||||
text="Supported tests and how to configure them"
|
||||
link={"/openmetadata/data-quality/tests"}
|
||||
size="half"
|
||||
/>
|
||||
</TileContainer>
|
||||
|
||||
## How to Add Tests
|
||||
|
||||
Tests are part of the Table Entity. We can add new tests to a Table from the UI or directly use the JSON configuration of the workflows.
|
||||
|
||||
<Note>
|
||||
|
||||
Note that in order to add tests and run the Profiler workflow, the metadata should have already been ingested.
|
||||
|
||||
</Note>
|
||||
|
||||
### Add Tests in the UI
|
||||
|
||||
To create a new test, we can go to the _Table_ page under the _Data Quality_ tab:
|
||||
<Image
|
||||
src={"/images/openmetadata/data-quality/data-quality-tab.png"}
|
||||
alt="Data Quality Tab in the Table Page"
|
||||
caption="Data Quality Tab in the Table Page"
|
||||
/>
|
||||
|
||||
Clicking on _Add Test_ will offer us two options: **Table Test** or **Column Test**. A Table Test runs on metrics from the whole table, such as the number of rows or columns, while Column Tests are specific to each column's values.
|
||||
|
||||
#### Add Table Tests
|
||||
|
||||
Adding a Table Test will show us the following view:
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/data-quality/table-test.png"}
|
||||
alt="Add a Table Test"
|
||||
caption="Add a Table Test"
|
||||
/>
|
||||
|
||||
* **Test Type**: It allows us to specify the test we want to configure.
|
||||
* **Description**: To explain why the test is necessary and what scenarios we want to validate.
|
||||
* **Value**: Different tests require different values here. For example, `tableColumnCountToEqual` requires us to specify the number of columns we expect. Other tests take values such as `min` and `max`, while some require no value at all, such as tests validating that there are no nulls in a column.
|
||||
|
||||
#### Add Column Tests
|
||||
|
||||
Adding a Column Test will have a similar view:
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/data-quality/column-test.png"}
|
||||
alt="Add Column Test"
|
||||
caption="Add Column Test"
|
||||
/>
|
||||
|
||||
The Column Test form will be similar to the Table Test one. The only difference is the **Column Name** field, where we need to select the column we will be targeting for the test.
|
||||
|
||||
<Note>
|
||||
|
||||
You can review the supported tests [here](/openmetadata/data-quality/tests). We will keep expanding the support for new tests in the upcoming releases.
|
||||
|
||||
</Note>
|
||||
|
||||
Once tests are added, we will be able to see them in the _Data Quality_ tab:
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/data-quality/created-tests.png"}
|
||||
alt="Freshly created tests"
|
||||
caption="Freshly created tests"
|
||||
/>
|
||||
|
||||
Note how the tests are grouped in Table and Column tests. All tests from the same column will also be grouped together. From this view, we can both edit and delete the tests if needed.
|
||||
|
||||
In the global Table information at the top, we will also be able to see how many Table Tests have been configured.
|
||||
|
||||
### Add Tests with the JSON Config
|
||||
|
||||
In the [connectors](/openmetadata/connectors) documentation for each source, we showcase how to run the Profiler Workflow using the Airflow SDK or the `metadata` CLI. When configuring the JSON configuration for the workflow, we can add tests as well.
|
||||
|
||||
Any tests added to the JSON configuration will also be reflected in the Data Quality tab. This JSON configuration can be used for both the Airflow SDK and to run the workflow with the CLI.
|
||||
|
||||
You can find further information on how to prepare the JSON configuration for each of the sources. However, adding any number of tests is a matter of updating the `processor` configuration as follows:
|
||||
|
||||
```json
|
||||
"processor": {
|
||||
"type": "orm-profiler",
|
||||
"config": {
|
||||
"test_suite": {
|
||||
"name": "<Test Suite name>",
|
||||
"tests": [
|
||||
{
|
||||
"table": "<Table FQN>",
|
||||
"table_tests": [
|
||||
{
|
||||
"testCase": {
|
||||
"config": {
|
||||
"value": 100
|
||||
},
|
||||
"tableTestType": "tableRowCountToEqual"
|
||||
}
|
||||
}
|
||||
],
|
||||
"column_tests": [
|
||||
{
|
||||
"columnName": "<Column Name>",
|
||||
"testCase": {
|
||||
"config": {
|
||||
"minValue": 0,
|
||||
"maxValue": 99
|
||||
},
|
||||
"columnTestType": "columnValuesToBeBetween"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
```
|
||||
|
||||
`tests` is a list of test definitions that will be applied to the `table`, informed by its FQN. For each table, one can then define a list of `table_tests` and `column_tests`. Review the supported tests and their definitions to learn how to configure the different cases [here](/openmetadata/data-quality/tests).
|
||||
|
||||
## How to Run Tests
|
||||
|
||||
Both the Profiler and Tests are executed in the Profiler Workflow. All the results will be available through the UI in the _Profiler_ and _Data Quality_ tabs.
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/data-quality/test-results.png"}
|
||||
alt="Tests results in the Data Quality tab"
|
||||
caption="Tests results in the Data Quality tab"
|
||||
/>
|
||||
|
||||
To learn how to prepare and run the Profiler Workflow for a given source, you can take a look at the documentation for that specific [connector](/openmetadata/connectors).
|
||||
|
||||
## Where are the Tests stored?
|
||||
|
||||
Once you create a Test definition for a Table or any of its Columns, that Test becomes part of the Table Entity. This means that it does not matter where you create tests from (JSON configuration vs. UI): once the test gets registered in OpenMetadata, it will always be executed as part of the Profiler Workflow.
|
||||
|
||||
You can check what tests an Entity has configured in the **Data Quality** tab of the UI, or by using the API:
|
||||
|
||||
```python
|
||||
from metadata.ingestion.ometa.ometa_api import OpenMetadata
|
||||
from metadata.ingestion.ometa.openmetadata_rest import MetadataServerConfig
|
||||
|
||||
from metadata.generated.schema.entity.data.table import Table
|
||||
|
||||
|
||||
server_config = MetadataServerConfig(api_endpoint="http://localhost:8585/api")
|
||||
metadata = OpenMetadata(server_config)
|
||||
|
||||
table = metadata.get_by_name(entity=Table, fqdn="FQDN", fields=["tests"])
|
||||
```
|
||||
|
||||
You can then check `table.tableTests`, or for each Column `column.columnTests` to get the test information.
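
For illustration, and assuming the attribute names mentioned above (`tableTests` on the Table and `columnTests` on each Column), a short continuation of the previous snippet could look like the sketch below. This is only a sketch; attribute shapes may vary slightly between releases.

```python
# Continuing from the snippet above: inspect the tests attached to the Table.
# Attribute names follow the fields mentioned in this guide and may differ
# slightly depending on your OpenMetadata release.
for table_test in table.tableTests or []:
    print("Table test:", table_test.name)

for column in table.columns or []:
    for column_test in column.columnTests or []:
        print("Column test:", column_test.name)
```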
|
||||
@ -0,0 +1,352 @@
|
||||
---
|
||||
title: Data Quality
|
||||
slug: /openmetadata/ingestion/workflows/data-quality
|
||||
---
|
||||
|
||||
# Data Quality
|
||||
Learn how you can use OpenMetadata to define Data Quality tests and measure your data reliability.
|
||||
## Requirements
|
||||
|
||||
### OpenMetadata (version 0.12 or later)
|
||||
|
||||
You must have a running deployment of OpenMetadata to use this guide. OpenMetadata includes the following services:
|
||||
|
||||
* OpenMetadata server supporting the metadata APIs and user interface
|
||||
* Elasticsearch for metadata search and discovery
|
||||
* MySQL as the backing store for all metadata
|
||||
* Airflow for metadata ingestion workflows
|
||||
|
||||
To deploy OpenMetadata, check out the [deployment guide](/deployment)
|
||||
|
||||
### Python (version 3.8.0 or later)
|
||||
|
||||
Please use the following command to check the version of Python you have.
|
||||
|
||||
```
|
||||
python3 --version
|
||||
```
|
||||
|
||||
## Building Trust with Data Quality
|
||||
|
||||
OpenMetadata is where all users share and collaborate around data. It is where you make your assets discoverable; with data quality you make these assets **trustable**.
|
||||
|
||||
This section will show you how to configure and run Data Quality pipelines with the OpenMetadata built-in tests.
|
||||
|
||||
## Main Concepts
|
||||
### Test Suite
|
||||
Test Suites are containers allowing you to group related Test Cases together. Once configured, a Test Suite can easily be deployed to execute all the Test Cases it contains.
|
||||
|
||||
### Test Definition
|
||||
Test Definitions are generic definitions of a test, including elements such as:
|
||||
- test name
|
||||
- column name
|
||||
- data type
|
||||
|
||||
### Test Cases
|
||||
Test Cases specialize a Test Definition. They define the conditions a test must meet to be successful (e.g. `max=n`). One Test Definition can be linked to multiple Test Cases.
|
||||
|
||||
## Adding Tests Through the UI
|
||||
|
||||
**Note:** you will need to make sure you have the right permissions in OpenMetadata to create a test.
|
||||
|
||||
### Step 1: Creating a Test Suite
|
||||
From your table page, click on the `profiler` tab. From there you will be able to create table tests by clicking on the purple background `Add Test` button at the top, or column tests by clicking on the white background `Add Test` button.
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"}
|
||||
alt="Write your first test"
|
||||
caption="Write your first test"
|
||||
/>
|
||||
|
||||
On the next page you will be able to either select an existing Test Suite or create a new one. If you select an existing one, your Test Case will automatically be added to that Test Suite.
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"}
|
||||
alt="Create test suite"
|
||||
caption="Create test suite"
|
||||
/>
|
||||
|
||||
### Step 2: Create a Test Case
|
||||
On the next page, you will create a Test Case. You will need to select a Test Definition from the drop-down menu and specify the parameters of your Test Case.
|
||||
|
||||
**Note:** Test Case names need to be unique across the whole platform. A warning message will show if your Test Case name is not unique.
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/test-case-page.png"}
|
||||
alt="Create test case"
|
||||
caption="Create test case"
|
||||
/>
|
||||
|
||||
### Step 3: Add Ingestion Workflow
|
||||
If you have created a new Test Suite, you will see a purple background `Add Ingestion` button after clicking `Submit`. This will allow you to schedule the execution of your Test Suite. If you have selected an existing Test Suite, you are all set.
|
||||
|
||||
After clicking `Add Ingestion` you will be able to select an execution schedule for your Test Suite (note that you can edit this later). Once you have selected the desired scheduling time, click submit and you are all set.
|
||||
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"}
|
||||
alt="Create ingestion workflow"
|
||||
caption="Create ingestion workflow"
|
||||
/>
|
||||
|
||||
|
||||
## Adding Tests with the YAML Config
|
||||
When creating a YAML config for a test workflow, the source configuration is very simple.
|
||||
```
|
||||
source:
|
||||
type: TestSuite
|
||||
serviceName: <your_service_name>
|
||||
sourceConfig:
|
||||
config:
|
||||
type: TestSuite
|
||||
```
|
||||
The only section you need to modify here is the `serviceName` key. Note that this name needs to be unique across Test Suite names on the OpenMetadata platform.
|
||||
|
||||
Once you have defined your source configuration, you'll need to define the processor configuration.
|
||||
```
|
||||
processor:
|
||||
type: "orm-test-runner"
|
||||
config:
|
||||
testSuites:
|
||||
- name: [test_suite_name]
|
||||
description: [test suite description]
|
||||
testCases:
|
||||
- name: [test_case_name]
|
||||
description: [test case description]
|
||||
testDefinitionName: [test definition name*]
|
||||
entityLink: ["<#E::table::fqn> or <#E::table::fqn::columns::column_name>"]
|
||||
parameterValues:
|
||||
- name: [column parameter name]
|
||||
value: [value]
|
||||
- ...
|
||||
```
|
||||
The processor type should be set to `"orm-test-runner"`. For accepted test definition names and parameter value names, refer to the [tests page](/openmetadata/ingestion/workflows/data-quality/tests).
|
||||
|
||||
|
||||
`sink` and `workflowConfig` will have the same settings as the ingestion and profiler workflows.
|
||||
|
||||
### Full `yaml` config example
|
||||
|
||||
```
|
||||
source:
|
||||
type: TestSuite
|
||||
serviceName: MyAwesomeTestSuite
|
||||
sourceConfig:
|
||||
config:
|
||||
type: TestSuite
|
||||
|
||||
processor:
|
||||
type: "orm-test-runner"
|
||||
config:
|
||||
testSuites:
|
||||
- name: test_suite_one
|
||||
description: this is a test testSuite to confirm test suite workflow works as expected
|
||||
testCases:
|
||||
- name: a_column_test
|
||||
description: A test case
|
||||
testDefinitionName: columnValuesToBeBetween
|
||||
entityLink: "<#E::table::local_redshift.dev.dbt_jaffle.customers::columns::number_of_orders>"
|
||||
parameterValues:
|
||||
- name: minValue
|
||||
value: 2
|
||||
- name: maxValue
|
||||
value: 20
|
||||
|
||||
sink:
|
||||
type: metadata-rest
|
||||
config: {}
|
||||
workflowConfig:
|
||||
openMetadataServerConfig:
|
||||
hostPort: http://localhost:8585/api
|
||||
authProvider: no-auth
|
||||
```
|
||||
|
||||
### How to Run Tests
|
||||
To run the tests from the CLI, execute the following command:
|
||||
```
|
||||
metadata test -c /path/to/my/config.yaml
|
||||
```
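
If you prefer to trigger the same workflow programmatically (for example from an Airflow DAG), a minimal sketch could load the YAML above and run it through the ingestion package. The `TestSuiteWorkflow` class and module path below are assumptions based on the 0.12 ingestion package; verify them against your installed `openmetadata-ingestion` version.

```python
import yaml

# Assumption: the 0.12 ingestion package exposes a TestSuiteWorkflow class
# mirroring the metadata/profiler Workflow classes. Check the import path
# against your installed openmetadata-ingestion version.
from metadata.test_suite.api.workflow import TestSuiteWorkflow


def run_test_suite_workflow(config_path: str = "/path/to/my/config.yaml") -> None:
    # Load the same YAML configuration used by the `metadata test` CLI command.
    with open(config_path, encoding="utf-8") as config_file:
        workflow_config = yaml.safe_load(config_file)

    workflow = TestSuiteWorkflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()


if __name__ == "__main__":
    run_test_suite_workflow()
```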
|
||||
|
||||
## How to Visualize Test Results
|
||||
### From the Test Suite View
|
||||
From the home page, click on the Test Suite menu in the left panel.
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"}
|
||||
alt="Test suite home page"
|
||||
caption="Test suite home page"
|
||||
/>
|
||||
|
||||
This will bring you to the Test Suite page where you can select a specific Test Suite.
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"}
|
||||
alt="Test suite landing page"
|
||||
caption="Test suite landing page"
|
||||
/>
|
||||
|
||||
From there you can select a Test Suite and visualize the results associated with this specific Test Suite.
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"}
|
||||
alt="Test suite results page"
|
||||
caption="Test suite results page"
|
||||
/>
|
||||
|
||||
### From a Table Entity
|
||||
Navigate to your table and click on the `profiler` tab. From there you'll be able to see test results at the table or column level.
|
||||
#### Table Level Test Results
|
||||
In the top panel, click on the white background `Data Quality` button. This will bring you to a summary of all your quality tests at the table level.
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"}
|
||||
alt="Test suite results table"
|
||||
caption="Test suite results table"
|
||||
/>
|
||||
|
||||
#### Column Level Test Results
|
||||
On the profiler page, click on a specific column name. This will bring you to a new page where you can click the white background `Quality Test` button to see all the test results related to your column.
|
||||
<Image
|
||||
src={"/images/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"}
|
||||
alt="Test suite results table"
|
||||
caption="Test suite results table"
|
||||
/>
|
||||
|
||||
## Adding Custom Tests
|
||||
While OpenMetadata provides out-of-the-box tests, you may want to write the results of your own custom quality tests to OpenMetadata. This is very easy to do using the API.
|
||||
### Creating a `TestDefinition`
|
||||
First, you'll need to create a Test Definition for your test. You can send a POST request to the `/api/v1/testDefinition` endpoint to create your Test Definition. At minimum, you will need to pass the following data in the body of your request.
|
||||
|
||||
```
|
||||
{
|
||||
"description": "<you test definition description>",
|
||||
"entityType": "<TABLE or COLUMN>",
|
||||
"name": "<your_test_name>",
|
||||
"testPlatforms": ["<any of OpenMetadata,GreatExpectations, DBT, Deequ, Soda, Other>"],
|
||||
"parameterDefinition": [
|
||||
{
|
||||
"name": "<name>"
|
||||
},
|
||||
{
|
||||
"name": "<name>"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Here is a complete cURL request:
|
||||
|
||||
```
|
||||
curl --request POST 'http://localhost:8585/api/v1/testDefinition' \
|
||||
--header 'Content-Type: application/json' \
|
||||
--data-raw '{
|
||||
"description": "A demo custom test",
|
||||
"entityType": "TABLE",
|
||||
"name": "demo_test_definition",
|
||||
"testPlatforms": ["Soda", "DBT"],
|
||||
"parameterDefinition": [{
|
||||
"name": "ColumnOne"
|
||||
}]
|
||||
}'
|
||||
```
|
||||
|
||||
Make sure to keep the `UUID` from the response as you will need it to create the Test Case.
|
||||
|
||||
### Creating a `TestSuite`
|
||||
You'll also need to create a Test Suite for your Test Case -- note that you can also use an existing one if you want to. You can send a POST request to the `/api/v1/testSuite` endpoint to create your Test Suite. At minimum, you will need to pass the following data in the body of your request.
|
||||
|
||||
```
|
||||
{
|
||||
"name": "<test_suite_name>",
|
||||
"description": "<test suite description>"
|
||||
}
|
||||
```
|
||||
|
||||
Here is a complete cURL request:
|
||||
|
||||
```
|
||||
curl --request POST 'http://localhost:8585/api/v1/testSuite' \
|
||||
--header 'Content-Type: application/json' \
|
||||
--data-raw '{
|
||||
"name": "<test_suite_name>",
|
||||
"description": "<test suite description>"
|
||||
}'
|
||||
```
|
||||
|
||||
Make sure to keep the `UUID` from the response as you will need it to create the Test Case.
|
||||
|
||||
|
||||
### Creating a `TestCase`
|
||||
Once you have your Test Definition created, you can create a Test Case -- which is a specification of your Test Definition. You can send a POST request to the `/api/v1/testCase` endpoint to create your Test Case. At minimum, you will need to pass the following data in the body of your request.
|
||||
|
||||
```
|
||||
{
|
||||
"entityLink": "<#E::table::fqn> or <#E::table::fqn::columns::column name>",
|
||||
"name": "<test_case_name>",
|
||||
"testDefinition": {
|
||||
"id": "<test definition UUID>",
|
||||
"type": "testDefinition"
|
||||
},
|
||||
"testSuite": {
|
||||
"id": "<test suite UUID>",
|
||||
"type": "testSuite"
|
||||
}
|
||||
}
|
||||
```
|
||||
**Important:** for `entityLink`, make sure to include the starting and ending `<>`.
|
||||
|
||||
Here is a complete cURL request:
|
||||
|
||||
```
|
||||
curl --request POST 'http://localhost:8585/api/v1/testCase' \
|
||||
--header 'Content-Type: application/json' \
|
||||
--data-raw '{
|
||||
"entityLink": "<#E::table::local_redshift.dev.dbt_jaffle.customers>",
|
||||
"name": "custom_test_Case",
|
||||
"testDefinition": {
|
||||
"id": "1f3ce6f5-67be-45db-8314-2ee42d73239f",
|
||||
"type": "testDefinition"
|
||||
},
|
||||
"testSuite": {
|
||||
"id": "3192ed9b-5907-475d-a623-1b3a1ef4a2f6",
|
||||
"type": "testSuite"
|
||||
},
|
||||
"parameterValues": [
|
||||
{
|
||||
"name": "colName",
|
||||
"value": 10
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
Make sure to note the fully qualified name (FQN) of the Test Case from the response, as you will need it to write your Test Case Results.
|
||||
|
||||
|
||||
### Writing `TestCaseResults`
|
||||
Once you have your Test Case created, you can write your results to it. You can send a PUT request to the `/api/v1/testCase/{test FQN}/testCaseResult` endpoint to add Test Case Results. At minimum, you will need to pass the following data in the body of your request.
|
||||
|
||||
```
|
||||
{
|
||||
"result": "<result message>",
|
||||
"testCaseStatus": "<Success or Failed or Aborted>",
|
||||
"timestamp": <Unix timestamp>,
|
||||
"testResultValue": [
|
||||
{
|
||||
"value": "<value>"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Here is a complete cURL request:
|
||||
|
||||
```
|
||||
curl --location --request PUT 'http://localhost:8585/api/v1/testCase/local_redshift.dev.dbt_jaffle.customers.custom_test_Case/testCaseResult' \
|
||||
--header 'Content-Type: application/json' \
|
||||
--data-raw '{
|
||||
"result": "found 1 values expected n",
|
||||
"testCaseStatus": "Success",
|
||||
"timestamp": 1662129151,
|
||||
"testResultValue": [{
|
||||
"value": "10"
|
||||
}]
|
||||
}'
|
||||
```
|
||||
|
||||
You will now be able to see your test in the Test Suite or the table entity.
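
If you prefer Python over cURL, the same four calls can be chained with the `requests` library. This is only a sketch of the flow documented above: the endpoints and payloads are the ones shown in the previous sections, while names such as `demo_test_definition`, `my_test_suite`, the table FQN, and the `fullyQualifiedName` response field used for the final PUT are illustrative assumptions to adapt to your environment and auth setup.

```python
import time

import requests

BASE_URL = "http://localhost:8585/api/v1"  # adjust to your OpenMetadata host
HEADERS = {"Content-Type": "application/json"}

# 1. Create the Test Definition and keep its UUID from the response.
test_definition = requests.post(
    f"{BASE_URL}/testDefinition",
    headers=HEADERS,
    json={
        "description": "A demo custom test",
        "entityType": "TABLE",
        "name": "demo_test_definition",
        "testPlatforms": ["Other"],
        "parameterDefinition": [{"name": "ColumnOne"}],
    },
).json()

# 2. Create (or reuse) a Test Suite and keep its UUID from the response.
test_suite = requests.post(
    f"{BASE_URL}/testSuite",
    headers=HEADERS,
    json={"name": "my_test_suite", "description": "Results from my custom test runner"},
).json()

# 3. Create the Test Case pointing at both UUIDs. The table FQN is a placeholder.
table_fqn = "local_redshift.dev.dbt_jaffle.customers"
test_case = requests.post(
    f"{BASE_URL}/testCase",
    headers=HEADERS,
    json={
        "entityLink": f"<#E::table::{table_fqn}>",
        "name": "custom_test_case",
        "testDefinition": {"id": test_definition["id"], "type": "testDefinition"},
        "testSuite": {"id": test_suite["id"], "type": "testSuite"},
        "parameterValues": [{"name": "ColumnOne", "value": 10}],
    },
).json()

# 4. Write a Test Case Result against the Test Case FQN.
requests.put(
    f"{BASE_URL}/testCase/{test_case['fullyQualifiedName']}/testCaseResult",
    headers=HEADERS,
    json={
        "result": "found 10 values, expected 10",
        "testCaseStatus": "Success",
        "timestamp": int(time.time()),
        "testResultValue": [{"value": "10"}],
    },
)
```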
|
||||
|
||||
@ -1,6 +1,6 @@
|
||||
---
|
||||
title: Tests
|
||||
slug: /openmetadata/data-quality/tests
|
||||
slug: /openmetadata/ingestion/workflows/data-quality/tests
|
||||
---
|
||||
|
||||
# Tests
|
||||
|