mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-12-28 16:08:23 +00:00
Update ingestion section (#11265)
This commit is contained in:
parent
86b34ed151
commit
f1d57caf97
@ -32,7 +32,10 @@ Everything in OpenMetadata is centralized and managed via the API. Then, the Wor
|
||||
via the OpenMetadata server APIs. Morover, the `IngestionPipeline` Entity is also defined in a JSON Schema that you
|
||||
can find [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json).
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png" alt="system context"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png"
|
||||
alt="system context"
|
||||
/%}
|
||||
|
||||
Note how OpenMetadata here acts as a middleware, connecting the actions being triggered in the UI to external orchestration
|
||||
systems, which will be the ones managing the heavy lifting of getting a workflow created, scheduled and run. Out of the box,
|
||||
@ -68,19 +71,27 @@ After creating a new workflow from the UI or when editing it, there are two call
|
||||
- `POST` or `PUT` call to update the `Ingestion Pipeline Entity`,
|
||||
- `/deploy` HTTP call to the `IngestionPipelienResource` to trigger the deployment of the new or updated DAG in the Orchestrator.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png" alt="software system"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png"
|
||||
alt="software system"
|
||||
/%}
|
||||
|
||||
|
||||
### Creating the Ingestion Pipeline
|
||||
|
||||
Based on its [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json),
|
||||
there are a few properties about the Ingestion Pipeline we can highlight:
|
||||
|
||||
1. `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
|
||||
2. `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
|
||||
**1.** `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
|
||||
|
||||
**2.** `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
|
||||
logic to properly create the required Python classes (e.g., `Workflow`, `ProfilerWorkflow`, `TestSuiteWorkflow`, etc.).
|
||||
3. `sourceConfig`: which is dependent on the pipeline type and define how the pipeline should behave (e.g., marking ingesting views as `False`).
|
||||
4. `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
|
||||
5. `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule.
|
||||
|
||||
**3.** `sourceConfig`: which is dependent on the pipeline type and define how the pipeline should behave (e.g., marking ingesting views as `False`).
|
||||
|
||||
**4.** `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
|
||||
|
||||
**5.** `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule.
|
||||
|
||||
{% note %}
|
||||
|
||||
@ -89,8 +100,10 @@ schedule. You might see this property here, but the whole process can still supp
|
||||
this up in future releases.
|
||||
|
||||
{% /note %}
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png" alt="container create"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png"
|
||||
alt="container create"
|
||||
/%}
|
||||
|
||||
Here, the process of creating an Ingestion Pipeline is then the same as with any other Entity.
|
||||
|
||||
@ -104,7 +117,10 @@ The role of OpenMetadata here is just to pass the required communication to the
|
||||
DAG. Basically we need a way to send a call to the Orchestrator that generated a DAG / Workflow object that will be run
|
||||
using the proper functions and classes from the Ingestion Framework.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png" alt="deploy"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png"
|
||||
alt="deploy"
|
||||
/%}
|
||||
|
||||
Any Orchestration system that is capable to **DYNAMICALLY** create a workflow based on a given input (that can be obtained
|
||||
from the `IngestionPipeline` Entity information) is a potentially valid candidate to be used as a Pipeline Service.
|
||||
@ -118,8 +134,9 @@ and prepared to contribute a new Pipeline Service Client implementation.
|
||||
In this example I will be deploying an ingestion workflow to get the metadata from a MySQL database. After clicking on the UI
|
||||
to deploy such pipeline, these are the calls that get triggered:
|
||||
|
||||
1. `POST` call to create the `IngestionPipeline` Entity
|
||||
2. `POST` call to deploy the newly created pipeline.
|
||||
**1.** `POST` call to create the `IngestionPipeline` Entity
|
||||
|
||||
**2.** `POST` call to deploy the newly created pipeline.
|
||||
|
||||
## Create the Ingestion Pipeline
|
||||
|
||||
@ -324,10 +341,12 @@ the workflow class depends on our goal: Ingestion, Profiling, Testing...
|
||||
You can follow this logic deeper in the source code of the managed APIs package, but the important thought here is that we
|
||||
need the following logic flow:
|
||||
|
||||
1. An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
|
||||
2. We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
|
||||
**1.** An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
|
||||
|
||||
**2.** We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
|
||||
this something is a `.py` file.
|
||||
3. Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed
|
||||
|
||||
**3.** Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed
|
||||
APIs package, but if any orchestrator already has an API capable of creating DAGs dynamically, this process can be directly
|
||||
handled in the Pipeline Service Client implementation as all the necessary data is present in the Ingestion Pipeline Entity.
|
||||
|
||||
|
||||
@ -40,8 +40,8 @@ AS SELECT ... FROM schema.table_a JOIN another_schema.table_b;
|
||||
|
||||
From this query we will extract the following information:
|
||||
|
||||
1. There are two `source` tables, represented by the string `schema.table_a` as `another_schema.table_b`
|
||||
2. There is a `target` table `schema.my_view`.
|
||||
**1.** There are two `source` tables, represented by the string `schema.table_a` as `another_schema.table_b`
|
||||
**2.** There is a `target` table `schema.my_view`.
|
||||
|
||||
In this case we suppose that the database connection requires us to write the table names as `<schema>.<table>`. However,
|
||||
there are other possible options. Sometimes we can find just `<table>` in a query, or even `<database>.<schema>.<table>`.
|
||||
@ -135,8 +135,11 @@ the data feeding the Dashboards and Charts.
|
||||
|
||||
When ingesting the Dashboards metadata, the workflow will pick up the origin tables (or database, in the case of
|
||||
PowerBI), and prepare the lineage information.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/lineage/dashboard-ingestion-lineage.png" alt="Dashboard Lineage"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/lineage/dashboard-ingestion-lineage.png"
|
||||
alt="Dashboard Lineage"
|
||||
caption="Dashboard Lineage"
|
||||
/%}
|
||||
|
||||
## Pipeline Services
|
||||
|
||||
|
||||
@ -9,7 +9,8 @@ The OpenMetadata home screen features a change activity feed that enables you vi
|
||||
- Data for which you are an owner
|
||||
- Data you are following
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/versioning/change-feeds.gif"}
|
||||
alt="Change feeds"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/versioning/change-feeds.gif"
|
||||
alt="Change feeds"
|
||||
/%}
|
||||
|
||||
|
||||
@ -6,7 +6,8 @@ slug: /connectors/ingestion/versioning/event-notification-via-webhooks
|
||||
# Event Notification via Webhooks
|
||||
The webhook interface allows you to build applications that receive all the data changes happening in your organization through APIs. Register URLs to receive metadata event notifications. Slack integration through incoming webhooks is one of many applications of this feature.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/versioning/event-notifications-via-webhooks.gif"}
|
||||
alt="Event Notification via Webhooks"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/versioning/event-notifications-via-webhooks.gif"
|
||||
alt="Event Notification via Webhooks"
|
||||
/%}
|
||||
|
||||
|
||||
@ -15,7 +15,9 @@ Metadata versioning helps **simplify debugging processes**. View the version his
|
||||
|
||||
Versioning also helps in **broader collaboration** among consumers and producers of data. Admins can provide access to more users in the organization to change certain fields. Crowdsourcing makes metadata the collective responsibility of the entire organization.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/versioning/metadata-versioning.gif"}
|
||||
alt="Metadata versioning"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/versioning/metadata-versioning.gif"
|
||||
alt="Metadata versioning"
|
||||
caption="Dashboard Lineage"
|
||||
/%}
|
||||
|
||||
|
||||
@ -53,41 +53,44 @@ Test Cases specify a Test Definition. It will define what condition a test must
|
||||
|
||||
### Step 1: Creating a Test Suite
|
||||
From your table service click on the `profiler` tab. From there you will be able to create table tests by clicking on the purple background `Add Test` top button or column tests by clicking on the white background `Add Test` button.
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"}
|
||||
alt="Write your first test"
|
||||
caption="Write your first test"
|
||||
/>
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"
|
||||
alt="Write your first test"
|
||||
caption="Write your first test"
|
||||
/%}
|
||||
|
||||
|
||||
On the next page you will be able to either select an existing Test Suite or Create a new one. If you select an existing one your Test Case will automatically be added to the Test Suite
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"}
|
||||
alt="Create test suite"
|
||||
caption="Create test suite"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"
|
||||
alt="Create test suite"
|
||||
caption="Create test suite"
|
||||
/%}
|
||||
|
||||
|
||||
### Step 2: Create a Test Case
|
||||
On the next page, you will create a Test Case. You will need to select a Test Definition from the drop down menu and specify the parameters of your Test Case.
|
||||
|
||||
**Note:** Test Case name needs to be unique across the whole platform. A warning message will show if your Test Case name is not unique.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-case-page.png"}
|
||||
alt="Create test case"
|
||||
caption="Create test case"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-case-page.png"
|
||||
alt="Create test case"
|
||||
caption="Create test case"
|
||||
/%}
|
||||
|
||||
|
||||
### Step 3: Add Ingestion Workflow
|
||||
If you have created a new test suite you will see a purple background `Add Ingestion` button after clicking `submit`. This will allow you to schedule the execution of your Test Suite. If you have selected an existing Test Suite you are all set.
|
||||
|
||||
After clicking `Add Ingestion` you will be able to select an execution schedule for your Test Suite (note that you can edit this later). Once you have selected the desired scheduling time, click submit and you are all set.
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"
|
||||
alt="Create ingestion workflow"
|
||||
caption="Create ingestion workflow"
|
||||
/%}
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"}
|
||||
alt="Create ingestion workflow"
|
||||
caption="Create ingestion workflow"
|
||||
/>
|
||||
|
||||
|
||||
## Adding Tests with the YAML Config
|
||||
@ -187,7 +190,7 @@ except ModuleNotFoundError:
|
||||
from airflow.operators.python_operator import PythonOperator
|
||||
|
||||
from metadata.config.common import load_config_file
|
||||
from metadata.test_suite.api.workflow import TestSuiteWorkflow
|
||||
from metadata.data_quality.api.workflow import TestSuiteWorkflow
|
||||
from airflow.utils.dates import days_ago
|
||||
|
||||
default_args = {
|
||||
@ -232,43 +235,54 @@ configurations specified above.
|
||||
## How to Visualize Test Results
|
||||
### From the Test Suite View
|
||||
From the home page click on the Test Suite menu in the left pannel.
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"}
|
||||
alt="Test suite home page"
|
||||
caption="Test suite home page"
|
||||
/>
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"
|
||||
alt="Test suite home page"
|
||||
caption="Test suite home page"
|
||||
/%}
|
||||
|
||||
|
||||
This will bring you to the Test Suite page where you can select a specific Test Suite.
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"}
|
||||
alt="Test suite landing page"
|
||||
caption="Test suite landing page"
|
||||
/>
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"
|
||||
alt="Test suite landing page"
|
||||
caption="Test suite landing page"
|
||||
/%}
|
||||
|
||||
|
||||
From there you can select a Test Suite and visualize the results associated with this specific Test Suite.
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"}
|
||||
alt="Test suite results page"
|
||||
caption="Test suite results page"
|
||||
/>
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"
|
||||
alt="Test suite results page"
|
||||
caption="Test suite results page"
|
||||
/%}
|
||||
|
||||
|
||||
### From a Table Entity
|
||||
Navigate to your table and click on the `profiler` tab. From there you'll be able to see test results at the table or column level.
|
||||
#### Table Level Test Results
|
||||
In the top pannel, click on the white background `Data Quality` button. This will bring you to a summary of all your quality tests at the table level
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"}
|
||||
alt="Test suite results table"
|
||||
caption="Test suite results table"
|
||||
/>
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"
|
||||
alt="Test suite results table"
|
||||
caption="Test suite results table"
|
||||
/%}
|
||||
|
||||
|
||||
#### Column Level Test Results
|
||||
On the profiler page, click on a specific column name. This will bring you to a new page where you can click the white background `Quality Test` button to see all the tests results related to your column.
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"}
|
||||
alt="Test suite results table"
|
||||
caption="Test suite results table"
|
||||
/>
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"
|
||||
alt="Test suite results table"
|
||||
caption="Test suite results table"
|
||||
/%}
|
||||
|
||||
|
||||
|
||||
## Adding Custom Tests
|
||||
While OpenMetadata provides out of the box tests, you may want to write your test results from your own custom quality test suite. This is very easy to do using the API.
|
||||
|
||||
@ -32,7 +32,12 @@ Configure the dbt Workflow from the CLI.
|
||||
|
||||
Queries used to create the dbt models can be viewed in the dbt tab
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt-query" caption="dbt Query"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png"
|
||||
alt="dbt-query"
|
||||
caption="dbt Query"
|
||||
/%}
|
||||
|
||||
|
||||
### 2. dbt Lineage
|
||||
|
||||
@ -40,7 +45,12 @@ Lineage from dbt models can be viewed in the Lineage tab.
|
||||
|
||||
For more information on how lineage is extracted from dbt take a look [here](/connectors/ingestion/workflows/dbt/ingest-dbt-lineage)
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-lineage.png" alt="dbt-lineage" caption="dbt Lineage"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-lineage.png"
|
||||
alt="dbt-lineage"
|
||||
caption="dbt Lineage"
|
||||
/%}
|
||||
|
||||
|
||||
### 3. dbt Tags
|
||||
|
||||
@ -48,7 +58,12 @@ Table and column level tags can be imported from dbt
|
||||
|
||||
Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-tags) for adding dbt tags
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt-tags" caption="dbt Tags"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
|
||||
alt="dbt-tags"
|
||||
caption="dbt Tags"
|
||||
/%}
|
||||
|
||||
|
||||
### 4. dbt Owner
|
||||
|
||||
@ -56,7 +71,11 @@ Owner from dbt models can be imported and assigned to respective tables
|
||||
|
||||
Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-owner) for adding dbt owner
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-owner.png" alt="dbt-owner" caption="dbt Owner"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-owner.png"
|
||||
alt="dbt-owner"
|
||||
caption="dbt Owner"
|
||||
/%}
|
||||
|
||||
### 5. dbt Descriptions
|
||||
|
||||
@ -64,13 +83,22 @@ Descriptions from dbt models can be imported and assigned to respective tables a
|
||||
|
||||
By default descriptions from `manifest.json` will be imported. Descriptions from `catalog.json` will only be updated if catalog file is passed.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png" alt="dbt-descriptions" caption="dbt Descriptions"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png"
|
||||
alt="dbt-descriptions"
|
||||
caption="dbt Descriptions"
|
||||
/%}
|
||||
|
||||
|
||||
### 6. dbt Tests and Test Results
|
||||
|
||||
Tests from dbt will only be imported if the `run_results.json` file is passed.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tests.png" alt="dbt-tests" caption="dbt Tests"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tests.png"
|
||||
alt="dbt-tests"
|
||||
caption="dbt Tests"
|
||||
/%}
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
||||
@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
|
||||
|
||||
This will populate the dbt tab from the Table Entity Page.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png"
|
||||
alt="dbt"
|
||||
caption="dbt"
|
||||
/%}
|
||||
|
||||
|
||||
We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.
|
||||
|
||||
|
||||
@ -28,7 +28,11 @@ Openmetadata fetches the lineage information from the `manifest.json` file. Belo
|
||||
```
|
||||
|
||||
For the above case the lineage will be created as shown in below:
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-lineage-customers.png" alt="dbt-lineage-customers" caption="dbt Lineage"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-lineage-customers.png"
|
||||
alt="dbt-lineage-customers"
|
||||
caption="dbt Lineage"
|
||||
/%}
|
||||
|
||||
### 2. Lineage information from dbt queries
|
||||
Openmetadata fetches the dbt query information from the `manifest.json` file.
|
||||
|
||||
@ -51,11 +51,23 @@ The user or team which will be set as the entity owner should be first created i
|
||||
While linking the owner from `manifest.json` or `catalog.json` files to the entity, OpenMetadata first searches for the user if it is present. If the user is not present it searches for the team
|
||||
|
||||
#### Following steps shows adding a User to OpenMetadata:
|
||||
1. Click on the `Users` section from homepage
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png" alt="click-users-page" caption="Click Users page"/>
|
||||
**1.** Click on the `Users` section from homepage
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png"
|
||||
alt="click-users-page"
|
||||
caption="Click Users page"
|
||||
/%}
|
||||
|
||||
|
||||
**2.** Click on the `Add User` button
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png"
|
||||
alt="click-add-user"
|
||||
caption="Click Add User"
|
||||
/%}
|
||||
|
||||
2. Click on the `Add User` button
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png" alt="click-add-user" caption="Click Add User"/>
|
||||
|
||||
3. Enter the details as shown for the user
|
||||
|
||||
@ -65,16 +77,34 @@ If the owner's name in `manifest.json` or `catalog.json` file is `openmetadata`,
|
||||
|
||||
{% /note %}
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png" alt="add-user-dbt" caption="Add User"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png"
|
||||
alt="add-user-dbt"
|
||||
caption="Add User"
|
||||
/%}
|
||||
|
||||
|
||||
|
||||
#### Following steps shows adding a Team to OpenMetadata:
|
||||
1. Click on the `Teams` section from homepage
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png" alt="click-teams-page" caption="Click Teams page"/>
|
||||
**1.** Click on the `Teams` section from homepage
|
||||
|
||||
2. Click on the `Add Team` button
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png" alt="click-add-team" caption="Click Add Team"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png"
|
||||
alt="click-teams-page"
|
||||
caption="Click Teams page"
|
||||
/%}
|
||||
|
||||
3. Enter the details as shown for the team
|
||||
|
||||
**2.** Click on the `Add Team` button
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png"
|
||||
alt="click-add-team"
|
||||
caption="Click Add Team"
|
||||
/%}
|
||||
|
||||
|
||||
**3.** Enter the details as shown for the team
|
||||
|
||||
{% note %}
|
||||
|
||||
@ -82,13 +112,22 @@ If the owner's name in `manifest.json` or `catalog.json` file is `openmetadata`,
|
||||
|
||||
{% /note %}
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png" alt="add-team-dbt" caption="Add Team"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png"
|
||||
alt="add-team-dbt"
|
||||
caption="Add Team"
|
||||
/%}
|
||||
|
||||
|
||||
## Linking the Owner to the table
|
||||
|
||||
After runing the ingestion workflow with dbt you can see the created user or team getting linked to the table as it's owner as it was specified in the `manifest.json` or `catalog.json` file.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png" alt="linked-user" caption="Linked User"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png"
|
||||
alt="linked-user"
|
||||
caption="Linked User"
|
||||
/%}
|
||||
|
||||
{% note %}
|
||||
|
||||
|
||||
@ -65,4 +65,8 @@ Openmetadata fetches the column-level tags information from the `manifest.json`
|
||||
### 3. Viewing the tags on tables and columns
|
||||
Table and Column level tags ingested from dbt can be viewed on the node in OpenMetadata
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt_tags" caption="dbt tags"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
|
||||
alt="dbt_tags"
|
||||
caption="dbt tags"
|
||||
/%}
|
||||
|
||||
@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
|
||||
|
||||
This will populate the dbt tab from the Table Entity Page.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png"
|
||||
alt="dbt"
|
||||
caption="dbt"
|
||||
/%}
|
||||
|
||||
|
||||
We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.
|
||||
|
||||
@ -20,7 +25,12 @@ We can create a workflow that will obtain the dbt information from the dbt files
|
||||
|
||||
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add dbt Ingestion.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/add-ingestion.png" alt="add-ingestion" caption="Add dbt Ingestion"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/add-ingestion.png"
|
||||
alt="add-ingestion"
|
||||
caption="Add dbt Ingestion"
|
||||
/%}
|
||||
|
||||
|
||||
### 2. Configure the dbt Ingestion
|
||||
|
||||
@ -42,7 +52,12 @@ OpenMetadata connects to the AWS s3 bucket via the credentials provided and scan
|
||||
|
||||
The name of the s3 bucket and prefix path to the folder in which the dbt files are stored can be provided. In the case where these parameters are not provided all the buckets are scanned for the files.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/aws-s3.png" alt="aws-s3-bucket" caption="AWS S3 Bucket Config"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/aws-s3.png"
|
||||
alt="aws-s3-bucket"
|
||||
caption="AWS S3 Bucket Config"
|
||||
/%}
|
||||
|
||||
|
||||
#### Google Cloud Storage Buckets
|
||||
|
||||
@ -51,13 +66,22 @@ OpenMetadata connects to the GCS bucket via the credentials provided and scans t
|
||||
The name of the GCS bucket and prefix path to the folder in which the dbt files are stored can be provided. In the case where these parameters are not provided all the buckets are scanned for the files.
|
||||
|
||||
GCS credentials can be stored in two ways:
|
||||
1. Entering the credentials directly into the form
|
||||
**1.** Entering the credentials directly into the form
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-form.png" alt="gcs-storage-bucket-form" caption="GCS Bucket config"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-form.png"
|
||||
alt="gcs-storage-bucket-form"
|
||||
caption="GCS Bucket config"
|
||||
/%}
|
||||
|
||||
2. Entering the path of file in which the GCS bucket credentials are stored.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-path.png" alt="gcs-storage-bucket-path" caption="GCS Bucket Path Config"/>
|
||||
**2.** Entering the path of file in which the GCS bucket credentials are stored.
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-path.png"
|
||||
alt="gcs-storage-bucket-path"
|
||||
caption="GCS Bucket Path Config"
|
||||
/%}
|
||||
|
||||
For more information on Google Cloud Storage authentication click [here](https://cloud.google.com/docs/authentication/getting-started#create-service-account-console).
|
||||
|
||||
@ -65,13 +89,23 @@ For more information on Google Cloud Storage authentication click [here](https:/
|
||||
|
||||
Path of the `manifest.json`, `catalog.json` and `run_results.json` files stored in the local system or in the container in which openmetadata server is running can be directly provided.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/local-storage.png" alt="local-storage" caption="Local Storage Config"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/local-storage.png"
|
||||
alt="local-storage"
|
||||
caption="Local Storage Config"
|
||||
/%}
|
||||
|
||||
|
||||
#### File Server
|
||||
|
||||
File server path of the `manifest.json`, `catalog.json` and `run_results.json` files stored on a file server directly provided.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/file_server.png" alt="file-server" caption="File Server Config"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/file_server.png"
|
||||
alt="file-server"
|
||||
caption="File Server Config"
|
||||
/%}
|
||||
|
||||
|
||||
#### dbt Cloud
|
||||
|
||||
@ -79,9 +113,19 @@ Click on the the link [here](https://docs.getdbt.com/guides/getting-started) for
|
||||
OpenMetadata uses dbt cloud APIs to fetch the `run artifacts` (manifest.json, catalog.json and run_results.json) from the most recent dbt run.
|
||||
The APIs need to be authenticated using an Authentication Token. Follow the link [here](https://docs.getdbt.com/dbt-cloud/api-v2#section/Authentication) to generate an authentication token for your dbt cloud account.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-cloud.png" alt="dbt-cloud" caption="dbt Cloud config"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-cloud.png"
|
||||
alt="dbt-cloud"
|
||||
caption="dbt Cloud config"
|
||||
/%}
|
||||
|
||||
|
||||
|
||||
### 3. Schedule and Deploy
|
||||
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/schedule-and-deploy.png" alt="schedule-and-deploy" caption="Schedule dbt ingestion pipeline"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/schedule-and-deploy.png"
|
||||
alt="schedule-and-deploy"
|
||||
caption="Schedule dbt ingestion pipeline"
|
||||
/%}
|
||||
|
||||
@ -38,7 +38,11 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
|
||||
|
||||
This will populate the Lineage tab from the Table Entity Page.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/table-entity-page.png"
|
||||
alt="table-entity-page"
|
||||
caption="Table Entity Page"
|
||||
/%}
|
||||
|
||||
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data.
|
||||
|
||||
@ -46,15 +50,25 @@ We can create a workflow that will obtain the query log and table creation infor
|
||||
|
||||
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/add-ingestion.png"
|
||||
alt="add-ingestion"
|
||||
caption="Add Ingestion"
|
||||
/%}
|
||||
|
||||
|
||||
### 2. Configure the Lineage Ingestion
|
||||
|
||||
Here you can enter the Lineage Ingestion details:
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png" alt="configure-lineage-ingestion" caption="Configure the Lineage Ingestion"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png"
|
||||
alt="configure-lineage-ingestion"
|
||||
caption="Configure the Lineage Ingestion"
|
||||
/%}
|
||||
|
||||
<Collapse title="Lineage Options">
|
||||
|
||||
### Lineage Options
|
||||
|
||||
**Query Log Duration**
|
||||
|
||||
@ -63,10 +77,14 @@ Specify the duration in days for which the profiler should capture lineage data
|
||||
**Result Limit**
|
||||
|
||||
Set the limit for the query log results to be run at a time.
|
||||
</Collapse>
|
||||
|
||||
|
||||
### 3. Schedule and Deploy
|
||||
|
||||
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png"
|
||||
alt="schedule-and-deploy"
|
||||
caption="View Service Ingestion pipelines"
|
||||
/%}
|
||||
|
||||
@ -12,36 +12,38 @@ After the metadata ingestion has been done correctly, we can configure and deplo
|
||||
|
||||
This Pipeline will be in charge of feeding the Profiler tab of the Table Entity, as well as running any tests configured in the Entity.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-table.png"}
|
||||
alt="Table profile summary page"
|
||||
caption="Table profile summary page"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-table.png"
|
||||
alt="Table profile summary page"
|
||||
caption="Table profile summary page"
|
||||
/%}
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-colomn.png"}
|
||||
alt="Column profile summary page"
|
||||
caption="Column profile summary page"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-colomn.png"
|
||||
alt="Column profile summary page"
|
||||
caption="Column profile summary page"
|
||||
/%}
|
||||
|
||||
|
||||
### 1. Add a Profiler Ingestion
|
||||
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Profiler Ingestion.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/add-profiler-workflow.png"}
|
||||
alt="Add a profiler service"
|
||||
caption="Add a profiler service"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/add-profiler-workflow.png"
|
||||
alt="Add a profiler service"
|
||||
caption="Add a profiler service"
|
||||
/%}
|
||||
|
||||
|
||||
### 2. Configure the Profiler Ingestion
|
||||
Here you can enter the Profiler Ingestion details.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/configure-profiler-workflow.png"}
|
||||
alt="Set profiler configuration"
|
||||
caption="Set profiler configuration"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/configure-profiler-workflow.png"
|
||||
alt="Set profiler configuration"
|
||||
caption="Set profiler configuration"
|
||||
/%}
|
||||
|
||||
|
||||
#### Profiler Options
|
||||
**Name**
|
||||
@ -78,17 +80,20 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
|
||||
### 4. Updating Profiler setting at the table level
|
||||
Once you have created your profiler you can adjust some behavior at the table level by going to the table and clicking on the profiler tab
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/accessing-table-profile-settings.png"}
|
||||
alt="table profile settings"
|
||||
caption="table profile settings"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/accessing-table-profile-settings.png"
|
||||
alt="table profile settings"
|
||||
caption="table profile settings"
|
||||
/%}
|
||||
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/table-profile-summary-view.png"
|
||||
alt="table profile settings"
|
||||
caption="table profile settings"
|
||||
/%}
|
||||
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/table-profile-summary-view.png"}
|
||||
alt="table profile settings"
|
||||
caption="table profile settings"
|
||||
/>
|
||||
|
||||
#### Profiler Options
|
||||
**Profile Sample**
|
||||
|
||||
@ -37,11 +37,13 @@ Returns the number of columns in the Table.
|
||||
## System Metrics
|
||||
System metrics provide information related to DML operations performed on the table. These metrics present a concise view of your data freshness. In a typical data processing flow tables are updated at a certain frequency. Table freshness will be monitored by confirming a set of operations has been performed against the table. To increase trust in your data assets, OpenMetadata will monitor the `INSERT`, `UPDATE` and `DELETE` operations performed against your table to showcase 2 metrics related to freshness (see below for more details). With this information, you are able to see when a specific operation was last perform and how many rows it affected.
|
||||
|
||||
<Image
|
||||
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-freshness-metrics.png"}
|
||||
alt="table profile freshness metrics"
|
||||
caption="table profile freshness metrics"
|
||||
/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-freshness-metrics.png"
|
||||
alt="table profile freshness metrics"
|
||||
caption="table profile freshness metrics"
|
||||
/%}
|
||||
|
||||
|
||||
|
||||
These metrics are available for **BigQuery**, **Redshift** and **Snowflake**. Other database engines are currently not supported so the computation of the system metrics will be skipped.
|
||||
|
||||
|
||||
@ -38,7 +38,11 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
|
||||
|
||||
This will populate the Queries tab from the Table Entity Page.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/table-entity-page.png"
|
||||
alt="table-entity-page"
|
||||
caption="Table Entity Page"
|
||||
/%}
|
||||
|
||||
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Usage Ingestion will be in charge of obtaining this data.
|
||||
|
||||
@ -46,15 +50,25 @@ We can create a workflow that will obtain the query log and table creation infor
|
||||
|
||||
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Usage Ingestion.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/add-ingestion.png"
|
||||
alt="add-ingestion"
|
||||
caption="Add Ingestion"
|
||||
/%}
|
||||
|
||||
|
||||
### 2. Configure the Usage Ingestion
|
||||
|
||||
Here you can enter the Usage Ingestion details:
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/configure-usage-ingestion.png" alt="configure-usage-ingestion" caption="Configure the Usage Ingestion"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/configure-usage-ingestion.png"
|
||||
alt="configure-usage-ingestion"
|
||||
caption="Configure the Usage Ingestion"
|
||||
/%}
|
||||
|
||||
<Collapse title="Usage Options">
|
||||
|
||||
### Usage Options
|
||||
|
||||
**Query Log Duration**
|
||||
|
||||
@ -67,10 +81,14 @@ Mention the absolute file path of the temporary file name to store the query log
|
||||
**Result Limit**
|
||||
|
||||
Set the limit for the query log results to be run at a time.
|
||||
</Collapse>
|
||||
|
||||
|
||||
### 3. Schedule and Deploy
|
||||
|
||||
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the usage pipeline being added to the Service Ingestions.
|
||||
|
||||
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png"
|
||||
alt="schedule-and-deploy"
|
||||
caption="View Service Ingestion pipelines"
|
||||
/%}
|
||||
|
||||
@ -12,7 +12,7 @@ One can configure the metadata ingestion filter for database source using four c
|
||||
`Schema Filter Pattern`, `Table Filter Pattern` & `Use FQN For Filtering`. In this documnet we will learn about each field in detail
|
||||
along with many examples.
|
||||
|
||||
<Collapse title="Configuring Filters via UI">
|
||||
### Configuring Filters via UI
|
||||
|
||||
Filters can be configured in UI while adding an ingestion pipeline through `Add Metadata Ingestion` page.
|
||||
|
||||
@ -22,10 +22,9 @@ Filters can be configured in UI while adding an ingestion pipeline through `Add
|
||||
caption="Database Filter Pattern Fields"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
|
||||
<Collapse title="Configuring Filters via CLI">
|
||||
### Configuring Filters via CLI
|
||||
|
||||
Filters can be configured in CLI in connection configuration within `source.sourceConfig.config` field as described below.
|
||||
|
||||
@ -57,7 +56,6 @@ sourceConfig:
|
||||
- table4
|
||||
```
|
||||
|
||||
</Collapse>
|
||||
|
||||
### Use FQN For Filtering
|
||||
|
||||
@ -93,7 +91,7 @@ In this example we want to ingest all databases which contains `SNOWFLAKE` in na
|
||||
appied would be `.*SNOWFLAKE.*` in the include field. This will result in ingestion of database `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`
|
||||
and `TEST_SNOWFLAKEDB`.
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 1">
|
||||
### Configuring Filters via UI for Example 1
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-1.png"
|
||||
@ -101,9 +99,8 @@ and `TEST_SNOWFLAKEDB`.
|
||||
caption="Database Filter Pattern Example 1"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 1">
|
||||
### Configuring Filters via CLI for Example 1
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -116,14 +113,13 @@ sourceConfig:
|
||||
|
||||
```
|
||||
|
||||
</Collapse>
|
||||
|
||||
#### Example 2
|
||||
|
||||
In this example we want to ingest all databases which starts with `SNOWFLAKE` in name, then the fillter pattern
|
||||
appied would be `^SNOWFLAKE.*` in the include field. This will result in ingestion of database `SNOWFLAKE` & `SNOWFLAKE_SAMPLE_DATA`.
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 2">
|
||||
### Configuring Filters via UI for Example 2
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-2.png"
|
||||
@ -131,9 +127,8 @@ appied would be `^SNOWFLAKE.*` in the include field. This will result in ingesti
|
||||
caption="Database Filter Pattern Example 2"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 2">
|
||||
### Configuring Filters via CLI for Example 2
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -145,7 +140,6 @@ sourceConfig:
|
||||
- ^SNOWFLAKE.*
|
||||
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
|
||||
#### Example 3
|
||||
@ -153,7 +147,7 @@ sourceConfig:
|
||||
In this example we want to ingest all databases for which the name starts with `SNOWFLAKE` OR ends with `DB` , then the fillter pattern
|
||||
appied would be `^SNOWFLAKE` & `DB$` in the include field. This will result in ingestion of database `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`, `TEST_SNOWFLAKEDB` & `DUMMY_DB`.
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 3">
|
||||
### Configuring Filters via UI for Example 3
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-3.png"
|
||||
@ -161,9 +155,9 @@ appied would be `^SNOWFLAKE` & `DB$` in the include field. This will result in i
|
||||
caption="Database Filter Pattern Example 3"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 3">
|
||||
|
||||
### Configuring Filters via CLI for Example 3
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -176,13 +170,12 @@ sourceConfig:
|
||||
- .*DB$
|
||||
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
#### Example 4
|
||||
|
||||
In this example we want to ingest only the `SNOWFLAKE` database then the fillter pattern appied would be `^SNOWFLAKE$`.
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 4">
|
||||
### Configuring Filters via UI for Example 4
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-4.png"
|
||||
@ -190,9 +183,9 @@ In this example we want to ingest only the `SNOWFLAKE` database then the fillter
|
||||
caption="Database Filter Pattern Example 4"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 4">
|
||||
|
||||
### Configuring Filters via CLI for Example 4
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -203,8 +196,6 @@ sourceConfig:
|
||||
includes:
|
||||
- ^SNOWFLAKE$
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
|
||||
### Schema Filter Pattern
|
||||
|
||||
@ -242,7 +233,7 @@ In this example we want to ingest all schema winthin any database with name `PUB
|
||||
appied would be `^PUBLIC$` in the include field. This will result in ingestion of schemas `SNOWFLAKE.PUBLIC` & `SNOWFLAKE_SAMPLE_DATA.PUBLIC`
|
||||
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 1">
|
||||
### Configuring Filters via UI for Example 1
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-1.png"
|
||||
@ -250,9 +241,8 @@ appied would be `^PUBLIC$` in the include field. This will result in ingestion o
|
||||
caption="Schema Filter Pattern Example 1"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 1">
|
||||
### Configuring Filters via CLI for Example 1
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -263,7 +253,6 @@ sourceConfig:
|
||||
includes:
|
||||
- ^PUBLIC$
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
|
||||
#### Example 2
|
||||
@ -274,7 +263,7 @@ Notice that we have two schemas availabale with name `PUBLIC` one is available i
|
||||
|
||||
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 2">
|
||||
### Configuring Filters via UI for Example 2
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-2.png"
|
||||
@ -282,9 +271,8 @@ Notice that we have two schemas availabale with name `PUBLIC` one is available i
|
||||
caption="Schema Filter Pattern Example 2"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 2">
|
||||
### Configuring Filters via CLI for Example 2
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -295,7 +283,6 @@ sourceConfig:
|
||||
excludes:
|
||||
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
|
||||
#### Example 3
|
||||
@ -303,7 +290,7 @@ sourceConfig:
|
||||
In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWFLAKE_SAMPLE_DATA` that starts with `TPCH_` i.e `SNOWFLAKE_SAMPLE_DATA.TPCH_1`, `SNOWFLAKE_SAMPLE_DATA.TPCH_10` & `SNOWFLAKE_SAMPLE_DATA.TPCH_100`. To achive this an include schema filter will be applied with pattern `^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$` & `^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*`, we need to set `useFqnForFiltering` as true as we want to apply filter on FQN.
|
||||
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 3">
|
||||
### Configuring Filters via UI for Example 3
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-3.png"
|
||||
@ -311,9 +298,8 @@ In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWF
|
||||
caption="Schema Filter Pattern Example 3"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 3">
|
||||
### Configuring Filters via CLI for Example 3
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -325,7 +311,6 @@ sourceConfig:
|
||||
- ^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$
|
||||
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
|
||||
### Table Filter Pattern
|
||||
@ -371,7 +356,7 @@ Snowflake_Prod # Snowflake Service Name
|
||||
|
||||
In this example we want to ingest table with name `CUSTOMER` within any schema and database. In this case we need to apply include table filter pattern `^CUSTOMER$`. This will result in ingestion of tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER`, `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 1">
|
||||
### Configuring Filters via UI for Example 1
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/table-filter-example-1.png"
|
||||
@ -379,9 +364,9 @@ In this example we want to ingest table with name `CUSTOMER` within any schema a
|
||||
caption="Table Filter Pattern Example 1"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 1">
|
||||
|
||||
### Configuring Filters via CLI for Example 1
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -392,14 +377,12 @@ sourceConfig:
|
||||
includes:
|
||||
- ^CUSTOMER$
|
||||
```
|
||||
</Collapse>
|
||||
|
||||
|
||||
#### Example 2
|
||||
|
||||
In this example we want to ingest table with name `CUSTOMER` within `PUBLIC` schema of any database. In this case we need to apply include table filter pattern `.*\.PUBLIC\.CUSTOMER$` this will also require to set the `useFqnForFiltering` flag as true as we want to apply filter on FQN. This will result in ingestion of tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`
|
||||
|
||||
<Collapse title="Configuring Filters via UI for Example 2">
|
||||
### Configuring Filters via UI for Example 2
|
||||
|
||||
{% image
|
||||
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/table-filter-example-2.png"
|
||||
@ -407,9 +390,8 @@ In this example we want to ingest table with name `CUSTOMER` within `PUBLIC` sch
|
||||
caption="Table Filter Pattern Example 2"
|
||||
/%}
|
||||
|
||||
</Collapse>
|
||||
|
||||
<Collapse title="Configuring Filters via CLI for Example 2">
|
||||
### Configuring Filters via CLI for Example 2
|
||||
|
||||
```yaml
|
||||
sourceConfig:
|
||||
@ -419,5 +401,4 @@ sourceConfig:
|
||||
tableFilterPattern:
|
||||
includes:
|
||||
- .*\.PUBLIC\.CUSTOMER$
|
||||
```
|
||||
</Collapse>
|
||||
```
|
||||
@ -11,7 +11,7 @@ As of now, OpenMetadata uses Airflow under the hood as a scheduler for the Inges
|
||||
This is the right place if you are curious about our current approach or if you are looking forward to contribute by
|
||||
adding the implementation to deploy workflows to another tool directly from the UI.
|
||||
|
||||
<Note>
|
||||
{% note %}
|
||||
|
||||
Here we are talking about an internal implementation detail. Do not be confused about the information that is going to
|
||||
be shared here vs. the pipeline services supported as connectors for metadata extraction.
|
||||
@ -19,7 +19,7 @@ be shared here vs. the pipeline services supported as connectors for metadata ex
|
||||
For example, we use Airflow as an internal element to deploy and schedule ingestion workflows, but we can also extract
|
||||
metadata from Airflow. Fivetran, for example, is a possible source, but we are not using it to deploy and schedule workflows.
|
||||
|
||||
</Note>
|
||||
{% /note %}
|
||||
|
||||
## Before Reading
|
||||
|
||||
@ -32,7 +32,11 @@ Everything in OpenMetadata is centralized and managed via the API. Then, the Wor
|
||||
via the OpenMetadata server APIs. Morover, the `IngestionPipeline` Entity is also defined in a JSON Schema that you
|
||||
can find [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json).
|
||||
|
||||
<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png" alt="system context"/>
|
||||
{% image
|
||||
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png"
|
||||
alt="system context"
|
||||
|
||||
/%}
|
||||
|
||||
Note how OpenMetadata here acts as a middleware, connecting the actions being triggered in the UI to external orchestration
|
||||
systems, which will be the ones managing the heavy lifting of getting a workflow created, scheduled and run. Out of the box,
|
||||
@ -68,29 +72,40 @@ After creating a new workflow from the UI or when editing it, there are two call
- `POST` or `PUT` call to update the `Ingestion Pipeline Entity`,
- `/deploy` HTTP call to the `IngestionPipelineResource` to trigger the deployment of the new or updated DAG in the Orchestrator.

<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png" alt="software system"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png"
alt="software system"
/%}

### Creating the Ingestion Pipeline

Based on its [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json),
there are a few properties about the Ingestion Pipeline we can highlight:

1. `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
2. `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
logic to properly create the required Python classes (e.g., `Workflow`, `ProfilerWorkflow`, `TestSuiteWorkflow`, etc.).
3. `sourceConfig`: which is dependent on the pipeline type and defines how the pipeline should behave (e.g., marking ingesting views as `False`).
4. `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
5. `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule.
**1.** `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is

<Note>
**2.** `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
logic to properly create the required Python classes (e.g., `Workflow`, `ProfilerWorkflow`, `TestSuiteWorkflow`, etc.).

**3.** `sourceConfig`: which is dependent on the pipeline type and defines how the pipeline should behave (e.g., marking ingesting views as `False`).

**4.** `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.

**5.** `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule (see the sketch below).

{% note %}

While we have yet to update the `airflowConfig` property to be more generic, the only field actually being used is the
schedule. You might see this property here, but the whole process can still support other Orchestrators. We will clean
this up in future releases.

</Note>
{% /note %}

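Putting these properties together, a minimal sketch of an Ingestion Pipeline definition could look as follows. The field names follow the JSON Schema linked above, but the concrete values (pipeline name, service reference, schedule, behaviour flags) are assumptions for illustration only.

```yaml
# Illustrative sketch of an IngestionPipeline definition (values are assumptions)
name: mysql_metadata_pipeline          # hypothetical pipeline name
pipelineType: metadata                 # e.g., metadata, usage, lineage, profiler, TestSuite
service:                               # Entity Reference to the Service this pipeline belongs to
  id: <service-uuid>
  type: databaseService
sourceConfig:
  config:
    type: DatabaseMetadata
    includeViews: false                # behaviour flags depend on the pipeline type
airflowConfig:
  scheduleInterval: "0 */12 * * *"     # only the schedule is really used today
```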
<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png" alt="container create"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png"
alt="container create"
/%}

Here, the process of creating an Ingestion Pipeline is then the same as with any other Entity.

@ -104,7 +119,11 @@ The role of OpenMetadata here is just to pass the required communication to the
DAG. Basically we need a way to send a call to the Orchestrator that generates a DAG / Workflow object that will be run
using the proper functions and classes from the Ingestion Framework.

<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png" alt="deploy"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png"
alt="deploy"
/%}


Any Orchestration system that is capable of **DYNAMICALLY** creating a workflow based on a given input (that can be obtained
from the `IngestionPipeline` Entity information) is a potentially valid candidate to be used as a Pipeline Service.
@ -118,8 +137,9 @@ and prepared to contribute a new Pipeline Service Client implementation.
In this example I will be deploying an ingestion workflow to get the metadata from a MySQL database. After clicking on the UI
to deploy such a pipeline, these are the calls that get triggered:

1. `POST` call to create the `IngestionPipeline` Entity
2. `POST` call to deploy the newly created pipeline.
**1.** `POST` call to create the `IngestionPipeline` Entity

**2.** `POST` call to deploy the newly created pipeline.

## Create the Ingestion Pipeline

@ -324,10 +344,12 @@ the workflow class depends on our goal: Ingestion, Profiling, Testing...
You can follow this logic deeper in the source code of the managed APIs package, but the important thought here is that we
need the following logic flow:

1. An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
2. We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
**1.** An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.

**2.** We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
this something is a `.py` file (see the sketch after this list).
3. Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed

**3.** Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed
APIs package, but if any orchestrator already has an API capable of creating DAGs dynamically, this process can be directly
handled in the Pipeline Service Client implementation as all the necessary data is present in the Ingestion Pipeline Entity.

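To make this concrete, below is a minimal, illustrative sketch of the kind of `.py` file such a transformation could produce for Airflow. It follows the same pattern as the DAG snippet shown later in these docs (a `PythonOperator` wrapping an Ingestion Framework workflow); the DAG id, schedule and inlined YAML payload are assumptions, and the real generated file is produced by the managed APIs package rather than written by hand.

```python
"""Sketch of a dynamically generated DAG file (assumed names and values)."""
from datetime import timedelta

import yaml
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

from metadata.ingestion.api.workflow import Workflow

# The managed APIs package would render this YAML from the IngestionPipeline Entity.
CONFIG = """
source:
  type: mysql
  serviceName: mysql_prod          # hypothetical service name
  sourceConfig:
    config:
      type: DatabaseMetadata
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
"""


def run_workflow():
    # Build and run the Ingestion Framework workflow from the rendered config.
    workflow = Workflow.create(yaml.safe_load(CONFIG))
    workflow.execute()
    workflow.raise_from_status()
    workflow.stop()


with DAG(
    "mysql_metadata_pipeline",         # hypothetical DAG id, taken from the pipeline name
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    start_date=days_ago(1),
    schedule_interval="0 */12 * * *",  # would come from airflowConfig.scheduleInterval
    catchup=False,
) as dag:
    PythonOperator(task_id="ingestion_task", python_callable=run_workflow)
```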
@ -40,8 +40,8 @@ AS SELECT ... FROM schema.table_a JOIN another_schema.table_b;

From this query we will extract the following information:

1. There are two `source` tables, represented by the strings `schema.table_a` and `another_schema.table_b`
2. There is a `target` table `schema.my_view`.
**1.** There are two `source` tables, represented by the strings `schema.table_a` and `another_schema.table_b`
**2.** There is a `target` table `schema.my_view`.

In this case we suppose that the database connection requires us to write the table names as `<schema>.<table>`. However,
there are other possible options. Sometimes we can find just `<table>` in a query, or even `<database>.<schema>.<table>`.
@ -67,13 +67,13 @@ Note that if a Model is not materialized, its data won't be ingested.

### Query Log

<Note>
{% note %}

Up until 0.11, Query Log analysis for lineage happened during the Usage Workflow.

From 0.12 onwards, there is a separate Lineage Workflow that will take care of this process.

</Note>
{% /note %}

#### How to run?

@ -98,7 +98,7 @@ That being said, this process is the same as the one shown in the View Lineage a
parse, we will obtain the `source` and `target` information, use ElasticSearch to identify the Entities in OpenMetadata
and then send the lineage to the API.

<Note>
{% note %}

When running any query from within OpenMetadata we add an information comment to the query text

@ -109,7 +109,7 @@ When running any query from within OpenMetadata we add an information comment to
Note that queries with this text as well as the ones containing headers from dbt (which follow a similar structure),
will be filtered out when building the query log internally.

</Note>
{% /note %}

#### Troubleshooting

@ -135,8 +135,11 @@ the data feeding the Dashboards and Charts.

When ingesting the Dashboards metadata, the workflow will pick up the origin tables (or database, in the case of
PowerBI), and prepare the lineage information.

<Image src="/images/v1.0.0/openmetadata/ingestion/lineage/dashboard-ingestion-lineage.png" alt="Dashboard Lineage"/>
{% image
src="/images/v1.0.0/features/ingestion/lineage/dashboard-ingestion-lineage.png"
alt="Dashboard Lineage"
caption="Dashboard Lineage"
/%}

## Pipeline Services

@ -9,7 +9,9 @@ The OpenMetadata home screen features a change activity feed that enables you vi
- Data for which you are an owner
- Data you are following

<Image
src={"/images/v1.0.0/openmetadata/ingestion/versioning/change-feeds.gif"}
alt="Change feeds"
/>

{% image
src="/images/v1.0.0/features/ingestion/versioning/change-feeds.gif"
alt="Change feeds"
/%}

@ -6,7 +6,8 @@ slug: /connectors/ingestion/versioning/event-notification-via-webhooks
# Event Notification via Webhooks
The webhook interface allows you to build applications that receive all the data changes happening in your organization through APIs. Register URLs to receive metadata event notifications. Slack integration through incoming webhooks is one of many applications of this feature.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/versioning/event-notifications-via-webhooks.gif"}
alt="Event Notification via Webhooks"
/>
{% image
src="/images/v1.0.0/features/ingestion/versioning/event-notifications-via-webhooks.gif"
alt="Event Notification via Webhooks"
/%}

@ -15,7 +15,8 @@ Metadata versioning helps **simplify debugging processes**. View the version his

Versioning also helps in **broader collaboration** among consumers and producers of data. Admins can provide access to more users in the organization to change certain fields. Crowdsourcing makes metadata the collective responsibility of the entire organization.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/versioning/metadata-versioning.gif"}
alt="Metadata versioning"
/>
{% image
src="/images/v1.0.0/features/ingestion/versioning/metadata-versioning.gif"
alt="Metadata versioning"
/%}

@ -53,41 +53,46 @@ Test Cases specify a Test Definition. It will define what condition a test must

### Step 1: Creating a Test Suite
From your table service click on the `profiler` tab. From there you will be able to create table tests by clicking on the purple background `Add Test` top button or column tests by clicking on the white background `Add Test` button.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"}
alt="Write your first test"
caption="Write your first test"
/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/profiler-tab-view.png"
alt="Write your first test"
caption="Write your first test"
/%}


On the next page you will be able to either select an existing Test Suite or Create a new one. If you select an existing one your Test Case will automatically be added to the Test Suite.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"}
alt="Create test suite"
caption="Create test suite"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-page.png"
alt="Create test suite"
caption="Create test suite"
/%}


### Step 2: Create a Test Case
On the next page, you will create a Test Case. You will need to select a Test Definition from the drop-down menu and specify the parameters of your Test Case.

**Note:** Test Case name needs to be unique across the whole platform. A warning message will show if your Test Case name is not unique.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-case-page.png"}
alt="Create test case"
caption="Create test case"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-case-page.png"
alt="Create test case"
caption="Create test case"
/%}


### Step 3: Add Ingestion Workflow
If you have created a new test suite you will see a purple background `Add Ingestion` button after clicking `submit`. This will allow you to schedule the execution of your Test Suite. If you have selected an existing Test Suite you are all set.

After clicking `Add Ingestion` you will be able to select an execution schedule for your Test Suite (note that you can edit this later). Once you have selected the desired scheduling time, click submit and you are all set.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"}
alt="Create ingestion workflow"
caption="Create ingestion workflow"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/ingestion-page.png"
alt="Create ingestion workflow"
caption="Create ingestion workflow"
/%}


## Adding Tests with the YAML Config

@ -187,7 +192,7 @@ except ModuleNotFoundError:
from airflow.operators.python_operator import PythonOperator

from metadata.config.common import load_config_file
from metadata.data_quality.api.workflow import TestSuiteWorkflow
from metadata.test_suite.api.workflow import TestSuiteWorkflow
from airflow.utils.dates import days_ago

default_args = {
@ -232,43 +237,53 @@ configurations specified above.
## How to Visualize Test Results
### From the Test Suite View
From the home page click on the Test Suite menu in the left panel.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"}
alt="Test suite home page"
caption="Test suite home page"
/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-home-page.png"
alt="Test suite home page"
caption="Test suite home page"
/%}


This will bring you to the Test Suite page where you can select a specific Test Suite.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"}
alt="Test suite landing page"
caption="Test suite landing page"
/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-landing.png"
alt="Test suite landing page"
caption="Test suite landing page"
/%}


From there you can select a Test Suite and visualize the results associated with this specific Test Suite.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"}
alt="Test suite results page"
caption="Test suite results page"
/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-results.png"
alt="Test suite results page"
caption="Test suite results page"
/%}


### From a Table Entity
Navigate to your table and click on the `profiler` tab. From there you'll be able to see test results at the table or column level.
#### Table Level Test Results
In the top panel, click on the white background `Data Quality` button. This will bring you to a summary of all your quality tests at the table level.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"}
alt="Test suite results table"
caption="Test suite results table"
/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/table-results-entity.png"
alt="Test suite results table"
caption="Test suite results table"
/%}


#### Column Level Test Results
On the profiler page, click on a specific column name. This will bring you to a new page where you can click the white background `Quality Test` button to see all the test results related to your column.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"}
alt="Test suite results table"
caption="Test suite results table"
/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/colum-level-test-results.png"
alt="Test suite results table"
caption="Test suite results table"
/%}


## Adding Custom Tests
While OpenMetadata provides out of the box tests, you may want to write your test results from your own custom quality test suite. This is very easy to do using the API.
@ -414,3 +429,4 @@ curl --location --request PUT 'http://localhost:8585/api/v1/testCase/local_redsh

You will now be able to see your test in the Test Suite or the table entity.

@ -18,13 +18,13 @@ The dbt workflow requires the below keys to be present in the node of a manifest
- depends_on (required if lineage information needs to be extracted)
- columns (required if column description is to be processed)

<Note>
{% note %}

The `name/alias, schema and database` values from dbt manifest.json should match values of the `name, schema and database` of the table/view ingested in OpenMetadata.

dbt will only be processed if these values match.

</Note>
{% /note %}

Below is a sample manifest.json node for reference:
```json

@ -32,7 +32,12 @@ Configure the dbt Workflow from the CLI.

Queries used to create the dbt models can be viewed in the dbt tab

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt-query" caption="dbt Query"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt-query"
caption="dbt Query"
/%}


### 2. dbt Lineage

@ -40,7 +45,12 @@ Lineage from dbt models can be viewed in the Lineage tab.

For more information on how lineage is extracted from dbt take a look [here](/connectors/ingestion/workflows/dbt/ingest-dbt-lineage)

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-lineage.png" alt="dbt-lineage" caption="dbt Lineage"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-lineage.png"
alt="dbt-lineage"
caption="dbt Lineage"
/%}


### 3. dbt Tags

@ -48,7 +58,13 @@ Table and column level tags can be imported from dbt

Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-tags) for adding dbt tags

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt-tags" caption="dbt Tags"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
alt="dbt-tags"
caption="dbt Tags"
/%}


### 4. dbt Owner

@ -56,7 +72,12 @@ Owner from dbt models can be imported and assigned to respective tables

Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-owner) for adding dbt owner

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-owner.png" alt="dbt-owner" caption="dbt Owner"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-owner.png"
alt="dbt-owner"
caption="dbt Owner"
/%}


### 5. dbt Descriptions

@ -64,13 +85,22 @@ Descriptions from dbt models can be imported and assigned to respective tables a

By default descriptions from `manifest.json` will be imported. Descriptions from `catalog.json` will only be updated if catalog file is passed.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png" alt="dbt-descriptions" caption="dbt Descriptions"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png"
alt="dbt-descriptions"
caption="dbt Descriptions"
/%}


### 6. dbt Tests and Test Results

Tests from dbt will only be imported if the `run_results.json` file is passed.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tests.png" alt="dbt-tests" caption="dbt Tests"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-tests.png"
alt="dbt-tests"
caption="dbt Tests"
/%}

## Troubleshooting

@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic

This will populate the dbt tab from the Table Entity Page.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt"
caption="dbt"
/%}


We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.

@ -28,7 +28,13 @@ Openmetadata fetches the lineage information from the `manifest.json` file. Belo
```

For the above case the lineage will be created as shown below:
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-lineage-customers.png" alt="dbt-lineage-customers" caption="dbt Lineage"/>

{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-lineage-customers.png"
alt="dbt-lineage-customers"
caption="dbt Lineage"
/%}


### 2. Lineage information from dbt queries
Openmetadata fetches the dbt query information from the `manifest.json` file.

@ -51,47 +51,86 @@ The user or team which will be set as the entity owner should be first created i
While linking the owner from `manifest.json` or `catalog.json` files to the entity, OpenMetadata first searches for the user if it is present. If the user is not present it searches for the team.

#### The following steps show how to add a User to OpenMetadata:
1. Click on the `Users` section from homepage
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png" alt="click-users-page" caption="Click Users page"/>
**1.** Click on the `Users` section from the homepage

{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png"
alt="click-users-page"
caption="Click Users page"
/%}


**2.** Click on the `Add User` button

{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png"
alt="click-add-user"
caption="Click Add User"
/%}

2. Click on the `Add User` button
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png" alt="click-add-user" caption="Click Add User"/>

3. Enter the details as shown for the user

<Note>
{% note %}

If the owner's name in `manifest.json` or `catalog.json` file is `openmetadata`, you need to enter `openmetadata@youremail.com` in the email id section of add user form as shown below.

</Note>
{% /note %}

{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png"
alt="add-user-dbt"
caption="Add User"
/%}

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png" alt="add-user-dbt" caption="Add User"/>

#### The following steps show how to add a Team to OpenMetadata:
1. Click on the `Teams` section from homepage
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png" alt="click-teams-page" caption="Click Teams page"/>
**1.** Click on the `Teams` section from the homepage

2. Click on the `Add Team` button
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png" alt="click-add-team" caption="Click Add Team"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png"
alt="click-teams-page"
caption="Click Teams page"
/%}

3. Enter the details as shown for the team
**2.** Click on the `Add Team` button

<Note>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png"
alt="click-add-team"
caption="Click Add Team"
/%}


**3.** Enter the details as shown for the team

{% note %}

If the owner's name in `manifest.json` or `catalog.json` file is `openmetadata`, you need to enter `openmetadata` in the name section of add team form as shown below.

</Note>
{% /note %}

{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png"
alt="add-team-dbt"
caption="Add Team"
/%}

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png" alt="add-team-dbt" caption="Add Team"/>

## Linking the Owner to the table

After running the ingestion workflow with dbt you can see the created user or team getting linked to the table as its owner, as specified in the `manifest.json` or `catalog.json` file.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png" alt="linked-user" caption="Linked User"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png"
alt="linked-user"
caption="Linked User"
/%}

<Note>

{% note %}

If a table already has an owner linked to it, the owner from dbt will not update the current owner.

</Note>
{% /note %}

@ -11,11 +11,11 @@ Follow the link [here](https://docs.getdbt.com/reference/resource-configs/tags)

## Requirements

<Note>
{% note %}

For dbt tags, if the tag is not already present it will be created under tag category `DBTTags` in OpenMetadata

</Note>
{% /note %}

### 1. Table-Level Tags information in manifest.json file
Openmetadata fetches the table-level tags information from the `manifest.json` file. Below is a sample `manifest.json` file node containing tags information under `node_name->tags`.
@ -65,4 +65,8 @@ Openmetadata fetches the column-level tags information from the `manifest.json`
### 3. Viewing the tags on tables and columns
Table and Column level tags ingested from dbt can be viewed on the node in OpenMetadata

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt_tags" caption="dbt tags"/>
{% image
src="/images/v1.0.0//features/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
alt="dbt_tags"
caption="dbt tags"
/%}

@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic

This will populate the dbt tab from the Table Entity Page.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt"
caption="dbt"
/%}


We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.

@ -20,7 +25,12 @@ We can create a workflow that will obtain the dbt information from the dbt files

From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add dbt Ingestion.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/add-ingestion.png" alt="add-ingestion" caption="Add dbt Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/add-ingestion.png"
alt="add-ingestion"
caption="Add dbt Ingestion"
/%}

### 2. Configure the dbt Ingestion

@ -29,11 +39,11 @@ Here you can enter the dbt Ingestion details:

dbt sources for manifest.json, catalog.json and run_results.json files can be configured as shown in the UI below. The dbt files need to be stored on one of these sources.

<Note>
{% note %}

Only the `manifest.json` file is compulsory for dbt ingestion.

</Note>
{% /note %}


#### AWS S3 Buckets
@ -42,7 +52,12 @@ OpenMetadata connects to the AWS s3 bucket via the credentials provided and scan

The name of the s3 bucket and prefix path to the folder in which the dbt files are stored can be provided. In the case where these parameters are not provided all the buckets are scanned for the files.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/aws-s3.png" alt="aws-s3-bucket" caption="AWS S3 Bucket Config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/aws-s3.png"
alt="aws-s3-bucket"
caption="AWS S3 Bucket Config"
/%}


#### Google Cloud Storage Buckets

@ -51,13 +66,23 @@ OpenMetadata connects to the GCS bucket via the credentials provided and scans t
The name of the GCS bucket and prefix path to the folder in which the dbt files are stored can be provided. In the case where these parameters are not provided all the buckets are scanned for the files.

GCS credentials can be stored in two ways:
1. Entering the credentials directly into the form
**1.** Entering the credentials directly into the form

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/gcs-bucket-form.png" alt="gcs-storage-bucket-form" caption="GCS Bucket config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/gcs-bucket-form.png"
alt="gcs-storage-bucket-form"
caption="GCS Bucket config"
/%}

2. Entering the path of file in which the GCS bucket credentials are stored.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/gcs-bucket-path.png" alt="gcs-storage-bucket-path" caption="GCS Bucket Path Config"/>
**2.** Entering the path of the file in which the GCS bucket credentials are stored.

{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/gcs-bucket-path.png"
alt="gcs-storage-bucket-path"
caption="GCS Bucket Path Config"
/%}


For more information on Google Cloud Storage authentication click [here](https://cloud.google.com/docs/authentication/getting-started#create-service-account-console).

@ -65,13 +90,22 @@ For more information on Google Cloud Storage authentication click [here](https:/

Path of the `manifest.json`, `catalog.json` and `run_results.json` files stored in the local system or in the container in which openmetadata server is running can be directly provided.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/local-storage.png" alt="local-storage" caption="Local Storage Config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/local-storage.png"
alt="local-storage"
caption="Local Storage Config"
/%}

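If you configure the same dbt source from a YAML workflow instead of the UI, a minimal sketch for the local-file option could look like the following. The field names should be checked against the dbt pipeline JSON Schema for your version, and the service name and paths here are placeholders.

```yaml
# Minimal sketch (assumed field names and placeholder paths) of a dbt source using local files
source:
  type: dbt
  serviceName: my_database_service        # hypothetical service already ingested in OpenMetadata
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        dbtManifestFilePath: /path/to/manifest.json
        dbtCatalogFilePath: /path/to/catalog.json          # optional
        dbtRunResultsFilePath: /path/to/run_results.json   # optional
```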
#### File Server

The file server path of the `manifest.json`, `catalog.json` and `run_results.json` files stored on a file server can be directly provided.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/file_server.png" alt="file-server" caption="File Server Config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/file_server.png"
alt="file-server"
caption="File Server Config"
/%}


#### dbt Cloud

@ -79,9 +113,18 @@ Click on the the link [here](https://docs.getdbt.com/guides/getting-started) for
OpenMetadata uses dbt cloud APIs to fetch the `run artifacts` (manifest.json, catalog.json and run_results.json) from the most recent dbt run.
The APIs need to be authenticated using an Authentication Token. Follow the link [here](https://docs.getdbt.com/dbt-cloud/api-v2#section/Authentication) to generate an authentication token for your dbt cloud account.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-cloud.png" alt="dbt-cloud" caption="dbt Cloud config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-cloud.png"
alt="dbt-cloud"
caption="dbt Cloud config"
/%}


### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the dbt ingestion pipeline being added to the Service Ingestions.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/schedule-and-deploy.png" alt="schedule-and-deploy" caption="Schedule dbt ingestion pipeline"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/schedule-and-deploy.png"
alt="schedule-and-deploy"
caption="Schedule dbt ingestion pipeline"
/%}

@ -38,7 +38,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic

This will populate the Lineage tab from the Table Entity Page.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/table-entity-page.png"
alt="table-entity-page"
caption="Table Entity Page"
/%}


We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data.

@ -46,15 +51,23 @@ We can create a workflow that will obtain the query log and table creation infor

From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/add-ingestion.png"
alt="add-ingestion"
caption="Add Ingestion"
/%}

### 2. Configure the Lineage Ingestion

Here you can enter the Lineage Ingestion details:

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png" alt="configure-lineage-ingestion" caption="Configure the Lineage Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/configure-lineage-ingestion.png"
alt="configure-lineage-ingestion"
caption="Configure the Lineage Ingestion"
/%}

<Collapse title="Lineage Options">
### Lineage Options

**Query Log Duration**

@ -63,10 +76,15 @@ Specify the duration in days for which the profiler should capture lineage data

**Result Limit**

Set the limit for the query log results to be run at a time.
</Collapse>


### 3. Schedule and Deploy

After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/scheule-and-deploy.png"
alt="schedule-and-deploy"
caption="View Service Ingestion pipelines"
/%}

@ -26,11 +26,11 @@ you to execute the lineage workflow using a query log file. This can be arbitrar

A query log file is a standard CSV file which contains the following information.

<Note>
{% note %}

A standard CSV should be comma separated, and each row represented as a single line in the file.

</Note>
{% /note %}

- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
that your query has commas `,` inside. Then, wrap each query in quotes `"<query>"` to not have any clashes

@ -12,36 +12,40 @@ After the metadata ingestion has been done correctly, we can configure and deplo

This Pipeline will be in charge of feeding the Profiler tab of the Table Entity, as well as running any tests configured in the Entity.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/profiler-summary-table.png"}
alt="Table profile summary page"
caption="Table profile summary page"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/profiler-summary-table.png"
alt="Table profile summary page"
caption="Table profile summary page"
/%}


{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/profiler-summary-colomn.png"
alt="Column profile summary page"
caption="Column profile summary page"
/%}

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/profiler-summary-colomn.png"}
alt="Column profile summary page"
caption="Column profile summary page"
/>


### 1. Add a Profiler Ingestion
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Profiler Ingestion.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/add-profiler-workflow.png"}
alt="Add a profiler service"
caption="Add a profiler service"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/add-profiler-workflow.png"
alt="Add a profiler service"
caption="Add a profiler service"
/%}


### 2. Configure the Profiler Ingestion
Here you can enter the Profiler Ingestion details.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/configure-profiler-workflow.png"}
alt="Set profiler configuration"
caption="Set profiler configuration"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/configure-profiler-workflow.png"
alt="Set profiler configuration"
caption="Set profiler configuration"
/%}


#### Profiler Options
**Name**
@ -84,17 +88,18 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
### 4. Updating Profiler setting at the table level
Once you have created your profiler you can adjust some behavior at the table level by going to the table and clicking on the profiler tab

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/accessing-table-profile-settings.png"}
alt="table profile settings"
caption="table profile settings"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/accessing-table-profile-settings.png"
alt="table profile settings"
caption="table profile settings"
/%}

{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/table-profile-summary-view.png"
alt="table profile settings"
caption="table profile settings"
/%}

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/table-profile-summary-view.png"}
alt="table profile settings"
caption="table profile settings"
/>

#### Profiler Options
**Profile Sample**

@ -14,11 +14,11 @@ A Metric is a computation that we can run on top of a Table or Column to receive

On this page, you will learn all the metrics that we currently support and their meaning. We will base all the namings on the definitions on the JSON Schemas.

<Note>
{% note %}

You can check the definition of the `columnProfile` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L271). On the other hand, the metrics are implemented [here](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/orm\_profiler/metrics).

</Note>
{% /note %}

We will base all the namings on the definitions on the JSON Schemas.

@ -37,11 +37,12 @@ Returns the number of columns in the Table.
## System Metrics
System metrics provide information related to DML operations performed on the table. These metrics present a concise view of your data freshness. In a typical data processing flow tables are updated at a certain frequency. Table freshness will be monitored by confirming a set of operations has been performed against the table. To increase trust in your data assets, OpenMetadata will monitor the `INSERT`, `UPDATE` and `DELETE` operations performed against your table to showcase 2 metrics related to freshness (see below for more details). With this information, you are able to see when a specific operation was last performed and how many rows it affected.

<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/profiler-freshness-metrics.png"}
alt="table profile freshness metrics"
caption="table profile freshness metrics"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/profiler-freshness-metrics.png"
alt="table profile freshness metrics"
caption="table profile freshness metrics"
/%}


These metrics are available for **BigQuery**, **Redshift** and **Snowflake**. Other database engines are currently not supported so the computation of the system metrics will be skipped.

@ -38,7 +38,11 @@ Once the metadata ingestion runs correctly and we are able to explore the servic

This will populate the Queries tab from the Table Entity Page.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/table-entity-page.png"
alt="table-entity-page"
caption="Table Entity Page"
/%}

We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Usage Ingestion will be in charge of obtaining this data.

@ -46,15 +50,25 @@ We can create a workflow that will obtain the query log and table creation infor

From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Usage Ingestion.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/add-ingestion.png"
alt="add-ingestion"
caption="Add Ingestion"
/%}


### 2. Configure the Usage Ingestion

Here you can enter the Usage Ingestion details:

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/configure-usage-ingestion.png" alt="configure-usage-ingestion" caption="Configure the Usage Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/configure-usage-ingestion.png"
alt="configure-usage-ingestion"
caption="Configure the Usage Ingestion"
/%}

<Collapse title="Usage Options">

### Usage Options

**Query Log Duration**

@ -67,10 +81,16 @@ Mention the absolute file path of the temporary file name to store the query log

**Result Limit**

Set the limit for the query log results to be run at a time (a YAML sketch of these options follows below).
</Collapse>

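For reference, the same options can also be set when running the usage workflow from a YAML config. A minimal, hedged sketch is shown below; the field names follow the usage pipeline schema, but the values are placeholders.

```yaml
# Sketch of the usage sourceConfig options (values are assumptions)
source:
  sourceConfig:
    config:
      type: DatabaseUsage
      queryLogDuration: 1                 # days of query history to process
      resultLimit: 1000                   # limit for query log results per run
      stageFileLocation: /tmp/query_log   # temporary file to store the query log
```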
### 3. Schedule and Deploy

After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the usage pipeline being added to the Service Ingestions.

<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/scheule-and-deploy.png"
alt="schedule-and-deploy"
caption="View Service Ingestion pipelines"
/%}

@ -26,11 +26,11 @@ you to execute the lineage workflow using a query log file. This can be arbitrar

A query log file is a standard CSV file which contains the following information.

<Note>
{% note %}

A standard CSV should be comma separated, and each row represented as a single line in the file.

</Note>
{% /note %}

- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
that your query has commas `,` inside. Then, wrap each query in quotes `"<query>"` to not have any clashes

@ -12,20 +12,19 @@ One can configure the metadata ingestion filter for database source using four c
`Schema Filter Pattern`, `Table Filter Pattern` & `Use FQN For Filtering`. In this document we will learn about each field in detail
along with many examples.

<Collapse title="Configuring Filters via UI">
### Configuring Filters via UI

Filters can be configured in the UI while adding an ingestion pipeline through the `Add Metadata Ingestion` page.

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-patterns.png"
alt="Database Filter Pattern Fields"
caption="Database Filter Pattern Fields"
/>

</Collapse>
/%}

<Collapse title="Configuring Filters via CLI">

### Configuring Filters via CLI

Filters can be configured in the CLI in the connection configuration within the `source.sourceConfig.config` field as described below.

@ -57,7 +56,7 @@ sourceConfig:
- table4
```

</Collapse>

### Use FQN For Filtering

@ -93,17 +92,17 @@ In this example we want to ingest all databases which contains `SNOWFLAKE` in na
applied would be `.*SNOWFLAKE.*` in the include field. This will result in ingestion of databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`
and `TEST_SNOWFLAKEDB`.

<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-1.png"
alt="Database Filter Pattern Example 1"
caption="Database Filter Pattern Example 1"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 1">

### Configuring Filters via CLI for Example 1

```yaml
sourceConfig:
@ -116,24 +115,22 @@ sourceConfig:

```

</Collapse>

#### Example 2

In this example we want to ingest all databases which start with `SNOWFLAKE` in name, then the filter pattern
applied would be `^SNOWFLAKE.*` in the include field. This will result in ingestion of databases `SNOWFLAKE` & `SNOWFLAKE_SAMPLE_DATA`.

<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-2.png"
alt="Database Filter Pattern Example 2"
caption="Database Filter Pattern Example 2"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2

```yaml
sourceConfig:
@ -145,7 +142,6 @@ sourceConfig:
- ^SNOWFLAKE.*

```
</Collapse>

#### Example 3
@ -153,17 +149,16 @@ sourceConfig:
In this example we want to ingest all databases for which the name starts with `SNOWFLAKE` OR ends with `DB`, then the filter patterns
applied would be `^SNOWFLAKE` & `DB$` in the include field. This will result in ingestion of databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`, `TEST_SNOWFLAKEDB` & `DUMMY_DB`.

<Collapse title="Configuring Filters via UI for Example 3">
### Configuring Filters via UI for Example 3

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-3.png"
alt="Database Filter Pattern Example 3"
caption="Database Filter Pattern Example 3"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 3">
### Configuring Filters via CLI for Example 3

```yaml
sourceConfig:
@ -176,23 +171,23 @@ sourceConfig:
- .*DB$

```
</Collapse>


#### Example 4

In this example we want to ingest only the `SNOWFLAKE` database, then the filter pattern applied would be `^SNOWFLAKE$`.

<Collapse title="Configuring Filters via UI for Example 4">
### Configuring Filters via UI for Example 4

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-4.png"
alt="Database Filter Pattern Example 4"
caption="Database Filter Pattern Example 4"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 4">

### Configuring Filters via CLI for Example 4

```yaml
sourceConfig:
@ -203,7 +198,6 @@ sourceConfig:
includes:
- ^SNOWFLAKE$
```
</Collapse>

### Schema Filter Pattern

@ -242,17 +236,15 @@ In this example we want to ingest all schema winthin any database with name `PUB
applied would be `^PUBLIC$` in the include field. This will result in ingestion of schemas `SNOWFLAKE.PUBLIC` & `SNOWFLAKE_SAMPLE_DATA.PUBLIC`


<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/schema-filter-example-1.png"
alt="Schema Filter Pattern Example 1"
caption="Schema Filter Pattern Example 1"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1

```yaml
sourceConfig:
@ -263,7 +255,7 @@ sourceConfig:
includes:
- ^PUBLIC$
```
</Collapse>

#### Example 2
@ -274,17 +266,17 @@ Notice that we have two schemas availabale with name `PUBLIC` one is available i


<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/schema-filter-example-2.png"
alt="Schema Filter Pattern Example 2"
caption="Schema Filter Pattern Example 2"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 2">

### Configuring Filters via CLI for Example 2

```yaml
sourceConfig:
@ -295,7 +287,6 @@ sourceConfig:
excludes:
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$
```
</Collapse>
#### Example 3
@ -303,17 +294,18 @@ sourceConfig:
In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWFLAKE_SAMPLE_DATA` that start with `TPCH_`, i.e. `SNOWFLAKE_SAMPLE_DATA.TPCH_1`, `SNOWFLAKE_SAMPLE_DATA.TPCH_10` & `SNOWFLAKE_SAMPLE_DATA.TPCH_100`. To achieve this, an include schema filter will be applied with the patterns `^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$` & `^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*`, and we need to set `useFqnForFiltering` to true as we want to apply the filter on the FQN.


<Collapse title="Configuring Filters via UI for Example 3">
### Configuring Filters via UI for Example 3

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/schema-filter-example-3.png"
alt="Schema Filter Pattern Example 3"
caption="Schema Filter Pattern Example 3"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 3">


### Configuring Filters via CLI for Example 3

```yaml
sourceConfig:
@ -325,7 +317,7 @@ sourceConfig:
- ^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*
```
</Collapse>

### Table Filter Pattern

@ -371,17 +363,17 @@ Snowflake_Prod # Snowflake Service Name

In this example we want to ingest table with name `CUSTOMER` within any schema and database. In this case we need to apply include table filter pattern `^CUSTOMER$`. This will result in ingestion of tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER`, `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`

<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/table-filter-example-1.png"
alt="Table Filter Pattern Example 1"
caption="Table Filter Pattern Example 1"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 1">

### Configuring Filters via CLI for Example 1

```yaml
sourceConfig:
@ -392,24 +384,23 @@ sourceConfig:
includes:
- ^CUSTOMER$
```
</Collapse>


#### Example 2

In this example we want to ingest the table with name `CUSTOMER` within the `PUBLIC` schema of any database. In this case we need to apply the include table filter pattern `.*\.PUBLIC\.CUSTOMER$`; this also requires setting the `useFqnForFiltering` flag to true, as we want to apply the filter on the FQN. This will result in ingestion of tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`

<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2

<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/table-filter-example-2.png"
alt="Table Filter Pattern Example 2"
caption="Table Filter Pattern Example 2"
/>
/%}

</Collapse>

<Collapse title="Configuring Filters via CLI for Example 2">

### Configuring Filters via CLI for Example 2

```yaml
sourceConfig:
@ -420,4 +411,3 @@ sourceConfig:
includes:
- .*\.PUBLIC\.CUSTOMER$
```
</Collapse>
@ -12,15 +12,15 @@ filter out the log tables while ingesting metadata.
Configuring these metadata filters with OpenMetadata is very easy, as it uses regex for matching and filtering the metadata.
The following documents will guide you on how to configure filters based on the type of data source.

<InlineCalloutContainer>
<InlineCallout
color="violet-70"
{%inlineCalloutContainer%}

{%inlineCallout
bold="Database Filter Patterns"
icon="cable"
href="/connectors/ingestion/workflows/metadata/filter-patterns/database"
>
Learn more about how to configure filters for database sources.
</InlineCallout>
</InlineCalloutContainer>
href="/connectors/ingestion/workflows/metadata/filter-patterns/database" %}
Learn more about how to configure filters for database sources.
{%/inlineCallout%}

{%/inlineCalloutContainer%}