Update ingestion section (#11265)

This commit is contained in:
Milan Bariya 2023-04-25 18:16:08 +05:30 committed by GitHub
parent 86b34ed151
commit f1d57caf97
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
38 changed files with 846 additions and 452 deletions

View File

@ -32,7 +32,10 @@ Everything in OpenMetadata is centralized and managed via the API. Then, the Wor
via the OpenMetadata server APIs. Moreover, the `IngestionPipeline` Entity is also defined in a JSON Schema that you
can find [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json).
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png" alt="system context"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png"
alt="system context"
/%}
Note how OpenMetadata here acts as a middleware, connecting the actions being triggered in the UI to external orchestration
systems, which will be the ones managing the heavy lifting of getting a workflow created, scheduled and run. Out of the box,
@ -68,19 +71,27 @@ After creating a new workflow from the UI or when editing it, there are two call
- `POST` or `PUT` call to update the `Ingestion Pipeline Entity`,
- `/deploy` HTTP call to the `IngestionPipelineResource` to trigger the deployment of the new or updated DAG in the Orchestrator.
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png" alt="software system"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png"
alt="software system"
/%}
### Creating the Ingestion Pipeline
Based on its [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json),
there are a few properties about the Ingestion Pipeline we can highlight:
1. `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
2. `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
**1.** `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
**2.** `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
logic to properly create the required Python classes (e.g., `Workflow`, `ProfilerWorkflow`, `TestSuiteWorkflow`, etc.).
3. `sourceConfig`: which depends on the pipeline type and defines how the pipeline should behave (e.g., marking the ingestion of views as `False`).
4. `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
5. `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule.
**3.** `sourceConfig`: which depends on the pipeline type and defines how the pipeline should behave (e.g., marking the ingestion of views as `False`).
**4.** `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
**5.** `airflowConfig`: with Airflow-specific configurations for the DAG, such as the schedule (an illustrative payload built from these properties is sketched right below).
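As a rough illustration only (the field names and allowed values are ultimately governed by the JSON Schema linked above, and every value below is made up), a create payload combining these properties could look like:

```python
# Illustrative sketch only: check ingestionPipeline.json for the authoritative fields.
ingestion_pipeline = {
    "name": "mysql_metadata_pipeline",                     # hypothetical pipeline name
    "service": {                                           # Entity Reference to the Service
        "id": "0a1b2c3d-...",                              # hypothetical service UUID
        "type": "databaseService",
    },
    "pipelineType": "metadata",                            # drives which Workflow class is built
    "sourceConfig": {"config": {"type": "DatabaseMetadata", "includeViews": False}},
    "openMetadataServerConnection": {"hostPort": "http://openmetadata:8585/api"},
    "airflowConfig": {"scheduleInterval": "0 * * * *"},    # only the schedule is really used
}
```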
{% note %}
@ -89,8 +100,10 @@ schedule. You might see this property here, but the whole process can still supp
this up in future releases.
{% /note %}
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png" alt="container create"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png"
alt="container create"
/%}
Here, the process of creating an Ingestion Pipeline is then the same as with any other Entity.
@ -104,7 +117,10 @@ The role of OpenMetadata here is just to pass the required communication to the
DAG. Basically we need a way to send a call to the Orchestrator that generates a DAG / Workflow object that will be run
using the proper functions and classes from the Ingestion Framework.
<Image src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png" alt="deploy"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png"
alt="deploy"
/%}
Any Orchestration system that is capable of **DYNAMICALLY** creating a workflow based on a given input (that can be obtained
from the `IngestionPipeline` Entity information) is a potentially valid candidate to be used as a Pipeline Service.
@ -118,8 +134,9 @@ and prepared to contribute a new Pipeline Service Client implementation.
In this example I will be deploying an ingestion workflow to get the metadata from a MySQL database. After clicking on the UI
to deploy such a pipeline, these are the calls that get triggered (a minimal `requests` sketch follows the list):
1. `POST` call to create the `IngestionPipeline` Entity
2. `POST` call to deploy the newly created pipeline.
**1.** `POST` call to create the `IngestionPipeline` Entity
**2.** `POST` call to deploy the newly created pipeline.
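As a minimal sketch of those two calls (the host, token and endpoint paths below are assumptions to verify against your own OpenMetadata deployment), the same flow can be reproduced with `requests`:

```python
import requests

BASE = "http://localhost:8585/api/v1"                    # assumed OpenMetadata host
HEADERS = {"Authorization": "Bearer <jwt-token>"}         # assumed bot / JWT token
ingestion_pipeline: dict = {}                             # fill with a body like the sketch above

# 1. Create the IngestionPipeline Entity.
created = requests.post(
    f"{BASE}/services/ingestionPipelines", json=ingestion_pipeline, headers=HEADERS
).json()

# 2. Ask the IngestionPipelineResource to deploy the newly created pipeline.
requests.post(f"{BASE}/services/ingestionPipelines/deploy/{created['id']}", headers=HEADERS)
```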
## Create the Ingestion Pipeline
@ -324,10 +341,12 @@ the workflow class depends on our goal: Ingestion, Profiling, Testing...
You can follow this logic deeper in the source code of the managed APIs package, but the important thought here is that we
need the following logic flow:
1. An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
2. We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
**1.** An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
**2.** We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
this something is a `.py` file.
3. Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed
**3.** Note that, as Airflow required us to build the whole dynamic creation ourselves, we shifted all the building logic into the managed
APIs package. If an orchestrator already has an API capable of creating DAGs dynamically, this process can instead be
handled directly in the Pipeline Service Client implementation, as all the necessary data is present in the Ingestion Pipeline Entity. A simplified sketch of the file-rendering step from point 2 is shown below.
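To make step 2 concrete, here is a deliberately naive sketch (not the managed APIs implementation, and all names are illustrative) of how an orchestrator-side component could render an Ingestion Pipeline into a `.py` file that Airflow then discovers in its DAGs folder:

```python
from pathlib import Path

# Naive template: the real managed APIs package generates a much richer file.
DAG_TEMPLATE = """\
from airflow import DAG  # the generated file is just a regular Airflow DAG definition
# ... Ingestion Framework imports and Workflow construction would go here ...

WORKFLOW_CONFIG = {workflow_config!r}
DAG_ID = {dag_id!r}
"""

def render_dag_file(ingestion_pipeline: dict, dags_folder: str = "/opt/airflow/dags") -> Path:
    """Write a .py file that the Airflow scheduler can pick up and parse."""
    dag_id = ingestion_pipeline["name"]
    target = Path(dags_folder) / f"{dag_id}.py"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        DAG_TEMPLATE.format(
            workflow_config=ingestion_pipeline.get("sourceConfig", {}), dag_id=dag_id
        )
    )
    return target
```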

View File

@ -40,8 +40,8 @@ AS SELECT ... FROM schema.table_a JOIN another_schema.table_b;
From this query we will extract the following information:
1. There are two `source` tables, represented by the strings `schema.table_a` and `another_schema.table_b`.
2. There is a `target` table `schema.my_view`.
**1.** There are two `source` tables, represented by the strings `schema.table_a` and `another_schema.table_b`.
**2.** There is a `target` table `schema.my_view`.
In this case we suppose that the database connection requires us to write the table names as `<schema>.<table>`. However,
there are other possible options. Sometimes we can find just `<table>` in a query, or even `<database>.<schema>.<table>`.
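Whatever the shape, the lineage logic has to normalize the string before it can search for the corresponding table Entities. A minimal sketch of that normalization (names are illustrative, not the Ingestion Framework code):

```python
def split_table_reference(raw: str) -> dict:
    """Split 'table', 'schema.table' or 'database.schema.table' into named parts."""
    parts = raw.split(".")
    # Right-align the parts: the table name is always the last component.
    database, schema, table = ([None] * (3 - len(parts)) + parts)[-3:]
    return {"database": database, "schema": schema, "table": table}

split_table_reference("schema.table_a")
# -> {'database': None, 'schema': 'schema', 'table': 'table_a'}
```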
@ -135,8 +135,11 @@ the data feeding the Dashboards and Charts.
When ingesting the Dashboards metadata, the workflow will pick up the origin tables (or database, in the case of
PowerBI), and prepare the lineage information.
<Image src="/images/v0.13.2/openmetadata/ingestion/lineage/dashboard-ingestion-lineage.png" alt="Dashboard Lineage"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/lineage/dashboard-ingestion-lineage.png"
alt="Dashboard Lineage"
caption="Dashboard Lineage"
/%}
## Pipeline Services

View File

@ -9,7 +9,8 @@ The OpenMetadata home screen features a change activity feed that enables you vi
- Data for which you are an owner
- Data you are following
<Image
src={"/images/v0.13.2/openmetadata/ingestion/versioning/change-feeds.gif"}
alt="Change feeds"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/versioning/change-feeds.gif"
alt="Change feeds"
/%}

View File

@ -6,7 +6,8 @@ slug: /connectors/ingestion/versioning/event-notification-via-webhooks
# Event Notification via Webhooks
The webhook interface allows you to build applications that receive all the data changes happening in your organization through APIs. Register URLs to receive metadata event notifications. Slack integration through incoming webhooks is one of many applications of this feature.
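On the receiving side, a registered URL only needs to accept the JSON change events that OpenMetadata `POST`s to it. A toy receiver (not an OpenMetadata component; the event field names are assumptions to check against a real payload) could look like:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChangeEventReceiver(BaseHTTPRequestHandler):
    """Toy endpoint that prints the change events pushed by the webhook."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # eventType / entityType are assumed field names; inspect a real event to confirm.
        print("change event:", event.get("eventType"), event.get("entityType"))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Register http://<this-host>:8000/ as the webhook URL in OpenMetadata.
    HTTPServer(("", 8000), ChangeEventReceiver).serve_forever()
```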
<Image
src={"/images/v0.13.2/openmetadata/ingestion/versioning/event-notifications-via-webhooks.gif"}
alt="Event Notification via Webhooks"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/versioning/event-notifications-via-webhooks.gif"
alt="Event Notification via Webhooks"
/%}

View File

@ -15,7 +15,9 @@ Metadata versioning helps **simplify debugging processes**. View the version his
Versioning also helps in **broader collaboration** among consumers and producers of data. Admins can provide access to more users in the organization to change certain fields. Crowdsourcing makes metadata the collective responsibility of the entire organization.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/versioning/metadata-versioning.gif"}
alt="Metadata versioning"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/versioning/metadata-versioning.gif"
alt="Metadata versioning"
caption="Dashboard Lineage"
/%}

View File

@ -53,41 +53,44 @@ Test Cases specify a Test Definition. It will define what condition a test must
### Step 1: Creating a Test Suite
From your table service click on the `profiler` tab. From there you will be able to create table tests by clicking on the purple background `Add Test` top button or column tests by clicking on the white background `Add Test` button.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"}
alt="Write your first test"
caption="Write your first test"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"
alt="Write your first test"
caption="Write your first test"
/%}
On the next page you will be able to either select an existing Test Suite or create a new one. If you select an existing one, your Test Case will automatically be added to the Test Suite.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"}
alt="Create test suite"
caption="Create test suite"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"
alt="Create test suite"
caption="Create test suite"
/%}
### Step 2: Create a Test Case
On the next page, you will create a Test Case. You will need to select a Test Definition from the drop down menu and specify the parameters of your Test Case.
**Note:** Test Case name needs to be unique across the whole platform. A warning message will show if your Test Case name is not unique.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-case-page.png"}
alt="Create test case"
caption="Create test case"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-case-page.png"
alt="Create test case"
caption="Create test case"
/%}
### Step 3: Add Ingestion Workflow
If you have created a new test suite you will see a purple background `Add Ingestion` button after clicking `submit`. This will allow you to schedule the execution of your Test Suite. If you have selected an existing Test Suite you are all set.
After clicking `Add Ingestion` you will be able to select an execution schedule for your Test Suite (note that you can edit this later). Once you have selected the desired scheduling time, click submit and you are all set.
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"
alt="Create ingestion workflow"
caption="Create ingestion workflow"
/%}
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"}
alt="Create ingestion workflow"
caption="Create ingestion workflow"
/>
## Adding Tests with the YAML Config
@ -187,7 +190,7 @@ except ModuleNotFoundError:
from airflow.operators.python_operator import PythonOperator
from metadata.config.common import load_config_file
from metadata.test_suite.api.workflow import TestSuiteWorkflow
from metadata.data_quality.api.workflow import TestSuiteWorkflow
from airflow.utils.dates import days_ago
default_args = {
@ -232,43 +235,54 @@ configurations specified above.
## How to Visualize Test Results
### From the Test Suite View
From the home page click on the Test Suite menu in the left panel.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"}
alt="Test suite home page"
caption="Test suite home page"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"
alt="Test suite home page"
caption="Test suite home page"
/%}
This will bring you to the Test Suite page where you can select a specific Test Suite.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"}
alt="Test suite landing page"
caption="Test suite landing page"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"
alt="Test suite landing page"
caption="Test suite landing page"
/%}
From there you can select a Test Suite and visualize the results associated with this specific Test Suite.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"}
alt="Test suite results page"
caption="Test suite results page"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"
alt="Test suite results page"
caption="Test suite results page"
/%}
### From a Table Entity
Navigate to your table and click on the `profiler` tab. From there you'll be able to see test results at the table or column level.
#### Table Level Test Results
In the top panel, click on the white background `Data Quality` button. This will bring you to a summary of all your quality tests at the table level.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"}
alt="Test suite results table"
caption="Test suite results table"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"
alt="Test suite results table"
caption="Test suite results table"
/%}
#### Column Level Test Results
On the profiler page, click on a specific column name. This will bring you to a new page where you can click the white background `Quality Test` button to see all the test results related to your column.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"}
alt="Test suite results table"
caption="Test suite results table"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"
alt="Test suite results table"
caption="Test suite results table"
/%}
## Adding Custom Tests
While OpenMetadata provides out-of-the-box tests, you may want to write your test results from your own custom quality test suite. This is very easy to do using the API.
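As a heavily hedged sketch of that idea (the endpoint path, the FQN and the result fields below are assumptions to verify against the API reference), pushing a result for an existing Test Case could look like:

```python
import time
import requests

BASE = "http://localhost:8585/api/v1"                     # assumed OpenMetadata host
HEADERS = {"Authorization": "Bearer <jwt-token>"}          # assumed auth token

test_case_fqn = "mysql_prod.shop.public.orders.orders_row_count_check"  # illustrative FQN
result = {
    "timestamp": int(time.time() * 1000),
    "testCaseStatus": "Success",           # assumed enum value
    "result": "1042 rows validated by the external quality suite",
}
# Assumed endpoint: publish one result against the Test Case identified by its FQN.
requests.put(
    f"{BASE}/dataQuality/testCases/{test_case_fqn}/testCaseResult",
    json=result,
    headers=HEADERS,
)
```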

View File

@ -32,7 +32,12 @@ Configure the dbt Workflow from the CLI.
Queries used to create the dbt models can be viewed in the dbt tab
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt-query" caption="dbt Query"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt-query"
caption="dbt Query"
/%}
### 2. dbt Lineage
@ -40,7 +45,12 @@ Lineage from dbt models can be viewed in the Lineage tab.
For more information on how lineage is extracted from dbt take a look [here](/connectors/ingestion/workflows/dbt/ingest-dbt-lineage)
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-lineage.png" alt="dbt-lineage" caption="dbt Lineage"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-lineage.png"
alt="dbt-lineage"
caption="dbt Lineage"
/%}
### 3. dbt Tags
@ -48,7 +58,12 @@ Table and column level tags can be imported from dbt
Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-tags) for adding dbt tags
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt-tags" caption="dbt Tags"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
alt="dbt-tags"
caption="dbt Tags"
/%}
### 4. dbt Owner
@ -56,7 +71,11 @@ Owner from dbt models can be imported and assigned to respective tables
Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-owner) for adding dbt owner
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-owner.png" alt="dbt-owner" caption="dbt Owner"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-owner.png"
alt="dbt-owner"
caption="dbt Owner"
/%}
### 5. dbt Descriptions
@ -64,13 +83,22 @@ Descriptions from dbt models can be imported and assigned to respective tables a
By default descriptions from `manifest.json` will be imported. Descriptions from `catalog.json` will only be updated if the catalog file is passed.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png" alt="dbt-descriptions" caption="dbt Descriptions"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png"
alt="dbt-descriptions"
caption="dbt Descriptions"
/%}
### 6. dbt Tests and Test Results
Tests from dbt will only be imported if the `run_results.json` file is passed.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tests.png" alt="dbt-tests" caption="dbt Tests"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tests.png"
alt="dbt-tests"
caption="dbt Tests"
/%}
## Troubleshooting

View File

@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the dbt tab from the Table Entity Page.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt"
caption="dbt"
/%}
We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.

View File

@ -28,7 +28,11 @@ Openmetadata fetches the lineage information from the `manifest.json` file. Belo
```
For the above case the lineage will be created as shown below:
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-lineage-customers.png" alt="dbt-lineage-customers" caption="dbt Lineage"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-lineage-customers.png"
alt="dbt-lineage-customers"
caption="dbt Lineage"
/%}
### 2. Lineage information from dbt queries
OpenMetadata fetches the dbt query information from the `manifest.json` file.

View File

@ -51,11 +51,23 @@ The user or team which will be set as the entity owner should be first created i
While linking the owner from the `manifest.json` or `catalog.json` files to the entity, OpenMetadata first searches for a matching user. If no user is found, it then searches for a matching team (see the sketch below).
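In other words, the resolution order is simply user first, then team. A sketch of that logic (the `find_user` / `find_team` helpers are hypothetical stand-ins for the real API lookups):

```python
from typing import Optional

def find_user(name: str) -> Optional[dict]:
    """Hypothetical stand-in for the real user lookup against the OpenMetadata API."""
    return None

def find_team(name: str) -> Optional[dict]:
    """Hypothetical stand-in for the real team lookup."""
    return None

def resolve_owner(owner_name: str) -> Optional[dict]:
    """Return an Entity Reference for the dbt owner: try a user first, then a team."""
    user = find_user(owner_name)
    if user:
        return {"id": user["id"], "type": "user"}
    team = find_team(owner_name)
    if team:
        return {"id": team["id"], "type": "team"}
    return None  # the owner is left unset if neither a user nor a team matches
```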
#### The following steps show how to add a User to OpenMetadata:
1. Click on the `Users` section from homepage
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png" alt="click-users-page" caption="Click Users page"/>
**1.** Click on the `Users` section from homepage
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png"
alt="click-users-page"
caption="Click Users page"
/%}
**2.** Click on the `Add User` button
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png"
alt="click-add-user"
caption="Click Add User"
/%}
2. Click on the `Add User` button
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png" alt="click-add-user" caption="Click Add User"/>
3. Enter the details as shown for the user
@ -65,16 +77,34 @@ If the owner's name in `manifest.json` or `catalog.json` file is `openmetadata`,
{% /note %}
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png" alt="add-user-dbt" caption="Add User"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png"
alt="add-user-dbt"
caption="Add User"
/%}
#### The following steps show how to add a Team to OpenMetadata:
1. Click on the `Teams` section from homepage
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png" alt="click-teams-page" caption="Click Teams page"/>
**1.** Click on the `Teams` section from homepage
2. Click on the `Add Team` button
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png" alt="click-add-team" caption="Click Add Team"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png"
alt="click-teams-page"
caption="Click Teams page"
/%}
3. Enter the details as shown for the team
**2.** Click on the `Add Team` button
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png"
alt="click-add-team"
caption="Click Add Team"
/%}
**3.** Enter the details as shown for the team
{% note %}
@ -82,13 +112,22 @@ If the owner's name in `manifest.json` or `catalog.json` file is `openmetadata`,
{% /note %}
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png" alt="add-team-dbt" caption="Add Team"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png"
alt="add-team-dbt"
caption="Add Team"
/%}
## Linking the Owner to the table
After running the ingestion workflow with dbt, you can see the created user or team linked to the table as its owner, as specified in the `manifest.json` or `catalog.json` file.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png" alt="linked-user" caption="Linked User"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png"
alt="linked-user"
caption="Linked User"
/%}
{% note %}

View File

@ -65,4 +65,8 @@ Openmetadata fetches the column-level tags information from the `manifest.json`
### 3. Viewing the tags on tables and columns
Table and Column level tags ingested from dbt can be viewed on the node in OpenMetadata
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt_tags" caption="dbt tags"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
alt="dbt_tags"
caption="dbt tags"
/%}

View File

@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the dbt tab from the Table Entity Page.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt"
caption="dbt"
/%}
We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.
@ -20,7 +25,12 @@ We can create a workflow that will obtain the dbt information from the dbt files
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add dbt Ingestion.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/add-ingestion.png" alt="add-ingestion" caption="Add dbt Ingestion"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/add-ingestion.png"
alt="add-ingestion"
caption="Add dbt Ingestion"
/%}
### 2. Configure the dbt Ingestion
@ -42,7 +52,12 @@ OpenMetadata connects to the AWS s3 bucket via the credentials provided and scan
The name of the s3 bucket and prefix path to the folder in which the dbt files are stored can be provided. If these parameters are not provided, all the buckets are scanned for the files.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/aws-s3.png" alt="aws-s3-bucket" caption="AWS S3 Bucket Config"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/aws-s3.png"
alt="aws-s3-bucket"
caption="AWS S3 Bucket Config"
/%}
#### Google Cloud Storage Buckets
@ -51,13 +66,22 @@ OpenMetadata connects to the GCS bucket via the credentials provided and scans t
The name of the GCS bucket and prefix path to the folder in which the dbt files are stored can be provided. If these parameters are not provided, all the buckets are scanned for the files.
GCS credentials can be stored in two ways:
1. Entering the credentials directly into the form
**1.** Entering the credentials directly into the form
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-form.png" alt="gcs-storage-bucket-form" caption="GCS Bucket config"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-form.png"
alt="gcs-storage-bucket-form"
caption="GCS Bucket config"
/%}
2. Entering the path of the file in which the GCS bucket credentials are stored.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-path.png" alt="gcs-storage-bucket-path" caption="GCS Bucket Path Config"/>
**2.** Entering the path of the file in which the GCS bucket credentials are stored.
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/gcs-bucket-path.png"
alt="gcs-storage-bucket-path"
caption="GCS Bucket Path Config"
/%}
For more information on Google Cloud Storage authentication click [here](https://cloud.google.com/docs/authentication/getting-started#create-service-account-console).
@ -65,13 +89,23 @@ For more information on Google Cloud Storage authentication click [here](https:/
The paths of the `manifest.json`, `catalog.json` and `run_results.json` files stored in the local system, or in the container in which the OpenMetadata server is running, can be provided directly.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/local-storage.png" alt="local-storage" caption="Local Storage Config"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/local-storage.png"
alt="local-storage"
caption="Local Storage Config"
/%}
#### File Server
The file server paths of the `manifest.json`, `catalog.json` and `run_results.json` files stored on a file server can be provided directly.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/file_server.png" alt="file-server" caption="File Server Config"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/file_server.png"
alt="file-server"
caption="File Server Config"
/%}
#### dbt Cloud
@ -79,9 +113,19 @@ Click on the the link [here](https://docs.getdbt.com/guides/getting-started) for
OpenMetadata uses the dbt Cloud APIs to fetch the `run artifacts` (manifest.json, catalog.json and run_results.json) from the most recent dbt run.
The APIs need to be authenticated using an Authentication Token. Follow the link [here](https://docs.getdbt.com/dbt-cloud/api-v2#section/Authentication) to generate an authentication token for your dbt cloud account.
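As a rough illustration (the account and run identifiers are placeholders, and the exact paths should be checked against the dbt Cloud API reference), fetching one of the run artifacts could look like:

```python
import requests

DBT_CLOUD = "https://cloud.getdbt.com/api/v2"            # dbt Cloud API v2 base URL
ACCOUNT_ID, RUN_ID = 12345, 67890                         # placeholder identifiers
HEADERS = {"Authorization": "Token <dbt-cloud-token>"}    # the generated service token

# Download one of the run artifacts produced by the most recent run.
manifest = requests.get(
    f"{DBT_CLOUD}/accounts/{ACCOUNT_ID}/runs/{RUN_ID}/artifacts/manifest.json",
    headers=HEADERS,
).json()
print(manifest.get("metadata", {}).get("dbt_version"))
```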
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-cloud.png" alt="dbt-cloud" caption="dbt Cloud config"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/dbt-cloud.png"
alt="dbt-cloud"
caption="dbt Cloud config"
/%}
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/schedule-and-deploy.png" alt="schedule-and-deploy" caption="Schedule dbt ingestion pipeline"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/dbt/schedule-and-deploy.png"
alt="schedule-and-deploy"
caption="Schedule dbt ingestion pipeline"
/%}

View File

@ -38,7 +38,11 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the Lineage tab from the Table Entity Page.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/table-entity-page.png"
alt="table-entity-page"
caption="Table Entity Page"
/%}
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data.
@ -46,15 +50,25 @@ We can create a workflow that will obtain the query log and table creation infor
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/add-ingestion.png"
alt="add-ingestion"
caption="Add Ingestion"
/%}
### 2. Configure the Lineage Ingestion
Here you can enter the Lineage Ingestion details:
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png" alt="configure-lineage-ingestion" caption="Configure the Lineage Ingestion"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png"
alt="configure-lineage-ingestion"
caption="Configure the Lineage Ingestion"
/%}
<Collapse title="Lineage Options">
### Lineage Options
**Query Log Duration**
@ -63,10 +77,14 @@ Specify the duration in days for which the profiler should capture lineage data
**Result Limit**
Set the limit for the query log results to be run at a time.
</Collapse>
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png"
alt="schedule-and-deploy"
caption="View Service Ingestion pipelines"
/%}

View File

@ -12,36 +12,38 @@ After the metadata ingestion has been done correctly, we can configure and deplo
This Pipeline will be in charge of feeding the Profiler tab of the Table Entity, as well as running any tests configured in the Entity.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-table.png"}
alt="Table profile summary page"
caption="Table profile summary page"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-table.png"
alt="Table profile summary page"
caption="Table profile summary page"
/%}
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-colomn.png"}
alt="Column profile summary page"
caption="Column profile summary page"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-summary-colomn.png"
alt="Column profile summary page"
caption="Column profile summary page"
/%}
### 1. Add a Profiler Ingestion
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Profiler Ingestion.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/add-profiler-workflow.png"}
alt="Add a profiler service"
caption="Add a profiler service"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/add-profiler-workflow.png"
alt="Add a profiler service"
caption="Add a profiler service"
/%}
### 2. Configure the Profiler Ingestion
Here you can enter the Profiler Ingestion details.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/configure-profiler-workflow.png"}
alt="Set profiler configuration"
caption="Set profiler configuration"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/configure-profiler-workflow.png"
alt="Set profiler configuration"
caption="Set profiler configuration"
/%}
#### Profiler Options
**Name**
@ -78,17 +80,20 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
### 4. Updating Profiler setting at the table level
Once you have created your profiler you can adjust some behavior at the table level by going to the table and clicking on the profiler tab
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/accessing-table-profile-settings.png"}
alt="table profile settings"
caption="table profile settings"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/accessing-table-profile-settings.png"
alt="table profile settings"
caption="table profile settings"
/%}
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/table-profile-summary-view.png"
alt="table profile settings"
caption="table profile settings"
/%}
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/table-profile-summary-view.png"}
alt="table profile settings"
caption="table profile settings"
/>
#### Profiler Options
**Profile Sample**

View File

@ -37,11 +37,13 @@ Returns the number of columns in the Table.
## System Metrics
System metrics provide information related to DML operations performed on the table. These metrics present a concise view of your data freshness. In a typical data processing flow, tables are updated at a certain frequency. Table freshness will be monitored by confirming that a set of operations has been performed against the table. To increase trust in your data assets, OpenMetadata will monitor the `INSERT`, `UPDATE` and `DELETE` operations performed against your table to showcase 2 metrics related to freshness (see below for more details). With this information, you are able to see when a specific operation was last performed and how many rows it affected.
<Image
src={"/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-freshness-metrics.png"}
alt="table profile freshness metrics"
caption="table profile freshness metrics"
/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/profiler/profiler-freshness-metrics.png"
alt="table profile freshness metrics"
caption="table profile freshness metrics"
/%}
These metrics are available for **BigQuery**, **Redshift** and **Snowflake**. Other database engines are currently not supported, so the computation of the system metrics will be skipped.

View File

@ -38,7 +38,11 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the Queries tab from the Table Entity Page.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/table-entity-page.png"
alt="table-entity-page"
caption="Table Entity Page"
/%}
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Usage Ingestion will be in charge of obtaining this data.
@ -46,15 +50,25 @@ We can create a workflow that will obtain the query log and table creation infor
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Usage Ingestion.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/add-ingestion.png"
alt="add-ingestion"
caption="Add Ingestion"
/%}
### 2. Configure the Usage Ingestion
Here you can enter the Usage Ingestion details:
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/configure-usage-ingestion.png" alt="configure-usage-ingestion" caption="Configure the Usage Ingestion"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/configure-usage-ingestion.png"
alt="configure-usage-ingestion"
caption="Configure the Usage Ingestion"
/%}
<Collapse title="Usage Options">
### Usage Options
**Query Log Duration**
@ -67,10 +81,14 @@ Mention the absolute file path of the temporary file name to store the query log
**Result Limit**
Set the limit for the query log results to be run at a time.
</Collapse>
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the usage pipeline being added to the Service Ingestions.
<Image src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png"
alt="schedule-and-deploy"
caption="View Service Ingestion pipelines"
/%}

View File

@ -12,7 +12,7 @@ One can configure the metadata ingestion filter for database source using four c
`Schema Filter Pattern`, `Table Filter Pattern` & `Use FQN For Filtering`. In this document we will learn about each field in detail
along with many examples.
<Collapse title="Configuring Filters via UI">
### Configuring Filters via UI
Filters can be configured in UI while adding an ingestion pipeline through `Add Metadata Ingestion` page.
@ -22,10 +22,9 @@ Filters can be configured in UI while adding an ingestion pipeline through `Add
caption="Database Filter Pattern Fields"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI">
### Configuring Filters via CLI
Filters can be configured in CLI in connection configuration within `source.sourceConfig.config` field as described below.
@ -57,7 +56,6 @@ sourceConfig:
- table4
```
</Collapse>
### Use FQN For Filtering
@ -93,7 +91,7 @@ In this example we want to ingest all databases which contains `SNOWFLAKE` in na
applied would be `.*SNOWFLAKE.*` in the include field. This will result in the ingestion of the databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`
and `TEST_SNOWFLAKEDB`.
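A quick way to sanity-check such a pattern before running the ingestion (plain Python, not OpenMetadata code):

```python
import re

pattern = re.compile(r".*SNOWFLAKE.*")
databases = ["SNOWFLAKE", "SNOWFLAKE_SAMPLE_DATA", "TEST_SNOWFLAKEDB", "DUMMY_DB"]
print([name for name in databases if pattern.fullmatch(name)])
# -> ['SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA', 'TEST_SNOWFLAKEDB']
```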
<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-1.png"
@ -101,9 +99,8 @@ and `TEST_SNOWFLAKEDB`.
caption="Database Filter Pattern Example 1"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1
```yaml
sourceConfig:
@ -116,14 +113,13 @@ sourceConfig:
```
</Collapse>
#### Example 2
In this example we want to ingest all databases whose names start with `SNOWFLAKE`, so the filter pattern
applied would be `^SNOWFLAKE.*` in the include field. This will result in the ingestion of the databases `SNOWFLAKE` & `SNOWFLAKE_SAMPLE_DATA`.
<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-2.png"
@ -131,9 +127,8 @@ appied would be `^SNOWFLAKE.*` in the include field. This will result in ingesti
caption="Database Filter Pattern Example 2"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2
```yaml
sourceConfig:
@ -145,7 +140,6 @@ sourceConfig:
- ^SNOWFLAKE.*
```
</Collapse>
#### Example 3
@ -153,7 +147,7 @@ sourceConfig:
In this example we want to ingest all databases whose names start with `SNOWFLAKE` OR end with `DB`, so the filter patterns
applied would be `^SNOWFLAKE` & `DB$` in the include field. This will result in the ingestion of the databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`, `TEST_SNOWFLAKEDB` & `DUMMY_DB`.
<Collapse title="Configuring Filters via UI for Example 3">
### Configuring Filters via UI for Example 3
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-3.png"
@ -161,9 +155,9 @@ appied would be `^SNOWFLAKE` & `DB$` in the include field. This will result in i
caption="Database Filter Pattern Example 3"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 3">
### Configuring Filters via CLI for Example 3
```yaml
sourceConfig:
@ -176,13 +170,12 @@ sourceConfig:
- .*DB$
```
</Collapse>
#### Example 4
In this example we want to ingest only the `SNOWFLAKE` database, so the filter pattern applied would be `^SNOWFLAKE$`.
<Collapse title="Configuring Filters via UI for Example 4">
### Configuring Filters via UI for Example 4
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/database-filter-example-4.png"
@ -190,9 +183,9 @@ In this example we want to ingest only the `SNOWFLAKE` database then the fillter
caption="Database Filter Pattern Example 4"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 4">
### Configuring Filters via CLI for Example 4
```yaml
sourceConfig:
@ -203,8 +196,6 @@ sourceConfig:
includes:
- ^SNOWFLAKE$
```
</Collapse>
### Schema Filter Pattern
@ -242,7 +233,7 @@ In this example we want to ingest all schema winthin any database with name `PUB
applied would be `^PUBLIC$` in the include field. This will result in the ingestion of the schemas `SNOWFLAKE.PUBLIC` & `SNOWFLAKE_SAMPLE_DATA.PUBLIC`.
<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-1.png"
@ -250,9 +241,8 @@ appied would be `^PUBLIC$` in the include field. This will result in ingestion o
caption="Schema Filter Pattern Example 1"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1
```yaml
sourceConfig:
@ -263,7 +253,6 @@ sourceConfig:
includes:
- ^PUBLIC$
```
</Collapse>
#### Example 2
@ -274,7 +263,7 @@ Notice that we have two schemas availabale with name `PUBLIC` one is available i
<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-2.png"
@ -282,9 +271,8 @@ Notice that we have two schemas availabale with name `PUBLIC` one is available i
caption="Schema Filter Pattern Example 2"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2
```yaml
sourceConfig:
@ -295,7 +283,6 @@ sourceConfig:
excludes:
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$
```
</Collapse>
#### Example 3
@ -303,7 +290,7 @@ sourceConfig:
In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWFLAKE_SAMPLE_DATA` that start with `TPCH_`, i.e. `SNOWFLAKE_SAMPLE_DATA.TPCH_1`, `SNOWFLAKE_SAMPLE_DATA.TPCH_10` & `SNOWFLAKE_SAMPLE_DATA.TPCH_100`. To achieve this, an include schema filter will be applied with the patterns `^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$` & `^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*`, and we need to set `useFqnForFiltering` to true since we want to apply the filter on the FQN.
<Collapse title="Configuring Filters via UI for Example 3">
### Configuring Filters via UI for Example 3
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/schema-filter-example-3.png"
@ -311,9 +298,8 @@ In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWF
caption="Schema Filter Pattern Example 3"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 3">
### Configuring Filters via CLI for Example 3
```yaml
sourceConfig:
@ -325,7 +311,6 @@ sourceConfig:
- ^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*
```
</Collapse>
### Table Filter Pattern
@ -371,7 +356,7 @@ Snowflake_Prod # Snowflake Service Name
In this example we want to ingest the table named `CUSTOMER` within any schema and database. In this case we need to apply the include table filter pattern `^CUSTOMER$`. This will result in the ingestion of the tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER`, `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`.
<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/table-filter-example-1.png"
@ -379,9 +364,9 @@ In this example we want to ingest table with name `CUSTOMER` within any schema a
caption="Table Filter Pattern Example 1"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1
```yaml
sourceConfig:
@ -392,14 +377,12 @@ sourceConfig:
includes:
- ^CUSTOMER$
```
</Collapse>
#### Example 2
In this example we want to ingest the table named `CUSTOMER` within the `PUBLIC` schema of any database. In this case we need to apply the include table filter pattern `.*\.PUBLIC\.CUSTOMER$`; this also requires setting the `useFqnForFiltering` flag to true, since we want to apply the filter on the FQN. This will result in the ingestion of the tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`.
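The same kind of plain-Python sanity check works for FQN-based patterns (again, this is not the ingestion code itself):

```python
import re

pattern = re.compile(r".*\.PUBLIC\.CUSTOMER$")
fqns = [
    "Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER",
    "Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER",
    "Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER",
]
print([fqn for fqn in fqns if pattern.match(fqn)])
# -> only the two FQNs under a PUBLIC schema are kept
```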
<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2
{% image
src="/images/v0.13.2/openmetadata/ingestion/workflows/metadata/filter-patterns/table-filter-example-2.png"
@ -407,9 +390,8 @@ In this example we want to ingest table with name `CUSTOMER` within `PUBLIC` sch
caption="Table Filter Pattern Example 2"
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2
```yaml
sourceConfig:
@ -419,5 +401,4 @@ sourceConfig:
tableFilterPattern:
includes:
- .*\.PUBLIC\.CUSTOMER$
```
</Collapse>
```

View File

@ -11,7 +11,7 @@ As of now, OpenMetadata uses Airflow under the hood as a scheduler for the Inges
This is the right place if you are curious about our current approach or if you are looking forward to contributing by
adding the implementation to deploy workflows to another tool directly from the UI.
<Note>
{% note %}
Here we are talking about an internal implementation detail. Do not be confused about the information that is going to
be shared here vs. the pipeline services supported as connectors for metadata extraction.
@ -19,7 +19,7 @@ be shared here vs. the pipeline services supported as connectors for metadata ex
For example, we use Airflow as an internal element to deploy and schedule ingestion workflows, but we can also extract
metadata from Airflow. Fivetran, for example, is a possible source, but we are not using it to deploy and schedule workflows.
</Note>
{% /note %}
## Before Reading
@ -32,7 +32,11 @@ Everything in OpenMetadata is centralized and managed via the API. Then, the Wor
via the OpenMetadata server APIs. Moreover, the `IngestionPipeline` Entity is also defined in a JSON Schema that you
can find [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json).
<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png" alt="system context"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-system-context.drawio.png"
alt="system context"
/%}
Note how OpenMetadata here acts as a middleware, connecting the actions being triggered in the UI to external orchestration
systems, which will be the ones managing the heavy lifting of getting a workflow created, scheduled and run. Out of the box,
@ -68,29 +72,40 @@ After creating a new workflow from the UI or when editing it, there are two call
- `POST` or `PUT` call to update the `Ingestion Pipeline Entity`,
- `/deploy` HTTP call to the `IngestionPipelineResource` to trigger the deployment of the new or updated DAG in the Orchestrator.
<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png" alt="software system"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-software-system.drawio.png"
alt="software system"
/%}
### Creating the Ingestion Pipeline
Based on its [JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/ingestionPipelines/ingestionPipeline.json),
there are a few properties about the Ingestion Pipeline we can highlight:
1. `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
2. `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
logic to properly create the required Python classes (e.g., `Workflow`, `ProfilerWorkflow`, `TestSuiteWorkflow`, etc.).
3. `sourceConfig`: which depends on the pipeline type and defines how the pipeline should behave (e.g., marking the ingestion of views as `False`).
4. `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
5. `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule.
**1.** `service`: a pipeline is linked via an Entity Reference to a Service Entity or a Test Suite Entity. From the service is
<Note>
**2.** `pipelineType`: which represents the type of workflow to be created. This flag will be used down the line in the Orchestrator
logic to properly create the required Python classes (e.g., `Workflow`, `ProfilerWorkflow`, `TestSuiteWorkflow`, etc.).
**3.** `sourceConfig`: which depends on the pipeline type and defines how the pipeline should behave (e.g., marking the ingestion of views as `False`).
**4.** `openMetadataServerConnection`: defining how to reach the OM server with properties such as host, auth configuration, etc.
**5.** `airflowConfig`: with Airflow specific configurations about the DAG such as the schedule.
{% note %}
While we have yet to update the `airflowConfig` property to be more generic, the only field actually being used is the
schedule. You might see this property here, but the whole process can still support other Orchestrators. We will clean
this up in future releases.
</Note>
{% /note %}
<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png" alt="container create"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-container-IngestionPipeline.drawio.png"
alt="container create"
/%}
Here, the process of creating an Ingestion Pipeline is then the same as with any other Entity.
@ -104,7 +119,11 @@ The role of OpenMetadata here is just to pass the required communication to the
DAG. Basically we need a way to send a call to the Orchestrator that generates a DAG / Workflow object that will be run
using the proper functions and classes from the Ingestion Framework.
<Image src="/images/v1.0.0/openmetadata/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png" alt="deploy"/>
{% image
src="/images/v1.0.0/features/ingestion/ingestion-pipeline/ingestion-pipeline-pipeline-service-container.drawio.png"
alt="deploy"
/%}
Any Orchestration system that is capable of **DYNAMICALLY** creating a workflow based on a given input (that can be obtained
from the `IngestionPipeline` Entity information) is a potentially valid candidate to be used as a Pipeline Service.
@ -118,8 +137,9 @@ and prepared to contribute a new Pipeline Service Client implementation.
In this example I will be deploying an ingestion workflow to get the metadata from a MySQL database. After clicking on the UI
to deploy such a pipeline, these are the calls that get triggered:
1. `POST` call to create the `IngestionPipeline` Entity
2. `POST` call to deploy the newly created pipeline.
**1.** `POST` call to create the `IngestionPipeline` Entity
**2.** `POST` call to deploy the newly created pipeline.
## Create the Ingestion Pipeline
@ -324,10 +344,12 @@ the workflow class depends on our goal: Ingestion, Profiling, Testing...
You can follow this logic deeper in the source code of the managed APIs package, but the important thought here is that we
need the following logic flow:
1. An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
2. We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
**1.** An Ingestion Pipeline is created and sent to the Ingestion Pipeline Resource.
**2.** We need to transform this Ingestion Pipeline into something capable of running the Python `Workflow`. For Airflow,
this something is a `.py` file.
3. Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed
**3.** Note that as Airflow required us to build the whole dynamic creation, we shifted all the building logic towards the managed
APIs package, but if any orchestrator already has an API capable of creating DAGs dynamically, this process can be directly
handled in the Pipeline Service Client implementation as all the necessary data is present in the Ingestion Pipeline Entity.

View File

@ -40,8 +40,8 @@ AS SELECT ... FROM schema.table_a JOIN another_schema.table_b;
From this query we will extract the following information:
1. There are two `source` tables, represented by the strings `schema.table_a` and `another_schema.table_b`.
2. There is a `target` table `schema.my_view`.
**1.** There are two `source` tables, represented by the strings `schema.table_a` and `another_schema.table_b`.
**2.** There is a `target` table `schema.my_view`.
In this case we suppose that the database connection requires us to write the table names as `<schema>.<table>`. However,
there are other possible options. Sometimes we can find just `<table>` in a query, or even `<database>.<schema>.<table>`.
@ -67,13 +67,13 @@ Note that if a Model is not materialized, its data won't be ingested.
### Query Log
<Note>
{% note %}
Up until 0.11, Query Log analysis for lineage happens during the Usage Workflow.
From 0.12 onwards, there is a separate Lineage Workflow that will take care of this process.
</Note>
{% /note %}
#### How to run?
@ -98,7 +98,7 @@ That being said, this process is the same as the one shown in the View Lineage a
parse, we will obtain the `source` and `target` information, use ElasticSearch to identify the Entities in OpenMetadata
and then send the lineage to the API.
<Note>
{% note %}
When running any query from within OpenMetadata we add an information comment to the query text
@ -109,7 +109,7 @@ When running any query from within OpenMetadata we add an information comment to
Note that queries with this text as well as the ones containing headers from dbt (which follow a similar structure),
will be filtered out when building the query log internally.
</Note>
{% /note %}
#### Troubleshooting
@ -135,8 +135,11 @@ the data feeding the Dashboards and Charts.
When ingesting the Dashboards metadata, the workflow will pick up the origin tables (or database, in the case of
PowerBI), and prepare the lineage information.
<Image src="/images/v1.0.0/openmetadata/ingestion/lineage/dashboard-ingestion-lineage.png" alt="Dashboard Lineage"/>
{% image
src="/images/v1.0.0/features/ingestion/lineage/dashboard-ingestion-lineage.png"
alt="Dashboard Lineage"
caption="Dashboard Lineage"
/%}
## Pipeline Services

View File

@ -9,7 +9,9 @@ The OpenMetadata home screen features a change activity feed that enables you vi
- Data for which you are an owner
- Data you are following
<Image
src={"/images/v1.0.0/openmetadata/ingestion/versioning/change-feeds.gif"}
alt="Change feeds"
/>
{% image
src="/images/v1.0.0/features/ingestion/versioning/change-feeds.gif"
alt="Change feeds"
/%}

View File

@ -6,7 +6,8 @@ slug: /connectors/ingestion/versioning/event-notification-via-webhooks
# Event Notification via Webhooks
The webhook interface allows you to build applications that receive all the data changes happening in your organization through APIs. Register URLs to receive metadata event notifications. Slack integration through incoming webhooks is one of many applications of this feature.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/versioning/event-notifications-via-webhooks.gif"}
alt="Event Notification via Webhooks"
/>
{% image
src="/images/v1.0.0/features/ingestion/versioning/event-notifications-via-webhooks.gif"
alt="Event Notification via Webhooks"
/%}

View File

@ -15,7 +15,8 @@ Metadata versioning helps **simplify debugging processes**. View the version his
Versioning also helps in **broader collaboration** among consumers and producers of data. Admins can provide access to more users in the organization to change certain fields. Crowdsourcing makes metadata the collective responsibility of the entire organization.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/versioning/metadata-versioning.gif"}
alt="Metadata versioning"
/>
{% image
src="/images/v1.0.0/features/ingestion/versioning/metadata-versioning.gif"
alt="Metadata versioning"
/%}

View File

@ -53,41 +53,46 @@ Test Cases specify a Test Definition. It will define what condition a test must
### Step 1: Creating a Test Suite
From your table service click on the `profiler` tab. From there you will be able to create table tests by clicking on the purple background `Add Test` top button or column tests by clicking on the white background `Add Test` button.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/profiler-tab-view.png"}
alt="Write your first test"
caption="Write your first test"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/profiler-tab-view.png"
alt="Write your first test"
caption="Write your first test"
/%}
On the next page you will be able to either select an existing Test Suite or create a new one. If you select an existing one, your Test Case will automatically be added to the Test Suite.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-page.png"}
alt="Create test suite"
caption="Create test suite"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-page.png"
alt="Create test suite"
caption="Create test suite"
/%}
### Step 2: Create a Test Case
On the next page, you will create a Test Case. You will need to select a Test Definition from the drop down menu and specify the parameters of your Test Case.
**Note:** Test Case name needs to be unique across the whole platform. A warning message will show if your Test Case name is not unique.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-case-page.png"}
alt="Create test case"
caption="Create test case"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-case-page.png"
alt="Create test case"
caption="Create test case"
/%}
### Step 3: Add Ingestion Workflow
If you have created a new test suite you will see a purple background `Add Ingestion` button after clicking `submit`. This will allow you to schedule the execution of your Test Suite. If you have selected an existing Test Suite you are all set.
After clicking `Add Ingestion` you will be able to select an execution schedule for your Test Suite (note that you can edit this later). Once you have selected the desired scheduling time, click submit and you are all set.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/ingestion-page.png"}
alt="Create ingestion workflow"
caption="Create ingestion workflow"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/ingestion-page.png"
alt="Create ingestion workflow"
caption="Create ingestion workflow"
/%}
## Adding Tests with the YAML Config
@ -187,7 +192,7 @@ except ModuleNotFoundError:
from airflow.operators.python_operator import PythonOperator
from metadata.config.common import load_config_file
from metadata.data_quality.api.workflow import TestSuiteWorkflow
from metadata.test_suite.api.workflow import TestSuiteWorkflow
from airflow.utils.dates import days_ago
default_args = {
@ -232,43 +237,53 @@ configurations specified above.
## How to Visualize Test Results
### From the Test Suite View
From the home page click on the Test Suite menu in the left panel.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-home-page.png"}
alt="Test suite home page"
caption="Test suite home page"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-home-page.png"
alt="Test suite home page"
caption="Test suite home page"
/%}
This will bring you to the Test Suite page where you can select a specific Test Suite.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-landing.png"}
alt="Test suite landing page"
caption="Test suite landing page"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-landing.png"
alt="Test suite landing page"
caption="Test suite landing page"
/%}
From there you can select a Test Suite and visualize the results associated with this specific Test Suite.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/test-suite-results.png"}
alt="Test suite results page"
caption="Test suite results page"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/test-suite-results.png"
alt="Test suite results page"
caption="Test suite results page"
/%}
### From a Table Entity
Navigate to your table and click on the `profiler` tab. From there you'll be able to see test results at the table or column level.
#### Table Level Test Results
In the top panel, click on the white background `Data Quality` button. This will bring you to a summary of all your quality tests at the table level.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/table-results-entity.png"}
alt="Test suite results table"
caption="Test suite results table"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/table-results-entity.png"
alt="Test suite results table"
caption="Test suite results table"
/%}
#### Column Level Test Results
On the profiler page, click on a specific column name. This will bring you to a new page where you can click the white background `Quality Test` button to see all the test results related to your column.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/data-quality/colum-level-test-results.png"}
alt="Test suite results table"
caption="Test suite results table"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/data-quality/colum-level-test-results.png"
alt="Test suite results table"
caption="Test suite results table"
/%}
## Adding Custom Tests
While OpenMetadata provides out-of-the-box tests, you may want to write test results from your own custom quality test suite. This is very easy to do using the API.
@ -414,3 +429,4 @@ curl --location --request PUT 'http://localhost:8585/api/v1/testCase/local_redsh
You will now be able to see your test in the Test Suite or the table entity.

View File

@ -18,13 +18,13 @@ The dbt workflow requires the below keys to be present in the node of a manifest
- depends_on (required if lineage information needs to be extracted)
- columns (required if column description is to be processed)
<Note>
{% note %}
The `name/alias, schema and database` values from the dbt manifest.json should match the values of the `name, schema and database` of the table/view ingested in OpenMetadata.
dbt data will only be processed if these values match.
</Note>
{% /note %}
Below is a sample manifest.json node for reference:
```json

View File

@ -32,7 +32,12 @@ Configure the dbt Workflow from the CLI.
Queries used to create the dbt models can be viewed in the dbt tab
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt-query" caption="dbt Query"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt-query"
caption="dbt Query"
/%}
### 2. dbt Lineage
@ -40,7 +45,12 @@ Lineage from dbt models can be viewed in the Lineage tab.
For more information on how lineage is extracted from dbt take a look [here](/connectors/ingestion/workflows/dbt/ingest-dbt-lineage)
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-lineage.png" alt="dbt-lineage" caption="dbt Lineage"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-lineage.png"
alt="dbt-lineage"
caption="dbt Lineage"
/%}
### 3. dbt Tags
@ -48,7 +58,13 @@ Table and column level tags can be imported from dbt
Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-tags) for adding dbt tags
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt-tags" caption="dbt Tags"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
alt="dbt-tags"
caption="dbt Tags"
/%}
### 4. dbt Owner
@ -56,7 +72,12 @@ Owner from dbt models can be imported and assigned to respective tables
Please refer [here](/connectors/ingestion/workflows/dbt/ingest-dbt-owner) for adding dbt owner
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-owner.png" alt="dbt-owner" caption="dbt Owner"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-owner.png"
alt="dbt-owner"
caption="dbt Owner"
/%}
### 5. dbt Descriptions
@ -64,13 +85,22 @@ Descriptions from dbt models can be imported and assigned to respective tables a
By default, descriptions from `manifest.json` will be imported. Descriptions from `catalog.json` will only be updated if a catalog file is passed.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png" alt="dbt-descriptions" caption="dbt Descriptions"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-descriptions.png"
alt="dbt-descriptions"
caption="dbt Descriptions"
/%}
### 6. dbt Tests and Test Results
Tests from dbt will only be imported if the `run_results.json` file is passed.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tests.png" alt="dbt-tests" caption="dbt Tests"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-tests.png"
alt="dbt-tests"
caption="dbt Tests"
/%}
## Troubleshooting

View File

@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the dbt tab from the Table Entity Page.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt"
caption="dbt"
/%}
We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.

View File

@ -28,7 +28,13 @@ Openmetadata fetches the lineage information from the `manifest.json` file. Belo
```
For the above case, the lineage will be created as shown below:
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-lineage-customers.png" alt="dbt-lineage-customers" caption="dbt Lineage"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-lineage-customers.png"
alt="dbt-lineage-customers"
caption="dbt Lineage"
/%}
### 2. Lineage information from dbt queries
OpenMetadata fetches the dbt query information from the `manifest.json` file.

View File

@ -51,47 +51,86 @@ The user or team which will be set as the entity owner should be first created i
While linking the owner from the `manifest.json` or `catalog.json` files to the entity, OpenMetadata first searches for a matching user. If no user is found, it searches for a matching team.
#### The following steps show how to add a User to OpenMetadata:
1. Click on the `Users` section from homepage
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png" alt="click-users-page" caption="Click Users page"/>
**1.** Click on the `Users` section from homepage
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-users-page.png"
alt="click-users-page"
caption="Click Users page"
/%}
**2.** Click on the `Add User` button
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png"
alt="click-add-user"
caption="Click Add User"
/%}
2. Click on the `Add User` button
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-user.png" alt="click-add-user" caption="Click Add User"/>
3. Enter the details as shown for the user
<Note>
{% note %}
If the owner's name in the `manifest.json` or `catalog.json` file is `openmetadata`, you need to enter `openmetadata@youremail.com` in the email ID section of the add user form as shown below.
</Note>
{% /note %}
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png"
alt="add-user-dbt"
caption="Add User"
/%}
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-user-dbt.png" alt="add-user-dbt" caption="Add User"/>
#### The following steps show how to add a Team to OpenMetadata:
1. Click on the `Teams` section from homepage
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png" alt="click-teams-page" caption="Click Teams page"/>
**1.** Click on the `Teams` section from homepage
2. Click on the `Add Team` button
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png" alt="click-add-team" caption="Click Add Team"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-teams-page.png"
alt="click-teams-page"
caption="Click Teams page"
/%}
3. Enter the details as shown for the team
**2.** Click on the `Add Team` button
<Note>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/click-add-team.png"
alt="click-add-team"
caption="Click Add Team"
/%}
**3.** Enter the details as shown for the team
{% note %}
If the owner's name in the `manifest.json` or `catalog.json` file is `openmetadata`, you need to enter `openmetadata` in the name section of the add team form as shown below.
</Note>
{% /note %}
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png"
alt="add-team-dbt"
caption="Add Team"
/%}
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/add-team-dbt.png" alt="add-team-dbt" caption="Add Team"/>
## Linking the Owner to the table
After running the ingestion workflow with dbt, you can see the created user or team linked to the table as its owner, as specified in the `manifest.json` or `catalog.json` file.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png" alt="linked-user" caption="Linked User"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/ingest_dbt_owner/linked-user.png"
alt="linked-user"
caption="Linked User"
/%}
<Note>
{% note %}
If a table already has an owner linked to it, the owner from dbt will not update the current owner.
</Note>
{% /note %}

View File

@ -11,11 +11,11 @@ Follow the link [here](https://docs.getdbt.com/reference/resource-configs/tags)
## Requirements
<Note>
{% note %}
For dbt tags, if the tag is not already present, it will be created under the tag category `DBTTags` in OpenMetadata.
</Note>
{% /note %}
### 1. Table-Level Tags information in manifest.json file
OpenMetadata fetches the table-level tags information from the `manifest.json` file. Below is a sample `manifest.json` file node containing tags information under `node_name->tags`.
@ -65,4 +65,8 @@ Openmetadata fetches the column-level tags information from the `manifest.json`
### 3. Viewing the tags on tables and columns
Table and Column level tags ingested from dbt can be viewed on the node in OpenMetadata
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-tags.png" alt="dbt_tags" caption="dbt tags"/>
{% image
src="/images/v1.0.0//features/ingestion/workflows/dbt/dbt-features/dbt-tags.png"
alt="dbt_tags"
caption="dbt tags"
/%}

View File

@ -12,7 +12,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the dbt tab from the Table Entity Page.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-features/dbt-query.png" alt="dbt" caption="dbt"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-features/dbt-query.png"
alt="dbt"
caption="dbt"
/%}
We can create a workflow that will obtain the dbt information from the dbt files and feed it to OpenMetadata. The dbt Ingestion will be in charge of obtaining this data.
@ -20,7 +25,12 @@ We can create a workflow that will obtain the dbt information from the dbt files
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add dbt Ingestion.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/add-ingestion.png" alt="add-ingestion" caption="Add dbt Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/add-ingestion.png"
alt="add-ingestion"
caption="Add dbt Ingestion"
/%}
### 2. Configure the dbt Ingestion
@ -29,11 +39,11 @@ Here you can enter the dbt Ingestion details:
dbt sources for the manifest.json, catalog.json and run_results.json files can be configured as shown in the UI below. The dbt files need to be stored in one of these sources.
<Note>
{% note %}
Only the `manifest.json` file is compulsory for dbt ingestion.
</Note>
{% /note %}
#### AWS S3 Buckets
@ -42,7 +52,12 @@ OpenMetadata connects to the AWS s3 bucket via the credentials provided and scan
The name of the S3 bucket and the prefix path to the folder in which the dbt files are stored can be provided. If these parameters are not provided, all the buckets are scanned for the files.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/aws-s3.png" alt="aws-s3-bucket" caption="AWS S3 Bucket Config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/aws-s3.png"
alt="aws-s3-bucket"
caption="AWS S3 Bucket Config"
/%}
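
If you prefer to configure the workflow from a YAML file instead of the UI, an S3 source could look roughly like the sketch below. The service name, credentials and bucket values are placeholders, and the nested field names (`dbtConfigSource`, `dbtSecurityConfig`, `dbtPrefixConfig`, `dbtBucketName`, `dbtObjectPrefix`) are assumptions to verify against the dbt pipeline JSON Schema of your OpenMetadata version.

```yaml
source:
  type: dbt
  serviceName: my_snowflake_service      # placeholder: the database service already ingested
  sourceConfig:
    config:
      type: DBT
      # Assumed field names; check the dbt source config schema for your version
      dbtConfigSource:
        dbtSecurityConfig:
          awsAccessKeyId: <aws-access-key-id>
          awsSecretAccessKey: <aws-secret-access-key>
          awsRegion: us-east-1
        dbtPrefixConfig:
          dbtBucketName: my-dbt-artifacts        # optional: if omitted, all buckets are scanned
          dbtObjectPrefix: dbt/run_artifacts     # optional prefix path inside the bucket
```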
#### Google Cloud Storage Buckets
@ -51,13 +66,23 @@ OpenMetadata connects to the GCS bucket via the credentials provided and scans t
The name of the GCS bucket and the prefix path to the folder in which the dbt files are stored can be provided. If these parameters are not provided, all the buckets are scanned for the files.
GCS credentials can be stored in two ways:
1. Entering the credentials directly into the form
**1.** Entering the credentials directly into the form
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/gcs-bucket-form.png" alt="gcs-storage-bucket-form" caption="GCS Bucket config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/gcs-bucket-form.png"
alt="gcs-storage-bucket-form"
caption="GCS Bucket config"
/%}
2. Entering the path of file in which the GCS bucket credentials are stored.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/gcs-bucket-path.png" alt="gcs-storage-bucket-path" caption="GCS Bucket Path Config"/>
**2.** Entering the path of the file in which the GCS bucket credentials are stored.
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/gcs-bucket-path.png"
alt="gcs-storage-bucket-path"
caption="GCS Bucket Path Config"
/%}
For more information on Google Cloud Storage authentication click [here](https://cloud.google.com/docs/authentication/getting-started#create-service-account-console).
@ -65,13 +90,22 @@ For more information on Google Cloud Storage authentication click [here](https:/
The path of the `manifest.json`, `catalog.json` and `run_results.json` files stored on the local system, or in the container in which the OpenMetadata server is running, can be provided directly.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/local-storage.png" alt="local-storage" caption="Local Storage Config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/local-storage.png"
alt="local-storage"
caption="Local Storage Config"
/%}
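
For reference, the equivalent YAML for a local setup could look like the sketch below. The paths are placeholders, and the field names (`dbtManifestFilePath`, `dbtCatalogFilePath`, `dbtRunResultsFilePath`) are assumptions to check against the dbt config schema for your version.

```yaml
source:
  type: dbt
  serviceName: my_snowflake_service       # placeholder service name
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        # Assumed field names; only the manifest path is mandatory
        dbtManifestFilePath: /path/to/manifest.json
        dbtCatalogFilePath: /path/to/catalog.json
        dbtRunResultsFilePath: /path/to/run_results.json
```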
#### File Server
The file server path of the `manifest.json`, `catalog.json` and `run_results.json` files stored on a file server can be provided directly.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/file_server.png" alt="file-server" caption="File Server Config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/file_server.png"
alt="file-server"
caption="File Server Config"
/%}
#### dbt Cloud
@ -79,9 +113,18 @@ Click on the the link [here](https://docs.getdbt.com/guides/getting-started) for
OpenMetadata uses the dbt Cloud APIs to fetch the `run artifacts` (manifest.json, catalog.json and run_results.json) from the most recent dbt run.
The APIs need to be authenticated using an Authentication Token. Follow the link [here](https://docs.getdbt.com/dbt-cloud/api-v2#section/Authentication) to generate an authentication token for your dbt Cloud account.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/dbt-cloud.png" alt="dbt-cloud" caption="dbt Cloud config"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/dbt-cloud.png"
alt="dbt-cloud"
caption="dbt Cloud config"
/%}
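
A minimal YAML sketch for a dbt Cloud source is shown below. The account ID and token are placeholders, and the field names (`dbtCloudAccountId`, `dbtCloudAuthToken`) are assumptions based on the dbt Cloud config schema; verify them for your OpenMetadata version.

```yaml
source:
  type: dbt
  serviceName: my_snowflake_service    # placeholder service name
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        # Assumed field names; check the dbt Cloud config schema
        dbtCloudAccountId: "12345"
        dbtCloudAuthToken: <dbt-cloud-api-token>
```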
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the dbt pipeline being added to the Service Ingestions.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/dbt/schedule-and-deploy.png" alt="schedule-and-deploy" caption="Schedule dbt ingestion pipeline"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/dbt/schedule-and-deploy.png"
alt="schedule-and-deploy"
caption="Schedule dbt ingestion pipeline"
/%}

View File

@ -38,7 +38,12 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the Lineage tab from the Table Entity Page.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/table-entity-page.png"
alt="table-entity-page"
caption="Table Entity Page"
/%}
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data.
@ -46,15 +51,23 @@ We can create a workflow that will obtain the query log and table creation infor
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/add-ingestion.png"
alt="add-ingestion"
caption="Add Ingestion"
/%}
### 2. Configure the Lineage Ingestion
Here you can enter the Lineage Ingestion details:
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png" alt="configure-lineage-ingestion" caption="Configure the Lineage Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/configure-lineage-ingestion.png"
alt="configure-lineage-ingestion"
caption="Configure the Lineage Ingestion"
/%}
<Collapse title="Lineage Options">
### Lineage Options
**Query Log Duration**
@ -63,10 +76,15 @@ Specify the duration in days for which the profiler should capture lineage data
**Result Limit**
Set the limit for the query log results to be run at a time.
</Collapse>
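
The same options can be set when running the workflow from a YAML file. The sketch below is only illustrative: the source type and service name are placeholders, and the `queryLogDuration` and `resultLimit` field names should be checked against the lineage pipeline schema for your version.

```yaml
source:
  type: snowflake-lineage            # placeholder: <connector>-lineage
  serviceName: my_snowflake_service  # placeholder service name
  sourceConfig:
    config:
      type: DatabaseLineage
      queryLogDuration: 1    # days of query history to analyze
      resultLimit: 1000      # maximum query log entries processed per run
```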
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/lineage/scheule-and-deploy.png"
alt="schedule-and-deploy"
caption="View Service Ingestion pipelines"
/%}

View File

@ -26,11 +26,11 @@ you to execute the lineage workflow using a query log file. This can be arbitrar
A query log file is a standard CSV file which contains the following information.
<Note>
{% note %}
A standard CSV should be comma separated, and each row represented as a single line in the file.
</Note>
{% /note %}
- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
that your query has commas `,` inside. In that case, wrap each query in quotes `"<query>"` to avoid any clashes

View File

@ -12,36 +12,40 @@ After the metadata ingestion has been done correctly, we can configure and deplo
This Pipeline will be in charge of feeding the Profiler tab of the Table Entity, as well as running any tests configured in the Entity.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/profiler-summary-table.png"}
alt="Table profile summary page"
caption="Table profile summary page"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/profiler-summary-table.png"
alt="Table profile summary page"
caption="Table profile summary page"
/%}
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/profiler-summary-colomn.png"
alt="Column profile summary page"
caption="Column profile summary page"
/%}
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/profiler-summary-colomn.png"}
alt="Column profile summary page"
caption="Column profile summary page"
/>
### 1. Add a Profiler Ingestion
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Profiler Ingestion.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/add-profiler-workflow.png"}
alt="Add a profiler service"
caption="Add a profiler service"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/add-profiler-workflow.png"
alt="Add a profiler service"
caption="Add a profiler service"
/%}
### 2. Configure the Profiler Ingestion
Here you can enter the Profiler Ingestion details.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/configure-profiler-workflow.png"}
alt="Set profiler configuration"
caption="Set profiler configuration"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/configure-profiler-workflow.png"
alt="Set profiler configuration"
caption="Set profiler configuration"
/%}
#### Profiler Options
**Name**
@ -84,17 +88,18 @@ After clicking Next, you will be redirected to the Scheduling form. This will be
### 4. Updating Profiler settings at the table level
Once you have created your profiler, you can adjust some of its behavior at the table level by going to the table and clicking on the profiler tab
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/accessing-table-profile-settings.png"}
alt="table profile settings"
caption="table profile settings"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/accessing-table-profile-settings.png"
alt="table profile settings"
caption="table profile settings"
/%}
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/table-profile-summary-view.png"
alt="table profile settings"
caption="table profile settings"
/%}
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/table-profile-summary-view.png"}
alt="table profile settings"
caption="table profile settings"
/>
#### Profiler Options
**Profile Sample**

View File

@ -14,11 +14,11 @@ A Metric is a computation that we can run on top of a Table or Column to receive
On this page, you will learn all the metrics that we currently support and their meaning. We will base all the namings on the definitions on the JSON Schemas.
<Note>
{% note %}
You can check the definition of the `columnProfile` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/data/table.json#L271). On the other hand, the metrics are implemented [here](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/src/metadata/orm\_profiler/metrics).
</Note>
{% /note %}
We will base all the namings on the definitions on the JSON Schemas.
@ -37,11 +37,12 @@ Returns the number of columns in the Table.
## System Metrics
System metrics provide information related to DML operations performed on the table. These metrics present a concise view of your data freshness. In a typical data processing flow, tables are updated at a certain frequency. Table freshness will be monitored by confirming that a set of operations has been performed against the table. To increase trust in your data assets, OpenMetadata will monitor the `INSERT`, `UPDATE` and `DELETE` operations performed against your table to showcase two metrics related to freshness (see below for more details). With this information, you are able to see when a specific operation was last performed and how many rows it affected.
<Image
src={"/images/v1.0.0/openmetadata/ingestion/workflows/profiler/profiler-freshness-metrics.png"}
alt="table profile freshness metrics"
caption="table profile freshness metrics"
/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/profiler/profiler-freshness-metrics.png"
alt="table profile freshness metrics"
caption="table profile freshness metrics"
/%}
These metrics are available for **BigQuery**, **Redshift** and **Snowflake**. Other database engines are currently not supported so the computation of the system metrics will be skipped.

View File

@ -38,7 +38,11 @@ Once the metadata ingestion runs correctly and we are able to explore the servic
This will populate the Queries tab from the Table Entity Page.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/table-entity-page.png"
alt="table-entity-page"
caption="Table Entity Page"
/%}
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Usage Ingestion will be in charge of obtaining this data.
@ -46,15 +50,25 @@ We can create a workflow that will obtain the query log and table creation infor
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Usage Ingestion.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/add-ingestion.png"
alt="add-ingestion"
caption="Add Ingestion"
/%}
### 2. Configure the Usage Ingestion
Here you can enter the Usage Ingestion details:
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/configure-usage-ingestion.png" alt="configure-usage-ingestion" caption="Configure the Usage Ingestion"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/configure-usage-ingestion.png"
alt="configure-usage-ingestion"
caption="Configure the Usage Ingestion"
/%}
<Collapse title="Usage Options">
### Usage Options
**Query Log Duration**
@ -67,10 +81,16 @@ Mention the absolute file path of the temporary file name to store the query log
**Result Limit**
Set the limit for the query log results to be run at a time.
</Collapse>
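
The same options can also be set from a YAML ingestion file. This is a minimal sketch with placeholder values; the `queryLogDuration`, `stageFileLocation` and `resultLimit` field names should be verified against the usage pipeline schema for your version.

```yaml
source:
  type: snowflake-usage              # placeholder: <connector>-usage
  serviceName: my_snowflake_service  # placeholder service name
  sourceConfig:
    config:
      type: DatabaseUsage
      queryLogDuration: 1                  # days of query history to process
      stageFileLocation: /tmp/query_log    # temporary file used to stage the query log
      resultLimit: 1000                    # maximum query log entries per run
```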
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the usage pipeline being added to the Service Ingestions.
<Image src="/images/v1.0.0/openmetadata/ingestion/workflows/usage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
{% image
src="/images/v1.0.0/features/ingestion/workflows/usage/scheule-and-deploy.png"
alt="schedule-and-deploy"
caption="View Service Ingestion pipelines"
/%}

View File

@ -26,11 +26,11 @@ you to execute the lineage workflow using a query log file. This can be arbitrar
A query log file is a standard CSV file which contains the following information.
<Note>
{% note %}
A standard CSV should be comma separated, and each row represented as a single line in the file.
</Note>
{% /note %}
- **query_text:** This field contains the literal query that has been executed in the database. It is quite possible
that your query has commas `,` inside. In that case, wrap each query in quotes `"<query>"` to avoid any clashes

View File

@ -12,20 +12,19 @@ One can configure the metadata ingestion filter for database source using four c
`Schema Filter Pattern`, `Table Filter Pattern` & `Use FQN For Filtering`. In this document we will learn about each field in detail
along with many examples.
<Collapse title="Configuring Filters via UI">
### Configuring Filters via UI
Filters can be configured in the UI while adding an ingestion pipeline through the `Add Metadata Ingestion` page.
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-patterns.png"
alt="Database Filter Pattern Fields"
caption="Database Filter Pattern Fields"
/>
</Collapse>
/%}
<Collapse title="Configuring Filters via CLI">
### Configuring Filters via CLI
Filters can be configured in the CLI in the connection configuration, within the `source.sourceConfig.config` field, as described below.
@ -57,7 +56,7 @@ sourceConfig:
- table4
```
</Collapse>
### Use FQN For Filtering
@ -93,17 +92,17 @@ In this example we want to ingest all databases which contains `SNOWFLAKE` in na
applied would be `.*SNOWFLAKE.*` in the include field. This will result in ingestion of the databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`
and `TEST_SNOWFLAKEDB`.
<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-1.png"
alt="Database Filter Pattern Example 1"
caption="Database Filter Pattern Example 1"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1
```yaml
sourceConfig:
@ -116,24 +115,22 @@ sourceConfig:
```
</Collapse>
#### Example 2
In this example we want to ingest all databases whose names start with `SNOWFLAKE`, so the filter pattern
applied would be `^SNOWFLAKE.*` in the include field. This will result in ingestion of the databases `SNOWFLAKE` & `SNOWFLAKE_SAMPLE_DATA`.
<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-2.png"
alt="Database Filter Pattern Example 2"
caption="Database Filter Pattern Example 2"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2
```yaml
sourceConfig:
@ -145,7 +142,6 @@ sourceConfig:
- ^SNOWFLAKE.*
```
</Collapse>
#### Example 3
@ -153,17 +149,16 @@ sourceConfig:
In this example we want to ingest all databases whose names start with `SNOWFLAKE` OR end with `DB`, so the filter patterns
applied would be `^SNOWFLAKE` & `DB$` in the include field. This will result in ingestion of the databases `SNOWFLAKE`, `SNOWFLAKE_SAMPLE_DATA`, `TEST_SNOWFLAKEDB` & `DUMMY_DB`.
<Collapse title="Configuring Filters via UI for Example 3">
### Configuring Filters via UI for Example 3
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-3.png"
alt="Database Filter Pattern Example 3"
caption="Database Filter Pattern Example 3"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 3">
### Configuring Filters via CLI for Example 3
```yaml
sourceConfig:
@ -176,23 +171,23 @@ sourceConfig:
- .*DB$
```
</Collapse>
#### Example 4
In this example we want to ingest only the `SNOWFLAKE` database, so the filter pattern applied would be `^SNOWFLAKE$`.
<Collapse title="Configuring Filters via UI for Example 4">
### Configuring Filters via UI for Example 4
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/database-filter-example-4.png"
alt="Database Filter Pattern Example 4"
caption="Database Filter Pattern Example 4"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 4">
### Configuring Filters via CLI for Example 4
```yaml
sourceConfig:
@ -203,7 +198,6 @@ sourceConfig:
includes:
- ^SNOWFLAKE$
```
</Collapse>
### Schema Filter Pattern
@ -242,17 +236,15 @@ In this example we want to ingest all schema winthin any database with name `PUB
applied would be `^PUBLIC$` in the include field. This will result in ingestion of the schemas `SNOWFLAKE.PUBLIC` & `SNOWFLAKE_SAMPLE_DATA.PUBLIC`
<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/schema-filter-example-1.png"
alt="Schema Filter Pattern Example 1"
caption="Schema Filter Pattern Example 1"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1
```yaml
sourceConfig:
@ -263,7 +255,7 @@ sourceConfig:
includes:
- ^PUBLIC$
```
</Collapse>
#### Example 2
@ -274,17 +266,17 @@ Notice that we have two schemas availabale with name `PUBLIC` one is available i
<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/schema-filter-example-2.png"
alt="Schema Filter Pattern Example 2"
caption="Schema Filter Pattern Example 2"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2
```yaml
sourceConfig:
@ -295,7 +287,6 @@ sourceConfig:
excludes:
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.PUBLIC$
```
</Collapse>
#### Example 3
@ -303,17 +294,18 @@ sourceConfig:
In this example we want to ingest `SNOWFLAKE.PUBLIC` & all the schemas in `SNOWFLAKE_SAMPLE_DATA` that start with `TPCH_`, i.e. `SNOWFLAKE_SAMPLE_DATA.TPCH_1`, `SNOWFLAKE_SAMPLE_DATA.TPCH_10` & `SNOWFLAKE_SAMPLE_DATA.TPCH_100`. To achieve this, an include schema filter will be applied with the patterns `^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$` & `^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*`, and we need to set `useFqnForFiltering` to true as we want to apply the filter on the FQN.
<Collapse title="Configuring Filters via UI for Example 3">
### Configuring Filters via UI for Example 3
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/schema-filter-example-3.png"
alt="Schema Filter Pattern Example 3"
caption="Schema Filter Pattern Example 3"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 3">
### Configuring Filters via CLI for Example 3
```yaml
sourceConfig:
@ -325,7 +317,7 @@ sourceConfig:
- ^Snowflake_Prod\.SNOWFLAKE\.PUBLIC$
- ^Snowflake_Prod\.SNOWFLAKE_SAMPLE_DATA\.TPCH_.*
```
</Collapse>
### Table Filter Pattern
@ -371,17 +363,17 @@ Snowflake_Prod # Snowflake Service Name
In this example we want to ingest the table with name `CUSTOMER` within any schema and database. In this case we need to apply the include table filter pattern `^CUSTOMER$`. This will result in ingestion of the tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER`, `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.INFORMATION.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`
<Collapse title="Configuring Filters via UI for Example 1">
### Configuring Filters via UI for Example 1
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/table-filter-example-1.png"
alt="Table Filter Pattern Example 1"
caption="Table Filter Pattern Example 1"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 1">
### Configuring Filters via CLI for Example 1
```yaml
sourceConfig:
@ -392,24 +384,23 @@ sourceConfig:
includes:
- ^CUSTOMER$
```
</Collapse>
#### Example 2
In this example we want to ingest the table with name `CUSTOMER` within the `PUBLIC` schema of any database. In this case we need to apply the include table filter pattern `.*\.PUBLIC\.CUSTOMER$`; this also requires setting the `useFqnForFiltering` flag to true, as we want to apply the filter on the FQN. This will result in ingestion of the tables `Snowflake_Prod.SNOWFLAKE_SAMPLE_DATA.PUBLIC.CUSTOMER` & `Snowflake_Prod.SNOWFLAKE.PUBLIC.CUSTOMER`
<Collapse title="Configuring Filters via UI for Example 2">
### Configuring Filters via UI for Example 2
<Image
{% image
src="/images/v1.0.0/features/ingestion/workflows/metadata/filter-patterns/table-filter-example-2.png"
alt="Table Filter Pattern Example 2"
caption="Table Filter Pattern Example 2"
/>
/%}
</Collapse>
<Collapse title="Configuring Filters via CLI for Example 2">
### Configuring Filters via CLI for Example 2
```yaml
sourceConfig:
@ -420,4 +411,3 @@ sourceConfig:
includes:
- .*\.PUBLIC\.CUSTOMER$
```
</Collapse>

View File

@ -12,15 +12,15 @@ filter out the log tables while ingesting metadata.
Configuring these metadata filters with OpenMetadata is very easy: it uses regex for matching and filtering the metadata.
The following documents will guide you on how to configure filters based on the type of data source.
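
Before diving into the source-specific guides, the sketch below shows roughly how the different filter fields fit together inside `source.sourceConfig.config` of an ingestion YAML. The patterns are illustrative placeholders; the detailed semantics of each field are covered in the documents linked below.

```yaml
sourceConfig:
  config:
    type: DatabaseMetadata
    useFqnForFiltering: false       # set to true to match patterns against the FQN
    databaseFilterPattern:
      includes:
        - ^SNOWFLAKE$
    schemaFilterPattern:
      excludes:
        - ^INFORMATION_SCHEMA$
    tableFilterPattern:
      includes:
        - .*CUSTOMER.*
```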
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
{%inlineCalloutContainer%}
{%inlineCallout
bold="Database Filter Patterns"
icon="cable"
href="/connectors/ingestion/workflows/metadata/filter-patterns/database"
>
Learn more about how to configure filters for database sources.
</InlineCallout>
</InlineCalloutContainer>
href="/connectors/ingestion/workflows/metadata/filter-patterns/database" %}
Learn more about how to configure filters for database sources.
{%/inlineCallout%}
{%/inlineCalloutContainer%}