Added lineage workflow docs (#8710)

Onkar Ravgan 2022-11-14 15:43:45 +05:30 committed by GitHub
parent bcbca7df75
commit 39afe9cd4a
9 changed files with 145 additions and 4 deletions


@ -4,5 +4,66 @@ slug: /connectors/ingestion/workflows/lineage
---
# Lineage Workflow
Learn how to configure the Lineage workflow from the UI to ingest Lineage data from your data sources.
Introduced in 0.12
This workflow is available ONLY for the following connectors:
- [BigQuery](/connectors/database/bigquery)
- [Snowflake](/connectors/database/snowflake)
- [MSSQL](/connectors/database/mssql)
- [Redshift](/connectors/database/redshift)
- [Clickhouse](/connectors/database/clickhouse)
- [Postgres](/connectors/database/postgres)
- [Databricks](/connectors/database/databricks)
If your database service is not yet supported, you can use this same workflow by providing a Query Log file!
Learn how to do so 👇
<InlineCalloutContainer>
<InlineCallout
color="violet-70"
bold="Lineage Workflow through Query Logs"
icon="add_moderator"
href="/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs"
>
Configure the lineage workflow by providing a Query Log file.
</InlineCallout>
</InlineCalloutContainer>
## UI Configuration
Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Entity Lineage information.
This will populate the Lineage tab from the Table Entity Page.
<Image src="/images/openmetadata/ingestion/workflows/lineage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data.
### 1. Add a Lineage Ingestion
From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion.
<Image src="/images/openmetadata/ingestion/workflows/lineage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
### 2. Configure the Lineage Ingestion
Here you can enter the Lineage Ingestion details:
<Image src="/images/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png" alt="configure-lineage-ingestion" caption="Configure the Lineage Ingestion"/>
<Collapse title="Lineage Options">
**Query Log Duration**
Specify the duration in days for which the lineage workflow should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the workflow will capture lineage information for the 48 hours prior to when the ingestion workflow is run.
**Result Limit**
Set the maximum number of query log results to process at a time.
</Collapse>
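The Query Log Duration arithmetic above can be sketched as follows. This is a minimal illustration, assuming the window is simply the run time minus the configured number of days; `query_log_window` is a hypothetical helper, not part of OpenMetadata, and the actual workflow may compute its window differently.

```python
from datetime import datetime, timedelta


def query_log_window(duration_days, run_time):
    """Return the (start, end) of the query log window for a run at run_time.

    Assumption: the window covers exactly `duration_days` days before the run.
    """
    return run_time - timedelta(days=duration_days), run_time


# A duration of 2 days covers the 48 hours before the run.
start, end = query_log_window(2, datetime(2022, 11, 14, 15, 0, 0))
print(start, "->", end)  # 2022-11-12 15:00:00 -> 2022-11-14 15:00:00
```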
### 3. Schedule and Deploy
After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions.
<Image src="/images/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>


@ -0,0 +1,78 @@
---
title: Lineage Workflow Through Query Logs
slug: /connectors/ingestion/workflows/lineage/lineage-workflow-query-logs
---
# Lineage Workflow Through Query Logs
The following database connectors support the lineage workflow in OpenMetadata:
- [BigQuery](/connectors/database/bigquery)
- [Snowflake](/connectors/database/snowflake)
- [MSSQL](/connectors/database/mssql)
- [Redshift](/connectors/database/redshift)
- [Clickhouse](/connectors/database/clickhouse)
- [Databricks](/connectors/database/databricks)
- [Postgres](/connectors/database/postgres)
If you are using any other database connector, the lineage workflow cannot be run directly, mainly because these connectors do not maintain the query execution logs that the lineage workflow requires. This documentation explains how to run the lineage workflow from a query log file for any database connector.
## Query Log File
A query log file is a CSV file that contains the following information.
- **query:** The literal query that was executed in the database.
- **user_name (optional):** The database user name that executed the query.
- **start_time (optional):** The query execution start time in YYYY-MM-DD HH:MM:SS format.
- **end_time (optional):** The query execution end time in YYYY-MM-DD HH:MM:SS format.
- **aborted (optional):** Accepts true or false and indicates whether the query was aborted during execution.
- **database_name (optional):** The name of the database on which the query was executed.
- **schema_name (optional):** The name of the schema with which the query is associated.
Check out a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv).
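As an illustration of the format above, a query log file could be produced with Python's standard `csv` module. This is a minimal sketch, assuming the columns are matched by the header names listed above; `write_query_log` and the example rows (table, user, and database names) are hypothetical placeholders, not part of OpenMetadata.

```python
import csv

# Column names taken from the field list above; only `query` is required.
FIELDS = ["query", "user_name", "start_time", "end_time",
          "aborted", "database_name", "schema_name"]


def write_query_log(path, rows):
    """Write rows (dicts keyed by FIELDS) to a CSV query log file."""
    with open(path, "w", newline="") as f:
        # restval="" leaves optional columns empty when a row omits them.
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical example rows; all names are placeholders.
example_rows = [
    {
        "query": "INSERT INTO sales_summary SELECT * FROM sales",
        "user_name": "etl_user",
        "start_time": "2022-11-14 10:00:00",
        "end_time": "2022-11-14 10:00:05",
        "aborted": "false",
        "database_name": "shop",
        "schema_name": "public",
    },
    # Only the required `query` field is set here.
    {"query": "CREATE TABLE t2 AS SELECT id FROM t1"},
]
```

Calling `write_query_log("query_log.csv", example_rows)` writes a header row followed by one line per query, with the optional columns left empty where no value was given.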
## Lineage Workflow
To run a Lineage Workflow, we need to make sure that the Metadata Ingestion Workflow for the corresponding service has already been executed. We will follow the steps below to create a YAML configuration that reads the query log file and executes the lineage workflow.
### 1. Create a configuration file using template YAML
Create a new file called `query_log_lineage.yaml` in the current directory. Note that the current directory should be the openmetadata directory.
Copy and paste the configuration template below into the `query_log_lineage.yaml` file you created.
```yaml
source:
  type: query-log-lineage
  serviceName: local_mysql
  serviceConnection:
    config:
      type: Mysql
      username: openmetadata_user
      password: openmetadata_password
      hostPort: localhost:3306
      connectionOptions: {}
      connectionArguments: {}
  sourceConfig:
    config:
      queryLogFilePath: <path to query log file>
processor:
  type: query-parser
  config: {}
stage:
  type: table-lineage
  config:
    filename: /tmp/query_log_lineage
bulkSink:
  type: metadata-lineage
  config:
    filename: /tmp/query_log_lineage
workflowConfig:
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```
The `serviceName` and `serviceConnection` used in the above config have to be the same as those used during Metadata Ingestion.
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json).
- `queryLogFilePath`: Enter the path to the query log CSV file.
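Before pointing `queryLogFilePath` at a file, it can help to confirm that the file has the required `query` column. The following is a minimal sketch using only the standard library; `check_query_log` is a hypothetical helper and not part of the OpenMetadata CLI, and it assumes the CSV has a header row.

```python
import csv


def check_query_log(path):
    """Return the number of rows with a non-empty query.

    Raises ValueError if the CSV header has no `query` column.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if "query" not in (reader.fieldnames or []):
            raise ValueError(f"{path} has no 'query' column")
        # Count only rows whose query field is present and not blank.
        return sum(1 for row in reader if (row["query"] or "").strip())
```

For example, `check_query_log("query_log.csv")` returns the number of usable queries in the file, or raises an error if the header is missing the `query` column.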
### 2. Run with the CLI
First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
```bash
metadata ingest -c <path-to-yaml>
```
Note that this recipe is the same for every connector. By updating the YAML configuration, you will be able to run the lineage workflow against different sources.


@ -4,7 +4,7 @@ slug: /connectors/ingestion/workflows/usage
---
# Usage Workflow
Learn how to configure the Usage workflow from the UI to ingest Query history and Lineage data from your data sources.
Learn how to configure the Usage workflow from the UI to ingest Query history data from your data sources.
This workflow is available ONLY for the following connectors:
- [BigQuery](/connectors/database/bigquery)
@ -32,9 +32,9 @@ Learn how to do so 👇
## UI Configuration
Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage and Entity Lineage information.
Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage information.
This will populate the Queries and Lineage tab from the Table Entity Page.
This will populate the Queries tab from the Table Entity Page.
<Image src="/images/openmetadata/ingestion/workflows/usage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>


@ -484,6 +484,8 @@ site_menu:
url: /connectors/ingestion/workflows/usage/usage-workflow-query-logs
- category: Connectors / Ingestion / Workflows / Lineage
url: /connectors/ingestion/workflows/lineage
- category: Connectors / Ingestion / Workflows / Lineage / Lineage Workflow Through Query Logs
url: /connectors/ingestion/workflows/lineage/lineage-workflow-query-logs
- category: Connectors / Ingestion / Workflows / Profiler
url: /connectors/ingestion/workflows/profiler
- category: Connectors / Ingestion / Workflows / Profiler / Metrics
