Added lineage workflow docs (#8710)
Commit: 39afe9cd4a (parent: bcbca7df75)
@@ -4,5 +4,66 @@ slug: /connectors/ingestion/workflows/lineage
---
# Lineage Workflow

Learn how to configure the Lineage workflow from the UI to ingest Lineage data from your data sources.

Introduced in 0.12

This workflow is available ONLY for the following connectors:

- [BigQuery](/connectors/database/bigquery)
- [Snowflake](/connectors/database/snowflake)
- [MSSQL](/connectors/database/mssql)
- [Redshift](/connectors/database/redshift)
- [Clickhouse](/connectors/database/clickhouse)
- [Postgres](/connectors/database/postgres)
- [Databricks](/connectors/database/databricks)
If your database service is not yet supported, you can use this same workflow by providing a Query Log file!

Learn how to do so 👇
<InlineCalloutContainer>
  <InlineCallout
    color="violet-70"
    bold="Lineage Workflow through Query Logs"
    icon="add_moderator"
    href="/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs"
  >
    Configure the lineage workflow by providing a Query Log file.
  </InlineCallout>
</InlineCalloutContainer>
## UI Configuration

Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Entity Lineage information.

This will populate the Lineage tab on the Table Entity Page.

<Image src="/images/openmetadata/ingestion/workflows/lineage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>

We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data.

### 1. Add a Lineage Ingestion

From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion.

<Image src="/images/openmetadata/ingestion/workflows/lineage/add-ingestion.png" alt="add-ingestion" caption="Add Ingestion"/>
### 2. Configure the Lineage Ingestion

Here you can enter the Lineage Ingestion details:

<Image src="/images/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png" alt="configure-lineage-ingestion" caption="Configure the Lineage Ingestion"/>
<Collapse title="Lineage Options">

**Query Log Duration**

Specify the duration in days for which the workflow should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the workflow will capture lineage information from queries executed during the 48 hours prior to when the ingestion workflow is run.

**Result Limit**

Set the limit on the number of query log results fetched and processed at a time.

</Collapse>
### 3. Schedule and Deploy

After clicking Next, you will be redirected to the Scheduling form. This form is the same as the one used for the Metadata Ingestion. Select your desired schedule and click on Deploy; you will find the lineage pipeline added to the Service Ingestions.

<Image src="/images/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png" alt="schedule-and-deploy" caption="View Service Ingestion pipelines"/>
@@ -0,0 +1,78 @@
---
title: Lineage Workflow Through Query Logs
slug: /connectors/ingestion/workflows/lineage/lineage-workflow-query-logs
---

# Lineage Workflow Through Query Logs

The following database connectors support the lineage workflow in OpenMetadata:

- [BigQuery](/connectors/database/bigquery)
- [Snowflake](/connectors/database/snowflake)
- [MSSQL](/connectors/database/mssql)
- [Redshift](/connectors/database/redshift)
- [Clickhouse](/connectors/database/clickhouse)
- [Databricks](/connectors/database/databricks)
- [Postgres](/connectors/database/postgres)

If you are using any other database connector, direct execution of the lineage workflow is not possible, mainly because those connectors do not maintain the query execution logs required for the lineage workflow. This documentation will help you learn how to execute the lineage workflow using a query log file for any database connector.

## Query Log File

A query log file is a CSV file which contains the following columns:

- **query:** This field contains the literal query that has been executed in the database.
- **user_name (optional):** Enter the database user name which has executed this query.
- **start_time (optional):** Enter the query execution start time in YYYY-MM-DD HH:MM:SS format.
- **end_time (optional):** Enter the query execution end time in YYYY-MM-DD HH:MM:SS format.
- **aborted (optional):** This field accepts values as true or false and indicates whether the query was aborted during execution.
- **database_name (optional):** Enter the database name on which the query was executed.
- **schema_name (optional):** Enter the schema name to which the query is associated.

Check out a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv).
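For illustration, a minimal query log file with the required `query` column and a few of the optional columns might look like the following; the queries, user name, and database name here are made-up examples:

```csv
query,user_name,start_time,end_time,aborted,database_name,schema_name
"INSERT INTO sales_summary SELECT region, SUM(amount) FROM sales GROUP BY region",etl_user,2022-10-01 04:00:00,2022-10-01 04:00:12,false,analytics,public
"CREATE TABLE sales_copy AS SELECT * FROM sales",etl_user,2022-10-01 05:00:00,2022-10-01 05:00:03,false,analytics,public
```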

## Lineage Workflow

In order to run a Lineage Workflow, we need to make sure that the Metadata Ingestion Workflow for the corresponding service has already been executed. We will follow these steps to create a YAML configuration able to collect the query log file and execute the lineage workflow.

### 1. Create a configuration file using template YAML

Create a new file called `query_log_lineage.yaml` in the current directory. Note that the current directory should be the openmetadata directory.

Copy and paste the configuration template below into the `query_log_lineage.yaml` file you created.
```yaml
source:
  type: query-log-lineage
  serviceName: local_mysql
  serviceConnection:
    config:
      type: Mysql
      username: openmetadata_user
      password: openmetadata_password
      hostPort: localhost:3306
      connectionOptions: {}
      connectionArguments: {}
  sourceConfig:
    config:
      queryLogFilePath: <path to query log file>
processor:
  type: query-parser
  config: {}
stage:
  type: table-lineage
  config:
    filename: /tmp/query_log_lineage
bulkSink:
  type: metadata-lineage
  config:
    filename: /tmp/query_log_lineage
workflowConfig:
  openMetadataServerConfig:
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>
```
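For illustration, here is how the `workflowConfig` placeholders might be filled in for a local OpenMetadata server using JWT bot authentication; the host, auth provider, and token below are assumptions about your deployment, and the token value is a placeholder:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "<jwt-token-of-the-ingestion-bot>"
```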
The `serviceName` and `serviceConnection` used in the above config have to be the same as those used during Metadata Ingestion.

The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json).

- **queryLogFilePath:** Enter the file path of the query log CSV file.

### 2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:

```bash
metadata ingest -c <path-to-yaml>
```

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.
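For instance, if the `query_log_lineage.yaml` file created above is in the current directory, the command would be:

```bash
metadata ingest -c query_log_lineage.yaml
```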
@@ -4,7 +4,7 @@ slug: /connectors/ingestion/workflows/usage
---

# Usage Workflow

Learn how to configure the Usage workflow from the UI to ingest Query history data from your data sources.

This workflow is available ONLY for the following connectors:

- [BigQuery](/connectors/database/bigquery)
@@ -32,9 +32,9 @@ Learn how to do so 👇

## UI Configuration

Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage information.

This will populate the Queries tab on the Table Entity Page.

<Image src="/images/openmetadata/ingestion/workflows/usage/table-entity-page.png" alt="table-entity-page" caption="Table Entity Page"/>
@@ -484,6 +484,8 @@ site_menu:
      url: /connectors/ingestion/workflows/usage/usage-workflow-query-logs
    - category: Connectors / Ingestion / Workflows / Lineage
      url: /connectors/ingestion/workflows/lineage
    - category: Connectors / Ingestion / Workflows / Lineage / Lineage Workflow Through Query Logs
      url: /connectors/ingestion/workflows/lineage/lineage-workflow-query-logs
    - category: Connectors / Ingestion / Workflows / Profiler
      url: /connectors/ingestion/workflows/profiler
    - category: Connectors / Ingestion / Workflows / Profiler / Metrics
Binary image files not shown: four new screenshots added (206 KiB, 116 KiB, 210 KiB, 225 KiB) and one updated (66 KiB → 176 KiB).