diff --git a/openmetadata-docs/content/connectors/ingestion/workflows/lineage/index.md b/openmetadata-docs/content/connectors/ingestion/workflows/lineage/index.md index 3293d947f19..95de9871c72 100644 --- a/openmetadata-docs/content/connectors/ingestion/workflows/lineage/index.md +++ b/openmetadata-docs/content/connectors/ingestion/workflows/lineage/index.md @@ -4,5 +4,66 @@ slug: /connectors/ingestion/workflows/lineage --- # Lineage Workflow +Learn how to configure the Lineage workflow from the UI to ingest Lineage data from your data sources. -Introduced in 0.12 +This workflow is available ONLY for the following connectors: +- [BigQuery](/connectors/database/bigquery) +- [Snowflake](/connectors/database/snowflake) +- [MSSQL](/connectors/database/mssql) +- [Redshift](/connectors/database/redshift) +- [Clickhouse](/connectors/database/clickhouse) +- [Postgres](/connectors/database/postgres) +- [Databricks](/connectors/database/databricks) + +If your database service is not yet supported, you can use this same workflow by providing a Query Log file! + +Learn how to do so 👇 + + + + Configure the lineage workflow by providing a Query Log file. + + + +## UI Configuration + +Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Entity Lineage information. + +This will populate the Lineage tab from the Table Entity Page. + +table-entity-page + +We can create a workflow that will obtain the query log and table creation information from the underlying database and feed it to OpenMetadata. The Lineage Ingestion will be in charge of obtaining this data. + +### 1. Add a Lineage Ingestion + +From the Service Page, go to the Ingestions tab to add a new ingestion and click on Add Lineage Ingestion. + +add-ingestion + +### 2. 
Configure the Lineage Ingestion + +Here you can enter the Lineage Ingestion details: + +configure-lineage-ingestion + + + +**Query Log Duration** + +Specify the duration in days for which the lineage workflow should capture lineage data from the query logs. For example, if you specify 2 as the value for the duration, the workflow will capture lineage information for the 48 hours prior to when the ingestion workflow is run. + +**Result Limit** + +Set the maximum number of query log results to process at a time. + + +### 3. Schedule and Deploy +After clicking Next, you will be redirected to the Scheduling form. This will be the same as the Metadata Ingestion. Select your desired schedule and click on Deploy to find the lineage pipeline being added to the Service Ingestions. + +schedule-and-deploy \ No newline at end of file diff --git a/openmetadata-docs/content/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs.md b/openmetadata-docs/content/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs.md new file mode 100644 index 00000000000..3485ff93ba9 --- /dev/null +++ b/openmetadata-docs/content/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs.md @@ -0,0 +1,78 @@ +--- +title: Lineage Workflow Through Query Logs +slug: /connectors/ingestion/workflows/lineage/lineage-workflow-query-logs +--- + +# Lineage Workflow Through Query Logs + +The following database connectors support the lineage workflow in OpenMetadata: +- [BigQuery](/connectors/database/bigquery) +- [Snowflake](/connectors/database/snowflake) +- [MSSQL](/connectors/database/mssql) +- [Redshift](/connectors/database/redshift) +- [Clickhouse](/connectors/database/clickhouse) +- [Databricks](/connectors/database/databricks) +- [Postgres](/connectors/database/postgres) + +If you are using any other database connector, direct execution of the lineage workflow is not possible. 
This is mainly because these database connectors do not maintain query execution logs, which are required for the lineage workflow. This documentation explains how to execute the lineage workflow using a query log file for any database connector. + +## Query Log File +A query log file is a CSV file which contains the following information. + +- **query:** This field contains the literal query that has been executed in the database. +- **user_name (optional):** Enter the database user name which has executed this query. +- **start_time (optional):** Enter the query execution start time in YYYY-MM-DD HH:MM:SS format. +- **end_time (optional):** Enter the query execution end time in YYYY-MM-DD HH:MM:SS format. +- **aborted (optional):** This field accepts values as true or false and indicates whether the query was aborted during execution. +- **database_name (optional):** Enter the database name on which the query was executed. +- **schema_name (optional):** Enter the schema name to which the query is associated. + +Check out a sample query log file [here](https://github.com/open-metadata/OpenMetadata/blob/main/ingestion/examples/sample_data/glue/query_log.csv). + +## Lineage Workflow +In order to run a Lineage Workflow, we need to make sure that the Metadata Ingestion Workflow for the corresponding service has already been executed. We will follow the steps to create a YAML configuration able to collect the query log file and execute the lineage workflow. + +### 1. Create a configuration file using the YAML template +Create a new file called `query_log_lineage.yaml` in the current directory. Note that the current directory should be the openmetadata directory. +Copy and paste the configuration template below into the `query_log_lineage.yaml` file you created. 
+```yaml +source: + type: query-log-lineage + serviceName: local_mysql + serviceConnection: + config: + type: Mysql + username: openmetadata_user + password: openmetadata_password + hostPort: localhost:3306 + connectionOptions: {} + connectionArguments: {} + sourceConfig: + config: + queryLogFilePath: +processor: + type: query-parser + config: {} +stage: + type: table-lineage + config: + filename: /tmp/query_log_lineage +bulkSink: + type: metadata-lineage + config: + filename: /tmp/query_log_lineage +workflowConfig: + openMetadataServerConfig: + hostPort: + authProvider: +``` +The `serviceName` and `serviceConnection` used in the above config have to be the same as those used during Metadata Ingestion. +The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceQueryLineagePipeline.json). +- queryLogFilePath: Enter the file path of the query log CSV file. + +### 2. Run with the CLI +First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: +```bash +metadata ingest -c +``` +Note that from connector-to-connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources. diff --git a/openmetadata-docs/content/connectors/ingestion/workflows/usage/index.md b/openmetadata-docs/content/connectors/ingestion/workflows/usage/index.md index 08645b5eb38..491b5a8defc 100644 --- a/openmetadata-docs/content/connectors/ingestion/workflows/usage/index.md +++ b/openmetadata-docs/content/connectors/ingestion/workflows/usage/index.md @@ -4,7 +4,7 @@ slug: /connectors/ingestion/workflows/usage --- # Usage Workflow -Learn how to configure the Usage workflow from the UI to ingest Query history and Lineage data from your data sources. +Learn how to configure the Usage workflow from the UI to ingest Query history data from your data sources. 
This workflow is available ONLY for the following connectors: - [BigQuery](/connectors/database/bigquery) @@ -32,9 +32,9 @@ Learn how to do so 👇 ## UI Configuration -Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage and Entity Lineage information. +Once the metadata ingestion runs correctly and we are able to explore the service Entities, we can add Query Usage information. -This will populate the Queries and Lineage tab from the Table Entity Page. +This will populate the Queries tab from the Table Entity Page. table-entity-page diff --git a/openmetadata-docs/content/menu.md b/openmetadata-docs/content/menu.md index 1a79165fa5b..3e4a6a898a2 100644 --- a/openmetadata-docs/content/menu.md +++ b/openmetadata-docs/content/menu.md @@ -484,6 +484,8 @@ site_menu: url: /connectors/ingestion/workflows/usage/usage-workflow-query-logs - category: Connectors / Ingestion / Workflows / Lineage url: /connectors/ingestion/workflows/lineage + - category: Connectors / Ingestion / Workflows / Lineage / Lineage Workflow Through Query Logs + url: /connectors/ingestion/workflows/lineage/lineage-workflow-query-logs - category: Connectors / Ingestion / Workflows / Profiler url: /connectors/ingestion/workflows/profiler - category: Connectors / Ingestion / Workflows / Profiler / Metrics diff --git a/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/add-ingestion.png b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/add-ingestion.png new file mode 100644 index 00000000000..a8bd5535c31 Binary files /dev/null and b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/add-ingestion.png differ diff --git a/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png new file mode 100644 index 00000000000..0cc40069183 Binary files /dev/null and 
b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/configure-lineage-ingestion.png differ diff --git a/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png new file mode 100644 index 00000000000..6e6624aae14 Binary files /dev/null and b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/scheule-and-deploy.png differ diff --git a/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/table-entity-page.png b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/table-entity-page.png new file mode 100644 index 00000000000..403444f408f Binary files /dev/null and b/openmetadata-docs/images/openmetadata/ingestion/workflows/lineage/table-entity-page.png differ diff --git a/openmetadata-docs/images/openmetadata/ingestion/workflows/usage/table-entity-page.png b/openmetadata-docs/images/openmetadata/ingestion/workflows/usage/table-entity-page.png index b80ac1690e1..35e0efef045 100644 Binary files a/openmetadata-docs/images/openmetadata/ingestion/workflows/usage/table-entity-page.png and b/openmetadata-docs/images/openmetadata/ingestion/workflows/usage/table-entity-page.png differ
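
Note on the query log format: the new `lineage-workflow-query-logs.md` page lists the CSV fields (`query` plus six optional ones) but does not show how such a file might be produced. The following is a minimal Python sketch of one way to build it, using only the standard library. The header row, the column order, the sample queries, and the names `FIELDS`, `TIME_FORMAT`, and `build_query_log` are assumptions made for illustration and are not taken from the OpenMetadata codebase; the linked sample file remains the reference for the exact layout.

```python
import csv
import io
from datetime import datetime, timedelta

# Field names as listed in the documentation; the column order is an assumption.
FIELDS = [
    "query",
    "user_name",
    "start_time",
    "end_time",
    "aborted",
    "database_name",
    "schema_name",
]

# YYYY-MM-DD HH:MM:SS, as described for start_time and end_time.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_query_log(rows):
    """Return CSV text with a header row; missing optional fields are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({field: row.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
start = datetime(2022, 9, 1, 10, 0, 0)
sample_rows = [
    {
        "query": "INSERT INTO sales_summary SELECT region, SUM(amount) FROM sales GROUP BY region",
        "user_name": "etl_user",
        "start_time": start.strftime(TIME_FORMAT),
        "end_time": (start + timedelta(seconds=5)).strftime(TIME_FORMAT),
        "aborted": "false",
        "database_name": "shop",
        "schema_name": "public",
    },
    # Only the query is required; every other field is optional.
    {"query": "CREATE TABLE top_regions AS SELECT region FROM sales_summary"},
]

query_log_csv = build_query_log(sample_rows)
print(query_log_csv)
```

The resulting text could be written to disk and its path supplied as `queryLogFilePath` in the YAML configuration. Because `csv.DictWriter` quotes fields that contain commas, queries with column lists remain a single CSV field.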