lineage-docs (#13472)

Shilpa Vernekar 2023-10-07 02:55:22 +05:30 committed by GitHub
parent aed9e3875f
commit 7cfcd56970
22 changed files with 286 additions and 2 deletions

@@ -0,0 +1,45 @@
---
title: How Column-Level Lineage Works
slug: /how-to-guides/openmetadata/data-lineage/column
---
# How Column-Level Lineage Works
OpenMetadata supports rich column-level lineage to help you understand the relationships between tables and perform impact analysis. Users can manually edit both table-level and column-level lineage to capture any information that is not surfaced automatically.
{% image
src="/images/v1.1/how-to-guides/lineage/lineage1.png"
alt="Column-Level Data Lineage in OpenMetadata"
caption="Column-Level Data Lineage in OpenMetadata"
/%}
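To make column-level impact analysis concrete, here is a minimal Python sketch that models column lineage as a directed graph of fully qualified column names and finds every downstream column affected by a change to one column. The column names and the `downstream_columns` helper are hypothetical, and this is not how OpenMetadata stores lineage internally; it only illustrates the kind of question column-level lineage answers.

```python
from collections import deque

# Hypothetical column-level edges: each source column feeds the columns in its list.
COLUMN_EDGES: dict[str, list[str]] = {
    "shop.raw.orders.amount": ["shop.staging.orders_clean.amount"],
    "shop.staging.orders_clean.amount": [
        "shop.marts.revenue_daily.total_amount",
        "shop.marts.revenue_by_customer.lifetime_value",
    ],
    "shop.raw.orders.customer_id": ["shop.marts.revenue_by_customer.customer_id"],
}


def downstream_columns(column: str, edges: dict[str, list[str]]) -> set[str]:
    """Return every column reachable from `column` via a breadth-first search."""
    impacted: set[str] = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, []):
            if target not in impacted:  # avoid revisiting shared descendants
                impacted.add(target)
                queue.append(target)
    return impacted


# Which columns would be affected by a change to orders.amount?
for impacted_column in sorted(downstream_columns("shop.raw.orders.amount", COLUMN_EDGES)):
    print(impacted_column)
```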
{% note noteType="Tip" %} **Quick Tip:** Drill down to view all the available columns for a table when viewing column-level lineage. {% /note %}
You can generate the column-level lineage automatically by running the **Lineage Ingestion**.
{% image
src="/images/v1.1/how-to-guides/lineage/ingestion.png"
alt="Lineage Ingestion"
caption="Lineage Ingestion"
/%}
## Manually Edit Column Level Lineage
OpenMetadata supports manual editing of both table-level and column-level lineage. To edit the lineage of individual columns, click the edit option at the top right. Use the anchor points on either side of the columns to create links and trace individual columns through their lineage. You can also add new tables containing columns you want to trace, and connect those columns to the current lineage.
{% image
src="/images/v1.1/how-to-guides/lineage/column1.png"
alt="Manually Edit Column Level Lineage"
caption="Manually Edit Column Level Lineage"
/%}
Watch the video on editing column-level lineage.
{% youtube videoId="HTkbTvi2H9c" start="0:00" end="00:51" /%}
{%inlineCallout
color="violet-70"
bold="Manual Lineage"
icon="MdArrowForward"
href="/how-to-guides/openmetadata/data-lineage/manual"%}
Edit the table and column level lineage manually.
{%/inlineCallout%}

@@ -0,0 +1,63 @@
---
title: Explore the Lineage View
slug: /how-to-guides/openmetadata/data-lineage/explore
---
# Explore the Lineage View
The OpenMetadata UI displays end-to-end lineage traceability at both the table and column levels. OpenMetadata supports lineage for databases, dashboards, and pipelines. Search for a data asset and expand the graph to unfold its lineage; the graph displays the upstream and downstream edges for each node. The lineage details specify the SQL query, pipeline information, and column lineage.
In the lineage view, as in the example below, the table on the left is the parent or **Source** node and the table on the right is the **Target** node. You can also identify the target node by the arrow attached to it. The arrow connecting the data assets is the **Edge**. Clicking an edge that connects a source and a target displays the edge information: the Source, Target, Description, and SQL Query. When the target is a view (a table of type **View**), the SQL query shown is the one used to generate the view, so it tells you how the target table was derived from the source table.
{% image
src="/images/v1.1/how-to-guides/lineage/edge.png"
alt="Edge Information: Source and Target"
caption="Edge Information: Source and Target"
/%}
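The edge information described above can be pictured as a small record attached to each source-to-target connection. The sketch below is a hypothetical Python model, not OpenMetadata's actual entity schema: the `LineageEdge` class and its field names are assumptions chosen to mirror what the edge panel shows (Source, Target, Description, SQL Query), plus an optional pipeline reference because lineage created through a pipeline is also shown in the edge information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageEdge:
    """Hypothetical model of the edge details shown in the lineage view."""
    source: str                      # fully qualified name of the upstream node
    target: str                      # fully qualified name of the downstream node
    description: str = ""
    sql_query: Optional[str] = None  # e.g. the query that defines a view
    pipeline: Optional[str] = None   # set when lineage was created via a pipeline

    def summary(self) -> str:
        """Render the edge roughly the way a details panel might list it."""
        lines = [f"Source: {self.source}", f"Target: {self.target}"]
        if self.description:
            lines.append(f"Description: {self.description}")
        if self.pipeline:
            lines.append(f"Pipeline: {self.pipeline}")
        if self.sql_query:
            lines.append(f"SQL Query: {self.sql_query}")
        return "\n".join(lines)


edge = LineageEdge(
    source="shop.raw.orders",
    target="shop.raw.orders_view",
    sql_query="CREATE VIEW orders_view AS SELECT id, amount FROM orders",
)
print(edge.summary())
```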
{% note noteType="Tip" %} **Tip:** If the database has views (data assets of type View), metadata ingestion also brings in view lineage. {% /note %}
You can use the **Lineage Config** to set how many upstream and downstream nodes are displayed, as well as the number of nodes per layer. The upstream and downstream values can each be set to a maximum of **3**.
{% image
src="/images/v1.1/how-to-guides/lineage/nodes.png"
alt="Lineage Config"
caption="Lineage Config"
/%}
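One way to picture what the upstream, downstream, and nodes-per-layer settings control is a breadth-first walk that stops after a given number of hops and caps how many nodes are kept in each layer. The sketch below assumes the upstream/downstream value behaves like a hop depth; that interpretation, the graph, and the `lineage_layers` and `reverse_edges` helpers are all hypothetical and do not describe OpenMetadata's implementation.

```python
def reverse_edges(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip downstream edges into upstream edges."""
    flipped: dict[str, list[str]] = {}
    for source, targets in edges.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


def lineage_layers(
    start: str,
    edges: dict[str, list[str]],
    depth: int,
    nodes_per_layer: int,
) -> list[list[str]]:
    """Walk `depth` hops from `start`, keeping at most `nodes_per_layer` nodes per hop."""
    if not 0 <= depth <= 3:
        raise ValueError("depth must be between 0 and 3")  # the guide's maximum of 3
    layers: list[list[str]] = []
    visited = {start}
    frontier = [start]
    for _ in range(depth):
        layer: list[str] = []
        for node in frontier:
            for neighbour in edges.get(node, []):
                if neighbour not in visited and len(layer) < nodes_per_layer:
                    visited.add(neighbour)
                    layer.append(neighbour)
        if not layer:
            break  # nothing further in this direction
        layers.append(layer)
        frontier = layer
    return layers


DOWNSTREAM = {
    "raw.orders": ["staging.orders", "staging.orders_audit"],
    "staging.orders": ["marts.revenue", "marts.customers", "marts.refunds"],
    "marts.revenue": ["dashboards.sales"],
}

print(lineage_layers("raw.orders", DOWNSTREAM, depth=3, nodes_per_layer=2))
print(lineage_layers("dashboards.sales", reverse_edges(DOWNSTREAM), depth=3, nodes_per_layer=2))
```

With a cap of two nodes per layer, `marts.refunds` is left out of the second downstream layer, which is the trade-off the nodes-per-layer setting makes between completeness and a readable graph.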
You can click on the data assets to view the data asset details.
- Users can view the Source, Name of the Data Asset, Description, Owner (Team/User details), Tier, and Usage information for the data asset.
- Based on the **type of data asset** (Table, Topic, Dashboard, Pipeline, ML Model, Container), the quick preview provides additional information. For example, for `tables`, the type of table, the number of queries, and columns are displayed.
- The **data quality and profiler metrics** section displays the details of the tests that passed, were aborted, or failed.
- Users can view all the **tags** associated with the data asset.
- The **Schema** lists the column names, column types, and column descriptions.
{% image
src="/images/v1.1/how-to-guides/lineage/lineage2.png"
alt="Quick Glance at the Data Asset from Lineage View"
caption="Quick Glance at the Data Asset from Lineage View"
/%}
Clicking on the tables will display the list of columns and column-level lineage.
{% image
src="/images/v1.1/how-to-guides/lineage/lineage1.png"
alt="Column-Level Data Lineage in OpenMetadata"
caption="Column-Level Data Lineage in OpenMetadata"
/%}
For **Pipelines**, the lineage is first ingested from the databases. Then, when setting up the pipeline ingestion, we specify the database service name. This is how OpenMetadata displays the lineage of database tables connected via pipelines. If lineage is created through a pipeline, the pipeline is also shown in the edge information.
{% image
src="/images/v1.1/how-to-guides/lineage/pipeline.png"
alt="Database and Pipeline Lineage"
caption="Database and Pipeline Lineage"
/%}
Similarly, for a **Dashboard**, the lineage is first ingested from the databases. Then, when setting up the dashboard ingestion, the data models and charts are ingested. This is how OpenMetadata displays the lineage of database tables connected through the dashboard data models.
{%inlineCallout
color="violet-70"
bold="Column-Level Lineage"
icon="MdArrowForward"
href="/how-to-guides/openmetadata/data-lineage/column"%}
Explore and edit the rich column-level lineage.
{%/inlineCallout%}

@@ -5,8 +5,45 @@ slug: /how-to-guides/openmetadata/data-lineage
# Overview of Data Lineage
OpenMetadata tracks data lineage, showing how data moves through the organization's systems. Users can visualize how data is transformed and where it is used, helping with data traceability and impact analysis. OpenMetadata supports lineage for databases, dashboards, and pipelines.
{% image
src="/images/v1.1/how-to-guides/lineage/lineage1.png"
alt="Data Lineage in OpenMetadata"
caption="Data Lineage in OpenMetadata"
/%}
Watch the video on data lineage to learn about the different options for automatically extracting lineage from data warehouses such as Snowflake and from dashboard services such as Metabase. The video also covers creating lineage programmatically with the Python SDK.
{% youtube videoId="jEbN1tt89H0" start="0:00" end="41:43" /%}
{%inlineCalloutContainer%}
{%inlineCallout
color="violet-70"
bold="Lineage Workflow"
icon="MdPolyline"
href="/how-to-guides/openmetadata/data-lineage/workflow"%}
Configure a lineage workflow right from the UI.
{%/inlineCallout%}
{%inlineCallout
color="violet-70"
bold="Explore Lineage"
icon="MdPolyline"
href="/how-to-guides/openmetadata/data-lineage/explore"%}
Explore the rich lineage view in OpenMetadata.
{%/inlineCallout%}
{%inlineCallout
color="violet-70"
bold="Column-Level Lineage"
icon="MdViewColumn"
href="/how-to-guides/openmetadata/data-lineage/column"%}
Explore and edit the rich column-level lineage.
{%/inlineCallout%}
{%inlineCallout
color="violet-70"
bold="Manual Lineage"
icon="MdPolyline"
href="/how-to-guides/openmetadata/data-lineage/manual"%}
Edit the table and column level lineage manually.
{%/inlineCallout%}
{%/inlineCalloutContainer%}

@@ -0,0 +1,42 @@
---
title: How to Manually Add or Edit Lineage
slug: /how-to-guides/openmetadata/data-lineage/manual
---
# How to Manually Add or Edit Lineage
Edit lineage to provide a richer understanding of the provenance of data. OpenMetadata's no-code editor provides a drag-and-drop interface: drop tables, topics, pipelines, dashboards, ML models, and containers onto the lineage graph. You can add new edges or delete existing edges to better represent data lineage.
OpenMetadata supports manual editing of both table-level and column-level lineage. Lineage is built by creating edges: connect the source node of the lineage to the destination node.
Once you have ingested your database and dashboard services:
- Start by picking a database service and selecting a table. On the data asset details page, navigate to the **Lineage** tab.
- Click on the Edit option to enable the lineage editor.
- Select the type of data asset (table, topic, dashboard, ML model, container, pipeline) to connect to as the destination.
{% image
src="/images/v1.1/how-to-guides/lineage/l1.png"
alt="Data Asset: Lineage Tab"
caption="Data Asset: Lineage Tab"
/%}
- Search and select the relevant data asset.
- Create an edge between these two data assets.
{% image
src="/images/v1.1/how-to-guides/lineage/l2.png"
alt="Link the Table to the Dashboard to Add Lineage Manually"
caption="Link the Table to the Dashboard to Add Lineage Manually"
/%}
- You can also expand a table to view the available columns.
- Link the relevant columns together by connecting the column edges to trace column-level lineage.
{% image
src="/images/v1.1/how-to-guides/lineage/l3.png"
alt="Column-Level Lineage"
caption="Column-Level Lineage"
/%}
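The edge you draw in the editor can also be described as data, which is useful when thinking about creating lineage programmatically. The sketch below builds a JSON payload for a table-to-table edge with one column-level link. The field names (`edge`, `fromEntity`, `toEntity`, `lineageDetails`, `columnsLineage`, `fromColumns`, `toColumn`) are modeled on OpenMetadata's lineage request schema as an assumption; verify them against the API reference for your version before sending anything. The `build_lineage_request` helper, the UUIDs, and the fully qualified names are hypothetical placeholders, and nothing is sent to a server.

```python
import json
from typing import Optional


def build_lineage_request(
    from_id: str,
    from_type: str,
    to_id: str,
    to_type: str,
    column_pairs: Optional[list[tuple[list[str], str]]] = None,
) -> dict:
    """Build a hypothetical add-lineage payload; column_pairs maps source columns to a target column."""
    edge: dict = {
        "fromEntity": {"id": from_id, "type": from_type},
        "toEntity": {"id": to_id, "type": to_type},
    }
    if column_pairs:
        # Column-level links: one entry per target column.
        edge["lineageDetails"] = {
            "columnsLineage": [
                {"fromColumns": sources, "toColumn": target}
                for sources, target in column_pairs
            ]
        }
    return {"edge": edge}


payload = build_lineage_request(
    from_id="00000000-0000-0000-0000-000000000001",  # placeholder UUID
    from_type="table",
    to_id="00000000-0000-0000-0000-000000000002",  # placeholder UUID
    to_type="table",
    column_pairs=[
        (["sample.shop.public.orders.amount"], "sample.shop.public.orders_summary.total_amount"),
    ],
)
print(json.dumps(payload, indent=2))
```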
Watch the video about lineage (13:30 to 15:48).
{% youtube videoId="jEbN1tt89H0" start="13:30" end="15:48" /%}

@@ -0,0 +1,89 @@
---
title: How to Deploy a Lineage Workflow
slug: /how-to-guides/openmetadata/data-lineage/workflow
---
# How to Deploy a Lineage Workflow
Lineage data can be ingested from your data sources right from the OpenMetadata UI. Currently, the lineage workflow is supported for a limited set of connectors, like [BigQuery](/connectors/database/bigquery), [Snowflake](/connectors/database/snowflake), [MSSQL](/connectors/database/mssql), [Redshift](/connectors/database/redshift), [Clickhouse](/connectors/database/clickhouse), [Postgres](/connectors/database/postgres), [Databricks](/connectors/database/databricks).
{% note noteType="Tip" %} **Tip:** Trace the upstream and downstream dependencies with Lineage. {% /note %}
## View Lineage from Metadata Ingestion
Once the metadata ingestion runs successfully and you can explore the service entities, you can add view lineage information for the data assets. This populates the Lineage tab on the data asset page. During the metadata ingestion workflow, we determine whether each table is a view. For sources where we can obtain the query that generates the view, we bring in the view lineage along with the metadata. After all tables have been ingested, the workflow parses the queries that generate the views. During query parsing, we obtain the source and target tables, check whether those tables exist in OpenMetadata, and finally create the lineage relationship between the entities involved.
If the database has views, the view lineage is generated automatically, along with the column-level lineage. In such a case, the table type is **View**, as shown in the example below.
{% image
src="/images/v1.1/how-to-guides/lineage/view.png"
alt="View Lineage through Metadata Ingestion"
caption="View Lineage through Metadata Ingestion"
/%}
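The view-lineage steps described above (parse the query behind a view, find its source tables, check that they exist in OpenMetadata, then create an edge) can be sketched in Python as follows. Naive regular expressions stand in for a real SQL parser, and the catalog set and `view_lineage` function are hypothetical; view definitions with CTEs, subqueries, quoted identifiers, or expressions like `EXTRACT(YEAR FROM ...)` would need a proper parser.

```python
import re

# Naive patterns standing in for a real SQL parser.
VIEW_PATTERN = re.compile(r"\bCREATE\s+(?:OR\s+REPLACE\s+)?VIEW\s+([\w.]+)", re.IGNORECASE)
SOURCE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def view_lineage(view_sql: str, known_tables: set[str]) -> list[tuple[str, str]]:
    """Return (source, target) edges for a view, keeping only entities already ingested."""
    view_match = VIEW_PATTERN.search(view_sql)
    if view_match is None:
        return []  # not a CREATE VIEW statement
    target = view_match.group(1)
    if target not in known_tables:
        return []  # the view itself has not been ingested
    # dict.fromkeys de-duplicates while preserving the order tables appear in.
    sources = dict.fromkeys(SOURCE_PATTERN.findall(view_sql))
    return [(source, target) for source in sources if source in known_tables]


VIEW_SQL = """
CREATE VIEW analytics.order_totals AS
SELECT o.customer_id, SUM(o.amount) AS total
FROM sales.orders o
JOIN sales.customers c ON c.id = o.customer_id
LEFT JOIN sales.not_ingested n ON n.id = o.id
GROUP BY o.customer_id
"""
CATALOG = {"analytics.order_totals", "sales.orders", "sales.customers"}

for source, target in view_lineage(VIEW_SQL, CATALOG):
    print(f"{source} -> {target}")
```

Tables that are referenced but not yet ingested (`sales.not_ingested` here) are skipped, matching the step where the workflow searches for the tables in OpenMetadata before creating the relationship.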
## Lineage Ingestion from UI
Apart from the metadata ingestion, we can create a workflow that obtains the query log and table creation information from the underlying database and feeds it to OpenMetadata. The Lineage Ingestion is responsible for obtaining this data. The metadata ingestion only brings in view lineage queries, whereas the lineage ingestion workflow brings in all the queries that can be used to generate lineage information.
### 1. Add a Lineage Ingestion
Navigate to **Settings >> Services** and select the required service.
{% image
src="/images/v1.1/how-to-guides/lineage/wkf1.png"
alt="Select a Service"
caption="Select a Service"
/%}
Go to the **Ingestions** tab. Click **Add Ingestion** and select **Add Lineage Ingestion**.
{% image
src="/images/v1.1/how-to-guides/lineage/wkf2.png"
alt="Add a Lineage Ingestion"
caption="Add a Lineage Ingestion"
/%}
### 2. Configure the Lineage Ingestion
Here you can enter the Lineage Ingestion details:
{% image
src="/images/v1.1/how-to-guides/lineage/wkf3.png"
alt="Configure the Lineage Ingestion"
caption="Configure the Lineage Ingestion"
/%}
### Lineage Options
**Query Log Duration:** Specify the number of days of query logs from which the workflow should capture lineage data. For example, if you set the duration to 2, the workflow captures lineage information from the 2 **days** (48 hours) before the ingestion workflow runs.
**Parsing Timeout Limit:** Specify the timeout limit for parsing SQL queries during lineage analysis, in **seconds**.
**Result Limit:** Set the maximum number of query log results to process at a time, as a **number of rows**.
**Filter Condition:** We run a query on the query history table of the respective data source to analyze the queries and extract lineage and usage information. Use this field to exclude some queries from the analysis: specify a SQL condition that is applied to the query history result set. Learn more about [Usage Query Filtering here](/connectors/ingestion/workflows/usage/filter-query-set).
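To show how these options fit together, here is a Python sketch that turns them into a query against a query-history table: the Query Log Duration becomes a time window (2 days is 48 hours), the Result Limit becomes a row limit, and the Filter Condition is appended as an extra SQL predicate. The table and column names (`query_history`, `start_time`, `query_text`) and the `build_query_log_sql` helper are hypothetical; each connector queries its own system tables, and OpenMetadata builds these queries internally.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_query_log_sql(
    query_log_duration_days: int,
    result_limit: int,
    filter_condition: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Compose a hypothetical query-history lookup from the lineage options."""
    now = now or datetime.now(timezone.utc)
    # Query Log Duration: 2 days means looking back 48 hours from "now".
    window_start = now - timedelta(days=query_log_duration_days)
    conditions = [f"start_time >= '{window_start:%Y-%m-%d %H:%M:%S}'"]
    if filter_condition:
        # Filter Condition: appended verbatim as an extra SQL predicate.
        conditions.append(f"({filter_condition})")
    return (
        "SELECT query_text, start_time FROM query_history "
        f"WHERE {' AND '.join(conditions)} "
        f"ORDER BY start_time LIMIT {result_limit}"  # Result Limit: number of rows
    )


sql = build_query_log_sql(
    query_log_duration_days=2,
    result_limit=1000,
    filter_condition="query_text NOT ILIKE '%temp_%'",
    now=datetime(2023, 10, 7, 12, 0, tzinfo=timezone.utc),
)
print(sql)
```

Because the filter condition is inserted verbatim, it has to be valid SQL for the source's own dialect.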
### 3. Schedule and Deploy
After clicking **Next**, you will be redirected to the scheduling form, which is the same as for the metadata ingestion. Select your desired schedule and click **Deploy**. The lineage pipeline is then added to the service's ingestions.
{% image
src="/images/v1.1/how-to-guides/lineage/wkf4.png"
alt="Schedule and Deploy the Lineage Ingestion"
caption="Schedule and Deploy the Lineage Ingestion"
/%}
## dbt Ingestion
We can also generate lineage through [dbt ingestion](/connectors/ingestion/workflows/dbt/ingest-dbt-ui). The dbt workflow can fetch queries that carry lineage information. For a dbt ingestion pipeline, the path to the Catalog and Manifest files must be specified. We also fetch the column level lineage through dbt.
You can learn more about [lineage ingestion here](/connectors/ingestion/lineage).
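dbt records model dependencies in its `manifest.json` artifact: each entry under `nodes` lists its upstream nodes in `depends_on.nodes`. The sketch below derives model-level lineage edges from a trimmed, hand-written manifest fragment. It is not OpenMetadata's dbt workflow, which also reads the catalog file and handles column-level lineage; the project and model names are hypothetical.

```python
import json

# A trimmed, hand-written fragment shaped like dbt's manifest.json (hypothetical project).
MANIFEST_JSON = """
{
  "nodes": {
    "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.raw.orders"]}},
    "model.shop.fct_revenue": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
    "test.shop.not_null_stg_orders_id": {"depends_on": {"nodes": ["model.shop.stg_orders"]}}
  }
}
"""


def dbt_model_edges(manifest: dict) -> list[tuple[str, str]]:
    """Return (upstream, model) pairs for every dbt model in the manifest."""
    edges = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if not unique_id.startswith("model."):
            continue  # skip tests, seeds, snapshots, and other node types
        for upstream in node.get("depends_on", {}).get("nodes", []):
            edges.append((upstream, unique_id))
    return edges


edges = dbt_model_edges(json.loads(MANIFEST_JSON))
for upstream, model in edges:
    print(f"{upstream} -> {model}")
```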
## Query Logs using CSV File
As mentioned earlier, lineage ingestion is supported for a limited set of connectors. For unsupported connectors, you can set up [Lineage Workflows using Query Logs](/connectors/ingestion/workflows/lineage/lineage-workflow-query-logs) from a CSV file.
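A query log file only helps if it contains statements that move data between tables. The sketch below reads a CSV query log with Python's `csv` module and keeps the statements most likely to carry lineage (`CREATE TABLE ... AS`, `CREATE VIEW ... AS`, `INSERT INTO ... SELECT`). The column names (`query_text`, `database_name`, `schema_name`) are assumptions here; check the linked page for the exact columns the workflow expects. The regular expression and the `lineage_queries` helper are hypothetical and deliberately simple.

```python
import csv
import io
import re

# Statements that write to one table from another usually carry lineage.
LINEAGE_STATEMENT = re.compile(
    r"^\s*(?:CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)\b.*\bAS\b"
    r"|INSERT\s+INTO\b.*\bSELECT\b)",
    re.IGNORECASE | re.DOTALL,
)


def lineage_queries(csv_file) -> list[str]:
    """Return query_text values that look like lineage-bearing statements."""
    reader = csv.DictReader(csv_file)
    return [
        row["query_text"]
        for row in reader
        if LINEAGE_STATEMENT.match(row.get("query_text") or "")
    ]


SAMPLE = '''query_text,database_name,schema_name
"INSERT INTO sales.daily SELECT * FROM sales.orders",shop,sales
"SELECT 1",shop,sales
"CREATE TABLE sales.vip AS SELECT * FROM sales.customers",shop,sales
'''

for query in lineage_queries(io.StringIO(SAMPLE)):
    print(query)
```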
## Manual Lineage
Lineage can also be added and edited manually in OpenMetadata. For more information, refer to [adding lineage manually](/how-to-guides/openmetadata/data-lineage/manual).
{%inlineCallout
color="violet-70"
bold="Explore Lineage"
icon="MdArrowForward"
href="/how-to-guides/openmetadata/data-lineage/explore"%}
Explore the rich lineage view in OpenMetadata.
{%/inlineCallout%}

@@ -623,6 +623,14 @@ site_menu:
    url: /how-to-guides/openmetadata/data-quality-profiler
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Lineage
    url: /how-to-guides/openmetadata/data-lineage
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Lineage / How to Deploy a Lineage Workflow
    url: /how-to-guides/openmetadata/data-lineage/workflow
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Lineage / Explore the Lineage View
    url: /how-to-guides/openmetadata/data-lineage/explore
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Lineage / How Column-Level Lineage Works
    url: /how-to-guides/openmetadata/data-lineage/column
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Lineage / How to Manually Add or Edit Lineage
    url: /how-to-guides/openmetadata/data-lineage/manual
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Insights
    url: /how-to-guides/openmetadata/data-insights
  - category: How to Guides / The Six Pillars of OpenMetadata / Data Governance
