mirror of
https://github.com/datahub-project/datahub.git
synced 2025-12-26 17:37:33 +00:00
docs(lineage): Updating Lineage feature guide (#6257)
parent
91c82fa5db
commit
0426f5cd4a
@ -197,9 +197,7 @@ module.exports = {
|
||||
"docs/domains",
|
||||
"docs/glossary/business-glossary",
|
||||
"docs/tags",
|
||||
{
|
||||
Lineage: ["docs/lineage/intro", "docs/lineage/sample_code"],
|
||||
},
|
||||
"docs/lineage/lineage-feature-guide",
|
||||
],
|
||||
|
||||
"Act on Metadata": [
|
||||
|
||||
@ -90,4 +90,4 @@ We currently limit the list of dependencies to 10,000 records; we suggest applyi
|
||||
|
||||
### Related Features
|
||||
|
||||
* [DataHub Lineage](../lineage/intro.md)
|
||||
* [DataHub Lineage](../lineage/lineage-feature-guide.md)
|
||||
|
||||
@ -1,3 +0,0 @@
|
||||
# Introduction to Lineage
|
||||
|
||||
See [this video](https://www.youtube.com/watch?v=rONGpsndzRw&ab_channel=DataHub) for Lineage 101 in DataHub.
|
||||
@ -6,11 +6,9 @@ import FeatureAvailability from '@site/src/components/FeatureAvailability';
|
||||
|
||||
Lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.
|
||||
|
||||
If you're using an ingestion source that supports extraction of Lineage (e.g. the "Table Lineage Capability"), then lineage information can be extracted automatically. For detailed instructions, refer to the source documentation for the source you are using.
|
||||
If you're using an ingestion source that supports lineage extraction (e.g. one that lists the "Table Lineage Capability"), lineage information can be extracted automatically. For detailed instructions, refer to the documentation for the source you are using. If your ingestion source does not support lineage extraction, you can programmatically emit lineage edges between entities via the API.
|
||||
|
||||
If your ingestion source does not support lineage extraction, you can also manage lineage connections by hand inside the DataHub web application. The remainder of this guide focuses on managing lineage directly within DataHub.
|
||||
|
||||
Starting in `v0.9.5`, DataHub supports manual editing of lineage between entities. Data experts can add or remove upstream and downstream lineage edges in both the Lineage Visualization screen and the Lineage tab on entity pages. Use this feature to supplement automatic lineage extraction or to establish important entity relationships in sources that do not support automatic extraction. Editing lineage by hand is supported for Datasets, Charts, Dashboards, and Data Jobs.
|
||||
Alternatively, as of `v0.9.5`, DataHub supports manual editing of lineage between entities. Data experts can add or remove upstream and downstream lineage edges in both the Lineage Visualization screen and the Lineage tab on entity pages. Use this feature to supplement automatic lineage extraction or to establish important entity relationships in sources that do not support automatic extraction. Editing lineage by hand is supported for Datasets, Charts, Dashboards, and Data Jobs.
|
||||
|
||||
:::note
|
||||
|
||||
@ -18,6 +16,14 @@ Lineage added by hand and programmatically may conflict with one another to caus
|
||||
|
||||
:::
|
||||
|
||||
Types of lineage connections supported in DataHub are:
|
||||
|
||||
* Dataset-to-dataset
|
||||
* Pipeline lineage (dataset-to-job-to-dataset)
|
||||
* Dashboard-to-chart lineage
|
||||
* Chart-to-dataset lineage
|
||||
* Job-to-dataflow (dbt lineage)
|
||||
|
||||
## Lineage Setup, Prerequisites, and Permissions
|
||||
|
||||
To edit lineage for an entity, you'll need the following [Metadata Privilege](../authorization/policies.md):
|
||||
@ -26,7 +32,7 @@ To edit lineage for an entity, you'll need the following [Metadata Privilege](..
|
||||
|
||||
It is important to know that the **Edit Lineage** privilege is required for all entities whose lineage is affected by the changes. For example, in order to add "Dataset B" as an upstream dependency of "Dataset A", you'll need the **Edit Lineage** privilege for both Dataset A and Dataset B.
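The rule above can be sketched as a check over every entity an edit touches. This is an illustration only: the `EDIT_LINEAGE` label, the `granted` mapping, and `entities_missing_privilege` are hypothetical names for this sketch, not part of DataHub's policy API.

```python
from typing import Dict, Iterable, List, Set

EDIT_LINEAGE = "EDIT_LINEAGE"  # hypothetical privilege label used only in this sketch


def entities_missing_privilege(
    granted: Dict[str, Set[str]], affected_urns: Iterable[str]
) -> List[str]:
    """Return the affected entities on which the user lacks Edit Lineage."""
    return [
        urn for urn in affected_urns
        if EDIT_LINEAGE not in granted.get(urn, set())
    ]


# A user who can edit lineage on Dataset A but not on Dataset B.
granted = {"dataset_a": {EDIT_LINEAGE}, "dataset_b": set()}

# Adding B as an upstream of A affects both entities, so both are checked.
missing = entities_missing_privilege(granted, ["dataset_a", "dataset_b"])
print(missing)
```

An empty result would mean the edit is allowed under this simplified model; any entry in the list blocks it.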
|
||||
|
||||
## Using Lineage
|
||||
## Managing Lineage via the DataHub UI
|
||||
|
||||
### Editing from Lineage Graph View
|
||||
|
||||
@ -82,6 +88,68 @@ The other place that you can edit lineage for entities is from the Lineage Tab o
|
||||
|
||||
Using the modal from this view will work the same as described above for editing from the Lineage Visualization screen.
|
||||
|
||||
## Managing Lineage via API
|
||||
|
||||
:::note
|
||||
|
||||
When you emit any lineage aspect, the existing aspect gets completely overwritten.
|
||||
|
||||
:::
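Because an emit replaces the whole aspect, adding one edge safely usually means reading the current edges, merging, and emitting the full list. The sketch below models that with a plain dictionary standing in for the metadata store; `emit_upstreams` and `merge_and_emit` are hypothetical helpers for illustration, not DataHub SDK functions.

```python
from typing import Dict, List

# In-memory stand-in for the metadata store: downstream URN -> upstream URNs.
# Illustration only; this is not how DataHub stores aspects.
store: Dict[str, List[str]] = {}


def emit_upstreams(downstream: str, upstreams: List[str]) -> None:
    """Hypothetical emit: replaces the entire lineage aspect for the entity."""
    store[downstream] = list(upstreams)


def merge_and_emit(downstream: str, new_upstreams: List[str]) -> None:
    """Read the existing upstreams, add the new ones, then emit the full list."""
    merged = list(store.get(downstream, []))
    for urn in new_upstreams:
        if urn not in merged:
            merged.append(urn)
    emit_upstreams(downstream, merged)


# Naive approach: the second emit drops the edge written by the first.
emit_upstreams("urn:example:c", ["urn:example:a"])
emit_upstreams("urn:example:c", ["urn:example:b"])
overwritten = list(store["urn:example:c"])

# Read-merge-emit keeps both edges.
emit_upstreams("urn:example:c", ["urn:example:a"])
merge_and_emit("urn:example:c", ["urn:example:b"])
merged_result = list(store["urn:example:c"])
```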
|
||||
|
||||
### Using Dataset-to-Dataset Lineage
|
||||
|
||||
This relationship model connects one dataset to another through the `UpstreamLineage` aspect of the Dataset entity.
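As a rough picture of what that aspect carries, the sketch below builds dataset URN strings and a simplified aspect body using only the standard library. The helper names are hypothetical, the table names are made up, and the body omits optional fields such as audit stamps; treat it as an outline of the shape rather than the exact payload.

```python
import json
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper for this sketch)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def upstream_lineage(upstream_urns: List[str]) -> Dict[str, list]:
    """Simplified aspect body: one entry per upstream dataset."""
    return {
        "upstreams": [
            {"dataset": urn, "type": "TRANSFORMED"} for urn in upstream_urns
        ]
    }


# Made-up BigQuery tables: two upstream tables feed one downstream table.
downstream = dataset_urn("bigquery", "project.dataset.orders_clean")
aspect = upstream_lineage(
    [
        dataset_urn("bigquery", "project.dataset.orders_raw"),
        dataset_urn("bigquery", "project.dataset.customers"),
    ]
)
print(downstream)
print(json.dumps(aspect, indent=2))
```

The aspect is attached to the downstream dataset and lists its upstreams, which is why emitting it again replaces every edge in the list.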
|
||||
|
||||
Here are a few samples for the usage of this type of lineage:
|
||||
|
||||
* [lineage_emitter_mcpw_rest.py](../../metadata-ingestion/examples/library/lineage_emitter_mcpw_rest.py) - emits simple BigQuery table-to-table (dataset-to-dataset) lineage via REST as MetadataChangeProposalWrapper.
|
||||
* [lineage_emitter_rest.py](../../metadata-ingestion/examples/library/lineage_emitter_rest.py) - emits simple dataset-to-dataset lineage via REST as MetadataChangeEvent.
|
||||
* [lineage_emitter_kafka.py](../../metadata-ingestion/examples/library/lineage_emitter_kafka.py) - emits simple dataset-to-dataset lineage via Kafka as MetadataChangeEvent.
|
||||
* [lineage_emitter_dataset_finegrained.py](../../metadata-ingestion/examples/library/lineage_emitter_dataset_finegrained.py) - emits fine-grained dataset-dataset lineage via REST as MetadataChangeProposalWrapper.
|
||||
* [DataHub BigQuery Lineage](https://github.com/datahub-project/datahub/blob/a1bf95307b040074c8d65ebb86b5eb177fdcd591/metadata-ingestion/src/datahub/ingestion/source/sql/bigquery.py#L229) - emits DataHub's BigQuery lineage as MetadataChangeProposalWrapper.
|
||||
* [DataHub Snowflake Lineage](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/snowflake.py#L249) - emits DataHub's Snowflake lineage as MetadataChangeProposalWrapper.
|
||||
|
||||
### Using dbt Lineage
|
||||
|
||||
This model captures dbt-specific nodes (tables, views, etc.) and

* uses datasets as the base entity type,
* extends datasets with a subtype for each dbt-specific concept, and
* links them together for dataset-to-dataset lineage.
|
||||
|
||||
Here is a sample usage of this lineage:
|
||||
|
||||
* [DataHub dbt Lineage](https://github.com/datahub-project/datahub/blob/a9754ebe83b6b73bc2bfbf49d9ebf5dbd2ca5a8f/metadata-ingestion/src/datahub/ingestion/source/dbt.py#L625,L630) - emits DataHub's dbt lineage as MetadataChangeEvent.
|
||||
|
||||
### Using Pipeline Lineage
|
||||
|
||||
The relationship model for this is datajob-to-dataset through the dataJobInputOutput aspect in the DataJob entity.
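To make the dataset-to-job-to-dataset shape concrete, the sketch below builds the URN strings and a simplified `dataJobInputOutput` body with the standard library. The helper names, the DAG and task IDs, and the table and topic names are all hypothetical; the body shows only the input and output dataset lists.

```python
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper for this sketch)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    """Build a DataFlow URN, e.g. for an Airflow DAG."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(flow_urn: str, job_id: str) -> str:
    """Build a DataJob URN nested under its DataFlow, e.g. for an Airflow task."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


def data_job_input_output(inputs: List[str], outputs: List[str]) -> Dict[str, List[str]]:
    """Simplified aspect body: datasets the job reads and datasets it writes."""
    return {"inputDatasets": list(inputs), "outputDatasets": list(outputs)}


# Made-up pipeline: a MySQL table is read by an Airflow task that writes a Kafka topic.
job = data_job_urn(data_flow_urn("airflow", "example_dag"), "example_task")
aspect = data_job_input_output(
    inputs=[dataset_urn("mysql", "shop.orders")],
    outputs=[dataset_urn("kafka", "orders_topic")],
)
print(job)
print(aspect)
```

The aspect belongs to the job, so the two datasets are connected only through it; there is no direct dataset-to-dataset edge in this model.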
|
||||
|
||||
For Airflow, this lineage is supported through Airflow's lineage backend, which lets you specify the inputs to and outputs from each task.
|
||||
|
||||
If you annotate your tasks with their inputs and outputs, DataHub can pick up that information and push it as lineage edges automatically. You can install the DataHub provider package from the Astronomer registry [here](https://registry.astronomer.io/providers/datahub).
|
||||
|
||||
Here are a few samples for the usage of this type of lineage:
|
||||
|
||||
* [lineage_dataset_job_dataset.py](../../metadata-ingestion/examples/library/lineage_dataset_job_dataset.py) - emits mysql-to-airflow-to-kafka (dataset-to-job-to-dataset) lineage via REST as MetadataChangeProposalWrapper.
|
||||
* [lineage_job_dataflow.py](../../metadata-ingestion/examples/library/lineage_job_dataflow.py) - emits the job-to-dataflow lineage via REST as MetadataChangeProposalWrapper.
|
||||
|
||||
### Using Dashboard-to-Chart Lineage
|
||||
|
||||
This relationship model uses the dashboardInfo aspect of the Dashboard entity and models an explicit edge between a dashboard and a chart, so that a chart can be attached to multiple dashboards.
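The sketch below illustrates why the edge is modeled explicitly: each dashboard lists its charts, so the same chart URN can appear under several dashboards. The helper names and the Looker IDs are hypothetical, and the aspect body is reduced to a title and a chart list.

```python
from typing import Dict, List


def chart_urn(tool: str, chart_id: str) -> str:
    """Build a chart URN string (hypothetical helper for this sketch)."""
    return f"urn:li:chart:({tool},{chart_id})"


def dashboard_urn(tool: str, dashboard_id: str) -> str:
    """Build a dashboard URN string (hypothetical helper for this sketch)."""
    return f"urn:li:dashboard:({tool},{dashboard_id})"


def dashboard_info(title: str, chart_urns: List[str]) -> Dict[str, object]:
    """Simplified dashboardInfo body: a title plus the charts on the dashboard."""
    return {"title": title, "charts": list(chart_urns)}


revenue_chart = chart_urn("looker", "revenue_by_month")
churn_chart = chart_urn("looker", "churn_rate")

# One aspect per dashboard; the revenue chart is attached to both.
dashboards = {
    dashboard_urn("looker", "exec_overview"): dashboard_info(
        "Executive Overview", [revenue_chart, churn_chart]
    ),
    dashboard_urn("looker", "finance"): dashboard_info("Finance", [revenue_chart]),
}

# Dashboards that contain the shared chart.
containing = sorted(
    urn for urn, info in dashboards.items() if revenue_chart in info["charts"]
)
print(containing)
```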
|
||||
|
||||
Here is a sample usage of this lineage:
|
||||
|
||||
* [lineage_chart_dashboard.py](../../metadata-ingestion/examples/library/lineage_chart_dashboard.py) - emits the chart-to-dashboard lineage via REST as MetadataChangeProposalWrapper.
|
||||
|
||||
### Using Chart-to-Dataset Lineage
|
||||
|
||||
This relationship model uses the chartInfo aspect of the Chart entity.
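In this model the chart lists the datasets it reads. A minimal sketch of that shape follows; the helper names and IDs are hypothetical, and the `inputs` list is shown as plain URN strings, which simplifies how the aspect is actually serialized.

```python
from typing import Dict, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper for this sketch)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def chart_info(title: str, input_dataset_urns: List[str]) -> Dict[str, object]:
    """Simplified chartInfo body: a title plus the datasets the chart reads."""
    return {"title": title, "inputs": list(input_dataset_urns)}


# A made-up chart that reads from a single Snowflake table.
info = chart_info(
    "Revenue by Month",
    [dataset_urn("snowflake", "analytics.public.revenue")],
)
print(info)
```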
|
||||
|
||||
Here is a sample usage of this lineage:
|
||||
|
||||
* [lineage_dataset_chart.py](../../metadata-ingestion/examples/library/lineage_dataset_chart.py) - emits the dataset-to-chart lineage via REST as MetadataChangeProposalWrapper.
|
||||
|
||||
## Additional Resources
|
||||
|
||||
### Videos
|
||||
@ -89,7 +157,7 @@ Using the modal from this view will work the same as described above for editing
|
||||
**DataHub Basics: Lineage 101**
|
||||
|
||||
<p align="center">
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/rONGpsndzRw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/rONGpsndzRw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
</p>
|
||||
|
||||
**DataHub November 2022 Town Hall - Including Manual Lineage Demo**
|
||||
@ -102,6 +170,7 @@ Using the modal from this view will work the same as described above for editing
|
||||
|
||||
* [updateLineage](../../graphql/mutations.md#updatelineage)
|
||||
* [searchAcrossLineage](../../graphql/queries.md#searchacrosslineage)
|
||||
* [searchAcrossLineageInput](../../graphql/inputObjects.md#searchacrosslineageinput)
|
||||
|
||||
#### Examples
|
||||
|
||||
@ -126,8 +195,24 @@ mutation updateLineage {
|
||||
}
|
||||
```
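For calling the mutation from a script, the request body can be assembled with the standard library. The sketch below only builds the JSON; it does not send it. The `UpdateLineageInput` type name and the `edgesToAdd` / `edgesToRemove` field names are assumptions to verify against the GraphQL reference linked above, and `build_update_lineage_body` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Assumed mutation text; check the input type name against the GraphQL schema.
UPDATE_LINEAGE = """
mutation updateLineage($input: UpdateLineageInput!) {
  updateLineage(input: $input)
}
"""

Edge = Tuple[str, str]  # (upstream URN, downstream URN)


def _edges(pairs: Iterable[Edge]) -> List[Dict[str, str]]:
    """Convert (upstream, downstream) pairs into edge objects."""
    return [{"upstreamUrn": up, "downstreamUrn": down} for up, down in pairs]


def build_update_lineage_body(
    to_add: Iterable[Edge], to_remove: Iterable[Edge] = ()
) -> str:
    """Return the JSON body for a GraphQL POST that adds and removes edges."""
    variables = {
        "input": {
            "edgesToAdd": _edges(to_add),
            "edgesToRemove": _edges(to_remove),
        }
    }
    return json.dumps({"query": UPDATE_LINEAGE, "variables": variables})


# Made-up URNs: add one edge from a raw table to a cleaned table.
upstream = "urn:li:dataset:(urn:li:dataPlatform:hive,raw_events,PROD)"
downstream = "urn:li:dataset:(urn:li:dataPlatform:hive,clean_events,PROD)"
body = build_update_lineage_body([(upstream, downstream)])
print(body)
```

Passing the edges as variables keeps URNs out of the query string, which avoids quoting problems with the parentheses and commas they contain.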
|
||||
|
||||
### DataHub Blog
|
||||
|
||||
* [Acryl Data introduces lineage support and automated propagation of governance information for Snowflake in DataHub](https://blog.datahubproject.io/acryl-data-introduces-lineage-support-and-automated-propagation-of-governance-information-for-339c99536561)
|
||||
* [Data in Context: Lineage Explorer in DataHub](https://blog.datahubproject.io/data-in-context-lineage-explorer-in-datahub-a53a9a476dc4)
|
||||
* [Harnessing the Power of Data Lineage with DataHub](https://blog.datahubproject.io/harnessing-the-power-of-data-lineage-with-datahub-ad086358dec4)
|
||||
|
||||
## FAQ and Troubleshooting
|
||||
|
||||
**The Lineage Tab is greyed out - why can’t I click on it?**
|
||||
|
||||
This means you have not yet ingested lineage metadata for that entity. Please ingest lineage to proceed.
|
||||
|
||||
**Are there any recommended practices for emitting lineage?**
|
||||
|
||||
We recommend emitting aspects as MetadataChangeProposalWrapper rather than as MetadataChangeEvent.
|
||||
|
||||
*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*
|
||||
|
||||
### Related Features
|
||||
|
||||
* [Lineage](./intro.md)
|
||||
* [DataHub Lineage Impact Analysis](../act-on-metadata/impact-analysis.md)
|
||||
@ -1,21 +0,0 @@
|
||||
# Lineage sample code
|
||||
|
||||
The following samples will cover emitting dataset-to-dataset, dataset-to-job-to-dataset, chart-to-dataset, dashboard-to-chart and job-to-dataflow lineages.
|
||||
- [lineage_emitter_mcpw_rest.py](../../metadata-ingestion/examples/library/lineage_emitter_mcpw_rest.py) - emits simple bigquery table-to-table (dataset-to-dataset) lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [lineage_dataset_job_dataset.py](../../metadata-ingestion/examples/library/lineage_dataset_job_dataset.py) - emits mysql-to-airflow-to-kafka (dataset-to-job-to-dataset) lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [lineage_dataset_chart.py](../../metadata-ingestion/examples/library/lineage_dataset_chart.py) - emits the dataset-to-chart lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [lineage_chart_dashboard.py](../../metadata-ingestion/examples/library/lineage_chart_dashboard.py) - emits the chart-to-dashboard lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [lineage_job_dataflow.py](../../metadata-ingestion/examples/library/lineage_job_dataflow.py) - emits the job-to-dataflow lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [lineage_emitter_rest.py](../../metadata-ingestion/examples/library/lineage_emitter_rest.py) - emits simple dataset-to-dataset lineage via REST as MetadataChangeEvent.
|
||||
- [lineage_emitter_kafka.py](../../metadata-ingestion/examples/library/lineage_emitter_kafka.py) - emits simple dataset-to-dataset lineage via Kafka as MetadataChangeEvent.
|
||||
- [lineage_emitter_dataset_finegrained.py](../../metadata-ingestion/examples/library/lineage_emitter_dataset_finegrained.py) - emits fine-grained dataset-dataset lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [lineage_emitter_datajob_finegrained.py](../../metadata-ingestion/examples/library/lineage_emitter_datajob_finegrained.py) - emits fine-grained datajob-dataset lineage via REST as MetadataChangeProposalWrapper.
|
||||
- [Datahub Snowflake Lineage](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/snowflake.py#L249) - emits Datahub's Snowflake lineage as MetadataChangeProposalWrapper.
|
||||
- [Datahub Bigquery Lineage](https://github.com/datahub-project/datahub/blob/a1bf95307b040074c8d65ebb86b5eb177fdcd591/metadata-ingestion/src/datahub/ingestion/source/sql/bigquery.py#L229) - emits Datahub's Bigquery lineage as MetadataChangeProposalWrapper.
|
||||
- [Datahub Dbt Lineage](https://github.com/datahub-project/datahub/blob/a9754ebe83b6b73bc2bfbf49d9ebf5dbd2ca5a8f/metadata-ingestion/src/datahub/ingestion/source/dbt.py#L625,L630) - emits Datahub's DBT lineage as MetadataChangeEvent.
|
||||
|
||||
NOTE:
|
||||
- Emitting aspects as MetadataChangeProposalWrapper is recommended over emitting aspects via the
|
||||
MetadataChangeEvent.
|
||||
- Emitting any aspect associated with an entity completely overwrites the previous
|
||||
value of the aspect associated with the entity. This means that emitting a lineage aspect associated with a dataset will overwrite lineage edges that already exist.
|
||||