# Lineage
DataHub’s Python SDK allows you to programmatically define and retrieve lineage between metadata entities. With the DataHub Lineage SDK, you can:
- Add **table-level and column-level lineage** across datasets, data jobs, dashboards, and charts
- Automatically **infer lineage from SQL queries**
- **Read lineage** (upstream or downstream) for a given entity or column
- **Filter lineage results** using structured filters
## Getting Started
To use the DataHub SDK, you'll need to install [`acryl-datahub`](https://pypi.org/project/acryl-datahub/) and set up a connection to your DataHub instance. Follow the [installation guide](https://docs.datahub.com/docs/metadata-ingestion/cli-ingestion#installing-datahub-cli) to get started.
Connect to your DataHub instance:
```python
from datahub.sdk import DataHubClient
client = DataHubClient(server="<your_server>", token="<your_token>")
```
- **server**: The URL of your DataHub GMS server
- local: `http://localhost:8080`
- hosted: `https://<your_datahub_url>/gms`
- **token**: You'll need to [generate a Personal Access Token](https://docs.datahub.com/docs/authentication/personal-access-tokens) from your DataHub instance.
## Add Lineage
The `add_lineage()` method allows you to define lineage between two entities.
### Add Entity Lineage
You can create lineage between two datasets, data jobs, dashboards, or charts. The `upstream` and `downstream` parameters should be the URNs of the entities you want to link.
#### Add Entity Lineage Between Datasets
```python
{{ inline /metadata-ingestion/examples/library/add_lineage_dataset_to_dataset.py show_path_as_comment }}
```
#### Add Entity Lineage Between Datajobs
```python
{{ inline /metadata-ingestion/examples/library/lineage_datajob_to_datajob.py show_path_as_comment }}
```
:::note Lineage Combinations
For supported lineage combinations, see [Supported Lineage Combinations](#supported-lineage-combinations).
:::
### Add Column Lineage
You can add column-level lineage by using the `column_lineage` parameter when linking datasets.
#### Add Column Lineage with Fuzzy Matching
```python
{{ inline /metadata-ingestion/examples/library/lineage_dataset_column.py show_path_as_comment }}
```
When `column_lineage` is set to **True**, DataHub will automatically map columns based on their names, allowing for fuzzy matching. This is useful when upstream and downstream datasets have similar but not identical column names (e.g., `customer_id` in the upstream and `CustomerId` in the downstream).
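Conceptually, fuzzy matching can be pictured as comparing normalized column names. The sketch below is illustrative only (it is not DataHub's actual matching algorithm) and uses hypothetical column names:

```python
# Illustrative sketch of name-based fuzzy matching -- NOT DataHub's
# actual algorithm. Normalization lowercases names and strips
# underscores, so "customer_id" and "CustomerId" compare equal.

def normalize(column_name: str) -> str:
    """Lowercase and drop underscores for comparison."""
    return column_name.replace("_", "").lower()

upstream_cols = ["customer_id", "order_total"]
downstream_cols = ["CustomerId", "OrderTotal", "load_ts"]

# Map each downstream column to the upstream columns whose
# normalized names match.
mapping = {
    down: [up for up in upstream_cols if normalize(up) == normalize(down)]
    for down in downstream_cols
}
print(mapping)
# {'CustomerId': ['customer_id'], 'OrderTotal': ['order_total'], 'load_ts': []}
```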
#### Add Column Lineage with Strict Matching
```python
{{ inline /metadata-ingestion/examples/library/lineage_dataset_column_auto_strict.py show_path_as_comment }}
```
This will create column-level lineage with strict matching, meaning the column names must match exactly between upstream and downstream datasets.
#### Add Column Lineage with Custom Mapping
For custom mapping, you can use a dictionary where keys are downstream column names and values represent lists of upstream column names. This allows you to specify complex relationships.
```python
{{ inline /metadata-ingestion/examples/library/lineage_dataset_column_custom_mapping.py show_path_as_comment }}
```
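The mapping itself is a plain dictionary: keys are downstream column names, values are lists of upstream column names. A minimal sketch of the shape, using hypothetical column names:

```python
# Hypothetical custom column mapping: keys are downstream column names,
# values are lists of the upstream columns they are derived from.
column_lineage = {
    "full_name": ["first_name", "last_name"],  # many upstream -> one downstream
    "customer_id": ["customer_id"],            # straight passthrough
}

# Every value is a list, even when there is a single upstream column.
assert all(isinstance(v, list) for v in column_lineage.values())
```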
### Infer Lineage from SQL
You can infer lineage directly from a SQL query using `infer_lineage_from_sql()`. This will parse the query, determine upstream and downstream datasets, and automatically add lineage (including column-level lineage when possible).
```python
{{ inline /metadata-ingestion/examples/library/lineage_dataset_from_sql.py show_path_as_comment }}
```
:::note DataHub SQL Parser
Check out more information on how we handle SQL parsing below.
- [The DataHub SQL Parser Documentation](../../lineage/sql_parsing.md)
- [Blog Post: Extracting Column-Level Lineage from SQL](https://medium.com/datahub-project/extracting-column-level-lineage-from-sql-779b8ce17567)
:::
### Add Query Node with Lineage
If you provide a `transformation_text` to `add_lineage()`, DataHub will create a query node that represents the transformation logic. This is useful for tracking how data is transformed between datasets.
```python
{{ inline /metadata-ingestion/examples/library/add_lineage_dataset_to_dataset_with_query_node.py show_path_as_comment }}
```
Transformation text can be any transformation logic: Python scripts, Airflow DAG code, or any other code that describes how the upstream dataset is transformed into the downstream dataset.
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/query-node.png" />
</p>
:::note
Providing `transformation_text` will NOT create column lineage. You need to specify the `column_lineage` parameter to enable column-level lineage.
If you have a SQL query that describes the transformation, you can use [infer_lineage_from_sql](#infer-lineage-from-sql) to automatically parse the query and add column-level lineage.
:::
## Get Lineage
The `get_lineage()` method allows you to retrieve lineage for a given entity.
### Get Entity Lineage
#### Get Upstream Lineage for a Dataset
This will return the direct upstream entities that the dataset depends on. By default, it retrieves only the immediate upstream entities (1 hop).
```python
{{ inline /metadata-ingestion/examples/library/get_lineage_basic.py show_path_as_comment }}
```
#### Get Downstream Lineage for a Dataset Across Multiple Hops
To get upstream/downstream entities that are more than one hop away, you can use the `max_hops` parameter. This allows you to traverse the lineage graph up to a specified number of hops.
```python
{{ inline /metadata-ingestion/examples/library/get_lineage_with_hops.py show_path_as_comment }}
```
:::note USING MAX_HOPS
If you provide `max_hops` greater than 2, the query will traverse the full lineage graph and limit the results by `count`.
:::
#### Return Type
`get_lineage()` returns a list of `LineageResult` objects.
```python
results = [
LineageResult(
urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_2,PROD)",
type="DATASET",
hops=1,
direction="downstream",
platform="snowflake",
name="table_2", # name of the entity
paths=[] # Only populated for column-level lineage
)
]
```
### Get Column-Level Lineage
#### Get Downstream Lineage for a Dataset Column
You can retrieve column-level lineage by specifying the `source_column` parameter. This will return lineage paths that include the specified column.
```python
{{ inline /metadata-ingestion/examples/library/get_column_lineage.py show_path_as_comment }}
```
You can also pass a `SchemaFieldUrn` as the `source_urn` to get column-level lineage.
```python
{{ inline /metadata-ingestion/examples/library/get_column_lineage_from_schemafield.py show_path_as_comment }}
```
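A `SchemaFieldUrn` string embeds the dataset URN together with the field path. The small helper below is purely illustrative (it is not part of the SDK) and shows how the two pieces fit together:

```python
# Illustrative helper (NOT part of the DataHub SDK): split a schemaField
# URN string into its dataset URN and field path.
def split_schema_field_urn(urn: str) -> tuple[str, str]:
    # Strip the outer "urn:li:schemaField:( ... )" wrapper, then split
    # on the last comma, which separates the dataset URN from the field.
    inner = urn.removeprefix("urn:li:schemaField:(").removesuffix(")")
    dataset_urn, _, field_path = inner.rpartition(",")
    return dataset_urn, field_path

urn = "urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD),col1)"
print(split_schema_field_urn(urn))
# ('urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD)', 'col1')
```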
#### Return Type
The return type is the same as for entity lineage, but with an additional `paths` field that contains column lineage paths.
```python
results = [
LineageResult(
urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_2,PROD)",
type="DATASET",
hops=1,
direction="downstream",
platform="snowflake",
name="table_2", # name of the entity
paths=[
LineagePath(
urn="urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD),col1)",
column_name="col1", # name of the column
entity_name="table_1", # name of the entity that contains the column
),
LineagePath(
urn="urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,table_2,PROD),col4)",
column_name="col4", # name of the column
entity_name="table_2", # name of the entity that contains the column
)
] # Only populated for column-level lineage
)
]
```
For more details on how to interpret the results, see [Interpreting Column Lineage Results](#interpreting-column-lineage-results).
### Filter Lineage Results
You can filter by platform, type, domain, environment, and more.
```python
{{ inline /metadata-ingestion/examples/library/get_lineage_with_filter.py show_path_as_comment }}
```
You can check more details about the available filters in the [Search SDK documentation](./sdk/search_client.md#filter-based-search).
## Lineage SDK Reference
### Supported Lineage Combinations
The Lineage APIs support the following entity combinations:
| Upstream Entity | Downstream Entity |
| --------------- | ----------------- |
| Dataset | Dataset |
| Dataset | DataJob |
| DataJob | DataJob |
| Dataset | Dashboard |
| Chart | Dashboard |
| Dashboard | Dashboard |
| Dataset | Chart |
> ℹ️ Column-level lineage and creating a query node with transformation text are **only supported** for `Dataset → Dataset` lineage.
### Column Lineage Options
For dataset-to-dataset lineage, you can specify the `column_lineage` parameter in `add_lineage()` in several ways:
| Value | Description |
| --------------- | --------------------------------------------------------------------------------- |
| `False` | Disable column-level lineage (default) |
| `True` | Enable column-level lineage with automatic mapping (same as "auto_fuzzy") |
| `"auto_fuzzy"` | Enable column-level lineage with fuzzy matching (useful for similar column names) |
| `"auto_strict"` | Enable column-level lineage with strict matching (exact column names required) |
| Column Mapping | A dictionary mapping downstream column names to lists of upstream column names |
:::note Auto_Fuzzy vs Auto_Strict
- **Auto_Fuzzy**: Automatically matches columns based on similar names, allowing for some flexibility in naming conventions. For example, these two columns would be considered a match:
  - `user_id` → `userId`
  - `customer_id` → `CustomerId`
- **Auto_Strict**: Requires exact column name matches between upstream and downstream datasets. For example, `customer_id` in the upstream dataset must match `customer_id` in the downstream dataset exactly.
:::
### Interpreting Column Lineage Results
When retrieving column-level lineage, the results include `paths` that show how columns are related across datasets. Each path is a list of column URNs that represent the lineage from the source column to the target column.
For example, let's say we have the following lineage across three tables:
<p align="center">
  <img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/column-lineage.png" />
</p>
#### Example with `max_hops=1`
```python
>>> client.lineage.get_lineage(
source_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD)",
source_column="col1",
direction="downstream",
max_hops=1
)
```
**Returns:**
```python
[
{
"urn": "...table_2...",
"hops": 1,
"paths": [
["...table_1.col1", "...table_2.col4"],
["...table_1.col1", "...table_2.col5"]
]
}
]
```
#### Example with `max_hops=2`
```python
>>> client.lineage.get_lineage(
source_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD)",
source_column="col1",
direction="downstream",
max_hops=2
)
```
**Returns:**
```python
[
{
"urn": "...table_2...",
"hops": 1,
"paths": [
["...table_1.col1", "...table_2.col4"],
["...table_1.col1", "...table_2.col5"]
]
},
{
"urn": "...table_3...",
"hops": 2,
"paths": [
["...table_1.col1", "...table_2.col4", "...table_3.col7"]
]
}
]
```
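To make the traversal concrete, here is a small sketch that post-processes results shaped like the (abbreviated) output above. The dictionaries are simplified stand-ins for the real `LineageResult` objects, with shortened column references:

```python
# Stand-ins for the abbreviated results above: each path is an ordered
# list of column references from the source column to a reached column.
results = [
    {"urn": "...table_2...", "hops": 1,
     "paths": [["table_1.col1", "table_2.col4"],
               ["table_1.col1", "table_2.col5"]]},
    {"urn": "...table_3...", "hops": 2,
     "paths": [["table_1.col1", "table_2.col4", "table_3.col7"]]},
]

# Group the terminal column of every path by how many hops away it is.
terminals_by_hops = {
    r["hops"]: [path[-1] for path in r["paths"]] for r in results
}
print(terminals_by_hops)
# {1: ['table_2.col4', 'table_2.col5'], 2: ['table_3.col7']}
```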
### Lineage GraphQL Examples
You can also use the GraphQL API to add and retrieve lineage.
#### Add Lineage Between Datasets with GraphQL
```graphql
mutation updateLineage {
updateLineage(
input: {
edgesToAdd: [
{
downstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)"
upstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"
}
]
edgesToRemove: []
}
)
}
```
#### Get Downstream Lineage with GraphQL
```graphql
query scrollAcrossLineage {
scrollAcrossLineage(
input: {
query: "*"
urn: "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)"
count: 10
direction: DOWNSTREAM
orFilters: [
{
and: [
{
condition: EQUAL
negated: false
field: "degree"
values: ["1", "2", "3+"]
}
]
}
]
}
) {
searchResults {
degree
entity {
urn
type
}
}
}
}
```
## FAQ
**Can I get lineage at the column level?**
Yes — for dataset-to-dataset lineage, both `add_lineage()` and `get_lineage()` support column-level lineage.
**Can I pass a SQL query and get lineage automatically?**
Yes — use `infer_lineage_from_sql()` to parse a query and extract table and column lineage.
**Can I use filters when retrieving lineage?**
Yes — `get_lineage()` accepts structured filters via `FilterDsl`, just like in the Search SDK.