2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								# Lineage
  
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
										 
							
							
								DataHub’ 
							 
						 
					
						
							
								
									
										
										
										
											2023-04-20 12:17:11 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								-  Add **table-level and column-level lineage**  across datasets, data jobs, dashboards, and charts 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								-  Automatically **infer lineage from SQL queries**  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								-  **Read lineage** (upstream or downstream) for a given entity or column 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								-  **Filter lineage results** using structured filters 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								## Getting Started
  
						 
					
						
							
								
									
										
										
										
											2023-10-04 17:43:59 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								To use DataHub SDK, you'll need to install [`acryl-datahub` ](https://pypi.org/project/acryl-datahub/ ) and set up a connection to your DataHub instance. Follow the [installation guide ](https://docs.datahub.com/docs/metadata-ingestion/cli-ingestion#installing-datahub-cli ) to get started.
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								Connect to your DataHub instance:
							 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								from datahub.sdk import DataHubClient
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								client = DataHubClient(server="< your_server > ", token="< your_token > ")
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								-  **server**: The URL of your DataHub GMS server 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  -  local: `http://localhost:8080` 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  -  hosted: `https://<your_datahub_url>/gms` 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								-  **token**: You'll need to [generate a Personal Access Token ](https://docs.datahub.com/docs/authentication/personal-access-tokens ) from your DataHub instance. 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								## Add Lineage
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								The `add_lineage()`  method allows you to define lineage between two entities.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								### Add Entity Lineage
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								You can create lineage between two datasets, data jobs, dashboards, or charts. The `upstream`  and `downstream`  parameters should be the URNs of the entities you want to link.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								#### Add Entity Lineage Between Datasets
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/add_lineage_dataset_to_dataset.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								#### Add Entity Lineage Between Datajobs
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/lineage_datajob_to_datajob.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::note Lineage Combinations
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								For supported lineage combinations, see [Supported Lineage Combinations ](#supported-lineage-combinations ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								### Add Column Lineage
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								You can add column-level lineage by using `column_lineage`  parameter when linking datasets.
							 
						 
					
						
							
								
									
										
										
										
											2023-05-03 07:32:23 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Add Column Lineage with Fuzzy Matching
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/lineage_dataset_column.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-03-17 06:12:35 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								When `column_lineage`  is set to **True** , DataHub will automatically map columns based on their names, allowing for fuzzy matching. This is useful when upstream and downstream datasets have similar but not identical column names. (e.g. `customer_id`  in upstream and `CustomerId`  in downstream). See [Column Lineage Options ](#column-lineage-options ) for more details.
							 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Add Column Lineage with Strict Matching
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/lineage_dataset_column_auto_strict.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								This will create column-level lineage with strict matching, meaning the column names must match exactly between upstream and downstream datasets.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								#### Add Column Lineage with Custom Mapping
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								For custom mapping, you can use a dictionary where keys are downstream column names and values represent lists of upstream column names. This allows you to specify complex relationships.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/lineage_dataset_column_custom_mapping.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								### Infer Lineage from SQL
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								You can infer lineage directly from a SQL query using `infer_lineage_from_sql()` . This will parse the query, determine upstream and downstream datasets, and automatically add lineage (including column-level lineage when possible) and a query node showing the SQL transformation logic.
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/lineage_dataset_from_sql.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::note DataHub SQL Parser
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								Check out more information on how we handle SQL parsing below.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								-  [The DataHub SQL Parser Documentation ](../../lineage/sql_parsing.md ) 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								-  [Blog Post: Extracting Column-Level Lineage from SQL ](https://medium.com/datahub-project/extracting-column-level-lineage-from-sql-779b8ce17567 ) 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								### Add Query Node with Lineage
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								If you provide a `transformation_text`  to `add_lineage` , DataHub will create a query node that represents the transformation logic. This is useful for tracking how data is transformed between datasets.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/add_lineage_dataset_to_dataset_with_query_node.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								Transformation text can be any transformation logic, Python scripts, Airflow DAG code, or any other code that describes how the upstream dataset is transformed into the downstream dataset.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								< p  align = "center" >  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  < img  width = "80%"   src = "https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/query-node.png" / > 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								< / p >  
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::note
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								Providing `transformation_text`  will NOT create column lineage. You need to specify `column_lineage`  parameter to enable column-level lineage.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								If you have a SQL query that describes the transformation, you can use [infer_lineage_from_sql ](#infer-lineage-from-sql ) to automatically parse the query and add column level lineage.
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								## Get Lineage
  
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								The `get_lineage()`  method allows you to retrieve lineage for a given entity.
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								### Get Entity Lineage
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								#### Get Upstream Lineage for a Dataset
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								This will return the direct upstream entity that the dataset depends on. By default, it retrieves only the immediate upstream entities (1 hop).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/get_lineage_basic.py show_path_as_comment }}
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Get Downstream Lineage for a Dataset Across Multiple Hops
  
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								To get upstream/downstream entities that are more than one hop away, you can use the `max_hops`  parameter. This allows you to traverse the lineage graph up to a specified number of hops.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/get_lineage_with_hops.py show_path_as_comment }}
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								:::note USING MAX_HOPS
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								if you provide `max_hops`  greater than 2, it will traverse the full lineage graph and limit the results by `count` .
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								#### Return Type
  
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								`get_lineage()`  returns a list of `LineageResult`  objects. 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								results = [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  LineageResult(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_2,PROD)",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    type="DATASET",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    hops=1,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    direction="downstream",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    platform="snowflake",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    name="table_2", # name of the entity
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    paths=[] # Only populated for column-level lineage
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								]
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-05-03 07:32:23 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								### Get Column-Level Lineage
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								#### Get Downstream Lineage for a Dataset Column
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								You can retrieve column-level lineage by specifying the `source_column`  parameter. This will return lineage paths that include the specified column.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/get_column_lineage.py show_path_as_comment }}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								You can also pass `SchemaFieldUrn`  as the `source_urn`  to get column-level lineage.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/get_column_lineage_from_schemafield.py show_path_as_comment }}
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Return type
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								The return type is the same as for entity lineage, but with additional `paths`  field that contains column lineage paths.
							 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								results = [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  LineageResult(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_2,PROD)",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    type="DATASET",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    hops=1,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    direction="downstream",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    platform="snowflake",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    name="table_2", # name of the entity
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    paths=[
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      LineagePath(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        urn="urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD),col1)",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        column_name="col1", # name of the column
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        entity_name="table_1", # name of the entity that contains the column
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      ),
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      LineagePath(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        urn="urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:snowflake,table_2,PROD),col4)",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        column_name="col4", # name of the column
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        entity_name="table_2", # name of the entity that contains the column
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    ] # Only populated for column-level lineage
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								]
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-05-03 07:32:23 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								For more details on how to interpret the results, see [Interpreting Lineage Results ](#interpreting-lineage-results ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								### Filter Lineage Results
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								You can filter by platform, type, domain, environment, and more.
							 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								{{ inline /metadata-ingestion/examples/library/get_lineage_with_filter.py show_path_as_comment }}
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								You can check more details about the available filters in the [Search SDK documentation ](./sdk/search_client.md#filter-based-search ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								## Lineage SDK Reference
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								For a full reference, see the [lineage SDK reference ](../../../python-sdk/sdk-v2/lineage-client.mdx ).
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								### Supported Lineage Combinations
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								The Lineage APIs support the following entity combinations:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Upstream Entity | Downstream Entity |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| --------------- | ----------------- |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Dataset         | Dataset           |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Dataset         | DataJob           |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| DataJob         | DataJob           |
							 
						 
					
						
							
								
									
										
										
										
											2025-07-28 17:45:33 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								| DataJob         | Dataset           |
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								| Dataset         | Dashboard         |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Chart           | Dashboard         |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Dashboard       | Dashboard         |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Dataset         | Chart             |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
										 
							
							
								>  ℹ ️  
						 
					
						
							
								
									
										
										
										
											2023-04-20 12:17:11 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								### Column Lineage Options
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								For dataset-to-dataset lineage, you can specify `column_lineage`  parameter in `add_lineage()`  in several ways:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Value           | Description                                                                       |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| --------------- | --------------------------------------------------------------------------------- |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| `False`          | Disable column-level lineage (default)                                            |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| `True`           | Enable column-level lineage with automatic mapping (same as "auto_fuzzy")         |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| `"auto_fuzzy"`   | Enable column-level lineage with fuzzy matching (useful for similar column names) |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| `"auto_strict"`  | Enable column-level lineage with strict matching (exact column names required)    |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								| Column Mapping  | A dictionary mapping downstream column names to lists of upstream column names    |
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								:::note `auto_fuzzy`  vs `auto_strict` 
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								-  **`auto_fuzzy` **: Automatically matches columns based on similar names, allowing for some flexibility in naming conventions. For example, these two columns would be considered a match: 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								  -  user_id → userId
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  -  customer_id → CustomerId
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								-  **`auto_strict` **: Requires exact column name matches between upstream and downstream datasets. For example, `customer_id`  in the upstream dataset must match `customer_id`  in the downstream dataset exactly. 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								:::
							 
						 
					
						
							
								
									
										
										
										
											2023-04-08 08:26:58 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								### Interpreting Column Lineage Results
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								When retrieving column-level lineage, the results include `paths`  that show how columns are related across datasets. Each path is a list of column URNs that represent the lineage from the source column to the target column.
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								For example, let's say we have the following lineage across three tables:
							 
						 
					
						
							
								
									
										
										
										
											2023-03-16 08:19:31 +09:00 
										
									 
								 
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-08-26 06:10:13 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								< p  align = "center" >  
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								  < img  width = "80%"   src = "https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/column-lineage.png" / > 
							 
						 
					
						
							
								
									
										
										
										
											2023-08-26 06:10:13 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								< / p >  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Example with `max_hops=1`
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								>>> client.lineage.get_lineage(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        source_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD)",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        source_column="col1",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        direction="downstream",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        max_hops=1
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-05-19 07:59:30 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								**Returns:**
							 
						 
					
						
							
								
									
										
										
										
											2023-05-19 07:59:30 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								[
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "urn": "...table_2...",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "hops": 1,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "paths": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            ["...table_1.col1", "...table_2.col4"],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            ["...table_1.col1", "...table_2.col5"]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								]
							 
						 
					
						
							
								
									
										
										
										
											2023-05-19 07:59:30 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Example with `max_hops=2`
  
						 
					
						
							
								
									
										
										
										
											2023-05-19 07:59:30 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								>>> client.lineage.get_lineage(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        source_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,table_1,PROD)",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        source_column="col1",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        direction="downstream",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        max_hops=2
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-05-19 07:59:30 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								**Returns:**
							 
						 
					
						
							
								
									
										
										
										
											2023-05-19 07:59:30 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```python
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								[
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "urn": "...table_2...",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "hops": 1,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "paths": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            ["...table_1.col1", "...table_2.col4"],
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            ["...table_1.col1", "...table_2.col5"]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "urn": "...table_3...",
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "hops": 2,
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        "paths": [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            ["...table_1.col1", "...table_2.col4", "...table_3.col7"]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-08-26 06:10:13 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								## Alternative: Lineage GraphQL API
  
						 
					
						
							
								
									
										
										
										
											2024-05-22 15:25:49 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-15 22:38:46 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								While we generally recommend using the Python SDK for lineage, you can also use the GraphQL API to add and retrieve lineage.
							 
						 
					
						
							
								
									
										
										
										
											2024-05-22 15:25:49 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Add Lineage Between Datasets with GraphQL
  
						 
					
						
							
								
									
										
										
										
											2024-05-22 15:25:49 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```graphql
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								mutation updateLineage {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  updateLineage(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    input: {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      edgesToAdd: [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								          downstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								          upstreamUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      edgesToRemove: []
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  )
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2023-06-30 08:48:05 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								#### Get Downstream Lineage with GraphQL
  
						 
					
						
							
								
									
										
										
										
											2023-06-30 08:48:05 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-09-01 18:14:28 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								```graphql
							 
						 
					
						
							
								
									
										
										
										
											2024-05-22 15:25:49 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								query scrollAcrossLineage {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  scrollAcrossLineage(
							 
						 
					
						
							
								
									
										
										
										
											2023-06-30 08:48:05 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								    input: {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      query: "*"
							 
						 
					
						
							
								
									
										
										
										
											2024-05-22 15:25:49 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								      urn: "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)"
							 
						 
					
						
							
								
									
										
										
										
											2023-06-30 08:48:05 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								      count: 10
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      direction: DOWNSTREAM
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      orFilters: [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								          and: [
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								              condition: EQUAL
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								              negated: false
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								              field: "degree"
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								              values: ["1", "2", "3+"]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								            }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								          ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      ]
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  ) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    searchResults {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      degree
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      entity {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        urn
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								        type
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								      }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								  }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								}
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								```
							 
						 
					
						
							
								
									
										
										
										
											2025-04-16 16:55:51 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								## FAQ
  
						 
					
						
							
								
									
										
										
										
											2024-05-22 15:25:49 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								**Can I get lineage at the column level?**
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								Yes — for dataset-to-dataset lineage, both `add_lineage()`  and `get_lineage()`  support column-level lineage.
							 
						 
					
						
							
								
									
										
										
										
											2024-05-09 13:57:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								**Can I pass a SQL query and get lineage automatically?**
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								Yes — use `infer_lineage_from_sql()`  to parse a query and extract table and column lineage.
							 
						 
					
						
							
								
									
										
										
										
											2024-05-09 13:57:44 -07:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2025-06-13 00:57:16 +09:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
								
									
								 
							
							
								**Can I use filters when retrieving lineage?**
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
								
									
								 
							
							
								Yes — `get_lineage()`  accepts structured filters via `FilterDsl` , just like in the Search SDK.