mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-31 10:49:00 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			76 lines
		
	
	
		
			3.2 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			76 lines
		
	
	
		
			3.2 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| # File Based Lineage
 | |
| 
 | |
| For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
 | |
| 
 | |
| ## Setup
 | |
| 
 | |
| Works with `acryl-datahub` out of the box.
 | |
| 
 | |
| ## Capabilities
 | |
| 
 | |
| This plugin pulls lineage metadata from a yaml-formatted file. An example of one such file is located in the examples
 | |
| directory [here](../examples/bootstrap_data/file_lineage.yml).
 | |
| 
 | |
| ## Quickstart recipe
 | |
| 
 | |
| Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration
 | |
| options.
 | |
| 
 | |
| For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
 | |
| 
 | |
| ```yml
 | |
| source:
 | |
|   type: datahub-lineage-file
 | |
|   config:
 | |
|     # Coordinates
 | |
|     file: /path/to/file_lineage.yml
 | |
|     # Whether we want to query datahub-gms for upstream data
 | |
|     preserve_upstream: False
 | |
| 
 | |
| sink:
 | |
| # sink configs
 | |
| ```
 | |
| 
 | |
| ## Config details
 | |
| 
 | |
| Note that a `.` is used to denote nested fields in the YAML recipe.
 | |
| 
 | |
| | Field               | Required | Default | Description                                                                                                                                                                                                                    |
 | |
| |---------------------|----------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | |
| | `file`              | ✅        |         | Path to lineage file to ingest.                                                                                                                                                                                                    |
 | |
| | `preserve_upstream` |          | `True`  | Whether we want to query datahub-gms for upstream data. `False` means it will hard replace upstream data for a given entity. `True` means it will query the backend for existing upstreams and include it in the ingestion run |
 | |
| 
 | |
| ### Lineage File Format
 | |
| 
 | |
| The lineage source file should be a `.yml` file with the following top-level keys:
 | |
| 
 | |
| **version**: the version of lineage file config the config conforms to. Currently, the only version released
 | |
| is `1`.
 | |
| 
 | |
| **lineage**: the top level key of the lineage file containing a list of **EntityNodeConfig** objects
 | |
| 
 | |
| **EntityNodeConfig**:
 | |
| 
 | |
| - **entity**: **EntityConfig** object
 | |
| - **upstream**: (optional) list of child **EntityNodeConfig** objects
 | |
| 
 | |
| **EntityConfig**:
 | |
| 
 | |
| - **name** : name of the entity
 | |
| - **type**: type of the entity (only `dataset` is supported as of now)
 | |
| - **env**: the environment of this entity. Should match the values in the
 | |
|   table [here](https://datahubproject.io/docs/graphql/enums/#fabrictype)
 | |
| - **platform**: a valid platform like kafka, snowflake, etc..
 | |
| - **platform_instance**: optional string specifying the platform instance of this entity
 | |
| 
 | |
| You can also view an example lineage file checked in [here](../examples/bootstrap_data/file_lineage.yml)
 | |
| 
 | |
| ## Compatibility
 | |
| 
 | |
| Compatible with version 1 of lineage format. The source will be evolved as we publish newer versions of this
 | |
| format.
 | |
| 
 | |
| ## Questions
 | |
| 
 | |
| If you've got any questions on configuring this source, feel free to ping us
 | |
| on [our Slack](https://slack.datahubproject.io/)! | 
