### Lineage File Format

The lineage source file should be a `.yml` file with the following top-level keys:

**version**: the version of the lineage file config that the file conforms to. Currently, the only released version is `1`.

**lineage**: the top-level key of the lineage file, containing a list of **EntityNodeConfig** objects

**EntityNodeConfig**:

- **entity**: **EntityConfig** object
- **upstream**: (optional) list of child **EntityNodeConfig** objects
- **fineGrainedLineages**: (optional) list of **FineGrainedLineageConfig** objects

**EntityConfig**:

- **name**: identifier of the entity, typically a name or GUID, as used in constructing the entity URN
- **type**: type of the entity (only `dataset` is currently supported)
- **env**: the environment of this entity. Should match one of the values in the [FabricType table](https://docs.datahub.com/docs/graphql/enums/#fabrictype)
- **platform**: a valid platform such as `kafka` or `snowflake`
- **platform_instance**: (optional) string specifying the platform instance of this entity

For example, if the dataset URN is `urn:li:dataset:(urn:li:dataPlatform:redshift,userdb.public.customer_table,DEV)`, then the **EntityConfig** will look like:

```yml
name: userdb.public.customer_table
type: dataset
env: DEV
platform: redshift
```
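
As a rough sketch of how these pieces nest, the fragment below places the **EntityConfig** above inside an **EntityNodeConfig** with a single upstream, under the `version` and `lineage` top-level keys. The upstream dataset name `userdb.public.raw_customer_table` is a hypothetical value chosen for illustration only.

```yml
version: 1
lineage:
  # One EntityNodeConfig: an entity plus its (optional) upstreams
  - entity:
      name: userdb.public.customer_table
      type: dataset
      env: DEV
      platform: redshift
    upstream:
      # Each upstream item is itself an EntityNodeConfig
      - entity:
          name: userdb.public.raw_customer_table # hypothetical upstream table
          type: dataset
          env: DEV
          platform: redshift
```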

**FineGrainedLineageConfig**:

- **upstreamType**: type of upstream entity in a fine-grained lineage; default = "FIELD_SET"
- **upstreams**: (optional) list of upstream schema field URNs
- **downstreamType**: type of downstream entity in a fine-grained lineage; default = "FIELD_SET"
- **downstreams**: (optional) list of downstream schema field URNs
- **transformOperation**: (optional) transform operation applied to the upstream entities to produce the downstream field(s)
- **confidenceScore**: (optional) the confidence in this lineage, between 0 (low confidence) and 1 (high confidence); default = 1.0

**FineGrainedLineageConfig** can be used to display fine-grained lineage, also referred to as column-level lineage, for custom sources.

You can also view an example lineage file checked in [here](../../../../metadata-ingestion/examples/bootstrap_data/file_lineage.yml).
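
To illustrate **FineGrainedLineageConfig**, here is a minimal sketch of a column-level lineage entry. It assumes that `fineGrainedLineages` sits alongside `entity` and `upstream` within an **EntityNodeConfig**, as the field list describes, and that schema field URNs follow the pattern `urn:li:schemaField:(<dataset URN>,<field path>)`. The upstream table name, the column name `customer_id`, the `transformOperation` value, and the confidence score are all hypothetical.

```yml
version: 1
lineage:
  - entity:
      name: userdb.public.customer_table
      type: dataset
      env: DEV
      platform: redshift
    upstream:
      - entity:
          name: userdb.public.raw_customer_table # hypothetical upstream table
          type: dataset
          env: DEV
          platform: redshift
    fineGrainedLineages:
      - upstreamType: FIELD_SET
        upstreams:
          # hypothetical column on the upstream table
          - urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:redshift,userdb.public.raw_customer_table,DEV),customer_id)
        downstreamType: FIELD_SET
        downstreams:
          # hypothetical column on the downstream table
          - urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:redshift,userdb.public.customer_table,DEV),customer_id)
        transformOperation: COPY # hypothetical label, assumed to be a free-form string
        confidenceScore: 0.9
```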