3.2 KiB

File Based Lineage

For context on getting started with ingestion, check out our metadata ingestion guide.

Setup

Works with acryl-datahub out of the box.

Capabilities

This plugin pulls lineage metadata from a yaml-formatted file. An example of one such file is located in the examples directory here.

Quickstart recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: datahub-lineage-file
  config:
    # Coordinates
    file: /path/to/file_lineage.yml
    # Whether we want to query datahub-gms for upstream data
    preserve_upstream: False

sink:
# sink configs

Config details

Note that a . is used to denote nested fields in the YAML recipe.

Field Required Default Description
file Path to lineage file to ingest.
preserve_upstream True Whether we want to query datahub-gms for upstream data. False means it will hard replace upstream data for a given entity. True means it will query the backend for existing upstreams and include it in the ingestion run

Lineage File Format

The lineage source file should be a .yml file with the following top-level keys:

version: the version of lineage file config the config conforms to. Currently, the only version released is 1.

lineage: the top level key of the lineage file containing a list of EntityNodeConfig objects

EntityNodeConfig:

  • entity: EntityConfig object
  • upstream: (optional) list of child EntityNodeConfig objects

EntityConfig:

  • name : name of the entity
  • type: type of the entity (only dataset is supported as of now)
  • env: the environment of this entity. Should match the values in the table here
  • platform: a valid platform like kafka, snowflake, etc..
  • platform_instance: optional string specifying the platform instance of this entity

You can also view an example lineage file checked in here

Compatibility

Compatible with version 1 of lineage format. The source will be evolved as we publish newer versions of this format.

Questions

If you've got any questions on configuring this source, feel free to ping us on our Slack!