4.2 KiB
| title |
|---|
| Universal transformers |
Transformers
The below table shows transformer which can transform aspects of any entity having them.
| Aspect | Transformer |
|---|---|
browsePathsV2 |
- Set browsePaths |
Set browsePaths
This transformer operates on browsePathsV2 aspect. If it is not emitted by the ingestion source, it will be created
by the transformer. By default it will prepend configured path to the original path (so it will add it as a prefix).
Config Details
| Field | Required | Type | Default | Description |
|---|---|---|---|---|
path |
✅ | list[string] | List of nodes in the new path. | |
replace_existing |
boolean | false |
Whether to overwrite existing browse path, if set to false, the configured path will be prepended |
In the most basic case path contains list of static strings, for example, below config:
transformers:
- type: "set_browse_path"
config:
path:
- abc
- def
will be reflected as every entity having path prefixed by abc and def nodes (def will be contained by abc).
Variable substitution
The transformer has a mechanism of variables substitution in the path, where list of variables are build based on
existing browsePathsV2 aspect of the entity. Every node in the existing path, as long as it contains reference to
another entity (e.g. a container or a dataPlatformInstance) is stored in the list of variables to use. Since
we can have multiple references to entities of the same type (e.g. containers) in the browse path, they are stored
in a list-like object, with original order being respected. Let's consider an example, real-world situation, of a table
ingested from Snowflake source, and having platform_instance set to some value. Such table will have browsePathsV2
aspect set to contain below references:
- urn:li:dataPlatformInstance:(urn:li:dataPlatform:snowflake,my_platform_instance)
- urn:li:container:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- urn:li:container:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
where urn:li:container:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa identifies a container reflecting a Snowflake's database and
urn:li:container:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb identifies a container reflecting a Snowflake's schema.
Such, existing, path will be mapped into variables as shown below:
dataPlatformInstance[0] = "urn:li:dataPlatformInstance:(urn:li:dataPlatform:snowflake,my_platform_instance)"
container[0] = "urn:li:container:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
container[1] = "urn:li:container:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
Those variables can be refered to, from the config, by using $ character, like below:
transformers:
- type: "set_browse_path"
config:
path:
- $dataPlatformInstance[0]
- $container[0]
- $container[1]
Additionally, 2 more rules apply to the variables resolution:
- If a variable does not exist (or if the index reached outside of list's length) - it will be ignored and not used in the path, all the other nodes will be used and path will be modified
$variable[*]will expand entire list of variables to multiple nodes in the path (think about it as a "flat map"), for example, the equivalent of above config, would be:transformers: - type: "set_browse_path" config: path: - $dataPlatformInstance[0] - $container[*]
Examples
Add (prefix) a top-level node "datahub" to paths emitted by the source:
transformers:
- type: "set_browse_path"
config:
path:
- datahub
Remove data platform instance from the path (if it was set), while retaining containers structure:
transformers:
- type: "set_browse_path"
config:
replace_existing: true
path:
- $container[*]