Lineage
After running a Metadata Ingestion workflow, we can run the Lineage workflow.
The serviceName must be the same one that was used in Metadata Ingestion, so that the ingestion bot can get the serviceConnection details from the server.
1. Define the YAML Config
This is a sample config for BigQuery Lineage:
{% codePreview %}
{% codeInfoContainer %}
{% codeInfo srNumber=40 %}
Source Configuration - Source Config
You can find all the definitions and types for the sourceConfig here.
{% /codeInfo %}
{% codeInfo srNumber=41 %}
queryLogDuration: Number of days to look back in the query logs when processing lineage data.
{% /codeInfo %}
{% codeInfo srNumber=42 %}
parsingTimeoutLimit: Configuration to set the timeout for parsing the query in seconds.
{% /codeInfo %}
{% codeInfo srNumber=43 %}
filterCondition: Condition to filter the query history.
{% /codeInfo %}
{% codeInfo srNumber=44 %}
resultLimit: Configuration to set the maximum number of query log entries to fetch.
{% /codeInfo %}
{% codeInfo srNumber=45 %}
queryLogFilePath: Configuration to set the file path for query logs.
{% /codeInfo %}
{% codeInfo srNumber=46 %}
databaseFilterPattern: Regex to only fetch databases that match the pattern.
{% /codeInfo %}
{% codeInfo srNumber=47 %}
schemaFilterPattern: Regex to only fetch schemas that match the pattern.
{% /codeInfo %}
{% codeInfo srNumber=48 %}
tableFilterPattern: Regex to only fetch tables that match the pattern.
{% /codeInfo %}
{% codeInfo srNumber=49 %}
Sink Configuration
To send the metadata to OpenMetadata, the sink needs to be specified as type: metadata-rest.
{% /codeInfo %}
{% codeInfo srNumber=50 %}
Workflow Configuration
The main property here is the openMetadataServerConfig, where you can define the host and security provider of your OpenMetadata installation.
For a simple, local installation using our docker containers, this looks like:
{% /codeInfo %}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
source:
  type: {% $connector %}-lineage
  serviceName: {% $connector %}
  sourceConfig:
    config:
      type: DatabaseLineage
      # Number of days to look back
      queryLogDuration: 1
      parsingTimeoutLimit: 300
      # filterCondition: query_text not ilike '--- metabase query %'
      resultLimit: 1000
      # If instead of getting the query logs from the database we want to pass a file with the queries
      # queryLogFilePath: /tmp/query_log/file_path
      # databaseFilterPattern:
      #   includes:
      #     - database1
      #     - database2
      #   excludes:
      #     - database3
      #     - database4
      # schemaFilterPattern:
      #   includes:
      #     - schema1
      #     - schema2
      #   excludes:
      #     - schema3
      #     - schema4
      # tableFilterPattern:
      #   includes:
      #     - table1
      #     - table2
      #   excludes:
      #     - table3
      #     - table4
sink:
  type: metadata-rest
  config: {}
{% partial file="/v1.5/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
- You can learn more about how to configure and run the Lineage Workflow to extract Lineage data from here
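
As an illustration of how the options above combine, here is a minimal sketch of a lineage config with a longer look-back window and two filter patterns enabled. The service name `my_bigquery_service`, the schema name `analytics`, and the `tmp_` table prefix are hypothetical placeholders, not values from your installation. The `workflowConfig` section is omitted here and still needs to be appended as in the sample above.

```yaml
source:
  type: bigquery-lineage
  # Hypothetical name: use the same serviceName as in your Metadata Ingestion run
  serviceName: my_bigquery_service
  sourceConfig:
    config:
      type: DatabaseLineage
      # Look back one week of query logs instead of one day
      queryLogDuration: 7
      # Timeout in seconds for parsing a query
      parsingTimeoutLimit: 300
      resultLimit: 1000
      schemaFilterPattern:
        includes:
          # Hypothetical schema name
          - ^analytics$
      tableFilterPattern:
        excludes:
          # Hypothetical naming convention for scratch tables
          - ^tmp_.*
sink:
  type: metadata-rest
  config: {}
```

Each filter entry is a regex, so anchoring with `^` and `$` avoids unintentionally matching names that merely contain the given text.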
2. Run with the CLI
After saving the YAML config, we will run the command the same way we did for the metadata ingestion:
metadata ingest -c <path-to-yaml>