Lineage

After running a Metadata Ingestion workflow, we can run the Lineage workflow. The serviceName must be the same as the one used in Metadata Ingestion, so that the ingestion bot can get the serviceConnection details from the server.

1. Define the YAML Config

This is a sample config for BigQuery Lineage:

{% codePreview %}

{% codeInfoContainer %}

{% codeInfo srNumber=40 %}

Source Configuration - Source Config

You can find all the definitions and types for the sourceConfig here.

{% /codeInfo %}

{% codeInfo srNumber=41 %}

queryLogDuration: Number of days to look back in the query logs when processing lineage data.

{% /codeInfo %}

{% codeInfo srNumber=42 %}

parsingTimeoutLimit: Configuration to set the timeout, in seconds, for parsing a query.

{% /codeInfo %}

{% codeInfo srNumber=43 %}

filterCondition: Condition to filter the query history.

{% /codeInfo %}

{% codeInfo srNumber=44 %}

resultLimit: Configuration to set the maximum number of query log entries to fetch.

{% /codeInfo %}

{% codeInfo srNumber=45 %}

queryLogFilePath: Configuration to set the file path for query logs.

{% /codeInfo %}

{% codeInfo srNumber=46 %}

databaseFilterPattern: Regex to only fetch databases that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=47 %}

schemaFilterPattern: Regex to only fetch schemas that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=48 %}

tableFilterPattern: Regex to only fetch tables that match the pattern.

{% /codeInfo %}

{% codeInfo srNumber=49 %}

Sink Configuration

To send the metadata to OpenMetadata, the sink needs to be specified as type: metadata-rest.

{% /codeInfo %}

{% codeInfo srNumber=50 %}

Workflow Configuration

The main property here is the openMetadataServerConfig, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our Docker containers, this looks like:

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

source:
  type: {% $connector %}-lineage
  serviceName: {% $connector %}
  sourceConfig:
    config:
      type: DatabaseLineage
      # Number of days to look back
      queryLogDuration: 1
      parsingTimeoutLimit: 300
      # filterCondition: query_text not ilike '--- metabase query %'
      resultLimit: 1000
      # To read the queries from a file instead of the database query logs, set:
      # queryLogFilePath: /tmp/query_log/file_path
      # databaseFilterPattern:
      #   includes:
      #     - database1
      #     - database2
      #   excludes:
      #     - database3
      #     - database4
      # schemaFilterPattern:
      #   includes:
      #     - schema1
      #     - schema2
      #   excludes:
      #     - schema3
      #     - schema4
      # tableFilterPattern:
      #   includes:
      #     - table1
      #     - table2
      #   excludes:
      #     - table3
      #     - table4
sink:
  type: metadata-rest
  config: {}

{% partial file="/v1.5/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}
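To make the includes and excludes lists in the filter patterns above more concrete, here is a minimal Python sketch of how such a filter could be evaluated. It is an illustration only, not the OpenMetadata implementation: the helper name `is_kept` is hypothetical, and the semantics (a name is kept when it matches at least one include, if any are given, and matches no exclude) as well as the use of `re.match` are assumptions.

```python
import re
from typing import Sequence


def is_kept(name: str, includes: Sequence[str] = (), excludes: Sequence[str] = ()) -> bool:
    """Decide whether `name` survives an include/exclude regex filter.

    Assumed semantics (hypothetical, for illustration):
    - if `includes` is non-empty, the name must match at least one include pattern;
    - the name must not match any exclude pattern.
    """
    # re.match anchors only at the start of the string, so "database1"
    # also matches "database10" unless the pattern ends with "$".
    included = not includes or any(re.match(p, name) for p in includes)
    excluded = any(re.match(p, name) for p in excludes)
    return included and not excluded


databases = ["database1", "database10", "database3", "sandbox"]
kept = [d for d in databases if is_kept(d, ["database1", "database2"], ["database3"])]
print(kept)  # ['database1', 'database10']
```

Under these assumptions an unanchored pattern behaves like a prefix match, so a pattern such as `^database1$` is the safer choice when only one exact name should pass.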

  • You can learn more about how to configure and run the Lineage Workflow to extract Lineage data here.
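As a rough illustration of the queryLogDuration setting, the sketch below computes the time window that a value expressed in days would cover. The function name `query_log_window` is hypothetical, and the boundary handling (counting back from the current UTC time) is an assumption; the connector may align the window differently, for example to day boundaries.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def query_log_window(
    query_log_duration: int, now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Return (start, end) of a look-back window of `query_log_duration` days.

    Assumption: the window ends at `now` (current UTC time by default)
    and starts exactly `query_log_duration` days earlier.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=query_log_duration)
    return start, end


# With queryLogDuration: 1, only queries from the last day fall in the window.
start, end = query_log_window(1, now=datetime(2024, 12, 4, 9, 0, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```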

2. Run with the CLI

After saving the YAML config, we will run the command the same way we did for the metadata ingestion:

metadata ingest -c <path-to-yaml>
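If the workflow needs to be triggered from a script rather than typed by hand, the same command can be wrapped in Python. This is a minimal sketch, assuming the `metadata` CLI is installed and available on the PATH; the helper names `build_ingest_command` and `run_lineage` are hypothetical and only shell out to the command shown above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ingest_command(config_path: str) -> List[str]:
    """Build the argument list for `metadata ingest -c <path-to-yaml>`."""
    return ["metadata", "ingest", "-c", config_path]


def run_lineage(config_path: str) -> None:
    """Run the lineage workflow for the given YAML config.

    Fails early if the config file is missing, and raises
    subprocess.CalledProcessError if the CLI exits with a non-zero status.
    """
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    subprocess.run(build_ingest_command(config_path), check=True)
```

Calling `run_lineage("lineage.yaml")`, where `lineage.yaml` is a placeholder for the file saved in step 1, would then run the workflow and surface any failure as a Python exception.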