Kafka Connect

For context on getting started with ingestion, check out our metadata ingestion guide.

Setup

To install this plugin, run pip install 'acryl-datahub[kafka-connect]'.

Capabilities

This plugin extracts the following:

  • Each Kafka Connect connector as an individual DataFlowSnapshotClass entity
  • An individual DataJobSnapshotClass entity for each source dataset, named {connector_name}:{source_dataset}
  • Lineage information from the source database to the Kafka topic
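As a rough illustration of the job naming scheme above, the following Python sketch composes a name from a connector name and a source dataset. The helper `data_job_name` and the sample values are hypothetical and are not part of the plugin's API.

```python
def data_job_name(connector_name: str, source_dataset: str) -> str:
    """Compose a job name in the {connector_name}:{source_dataset} form."""
    return f"{connector_name}:{source_dataset}"


# Hypothetical connector and dataset names, for illustration only.
job_name = data_job_name("mysql_source", "librarydb.book")
print(job_name)
```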

Current limitations:

  • Works only for:
    • JDBC and Debezium source connectors
    • BigQuery sink connector

| Capability | Status | Details |
| --- | --- | --- |
| Platform Instance | ✔️ | link |
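Platform instance support matters when dataset URNs are constructed for lineage. The sketch below shows the URN shape, assuming DataHub's usual dataset URN layout, in which the platform instance is prefixed to the dataset name. The helper `dataset_urn` and the table name `librarydb.book` are hypothetical. The `mysql: test_mysql` mapping mirrors the quickstart recipe.

```python
from typing import Optional


def dataset_urn(
    platform: str, name: str, instance: Optional[str] = None, env: str = "PROD"
) -> str:
    """Build a dataset URN, prefixing the name with the platform instance if given."""
    qualified = f"{instance}.{name}" if instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


# Mapping of platform type to instance id, as in the quickstart recipe.
platform_instance_map = {"mysql": "test_mysql"}
urn = dataset_urn("mysql", "librarydb.book", platform_instance_map.get("mysql"))
print(urn)
```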

Quickstart recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "kafka-connect"
  config:
    # Coordinates
    connect_uri: "http://localhost:8083"
    cluster_name: "connect-cluster"
    provided_configs:     
      - provider: env
        path_key: MYSQL_CONNECTION_URL
        value: jdbc:mysql://test_mysql:3306/librarydb
    # Optional mapping of platform types to instance ids
    platform_instance_map:
      mysql: test_mysql

    # Credentials
    username: admin
    password: password

sink:
  # sink configs
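The `provided_configs` entry in the recipe supplies a literal value for a setting that a connector would otherwise read through a config provider. Below is a minimal sketch of that substitution, assuming the connector config references the value with Kafka's `${provider:path_key}` placeholder syntax. The function `resolve_placeholders` and the sample connector config are hypothetical. This illustrates the idea and is not the plugin's implementation.

```python
from typing import Dict, List


def resolve_placeholders(
    connector_config: Dict[str, str], provided_configs: List[Dict[str, str]]
) -> Dict[str, str]:
    """Replace ${provider:path_key} placeholders with the provided values."""
    # Build a placeholder -> value lookup, e.g. "${env:MYSQL_CONNECTION_URL}".
    replacements = {
        "${" + pc["provider"] + ":" + pc["path_key"] + "}": pc["value"]
        for pc in provided_configs
    }
    resolved = {}
    for key, value in connector_config.items():
        for placeholder, replacement in replacements.items():
            value = value.replace(placeholder, replacement)
        resolved[key] = value
    return resolved


# Provided values taken from the recipe; the connector config is made up.
provided = [
    {
        "provider": "env",
        "path_key": "MYSQL_CONNECTION_URL",
        "value": "jdbc:mysql://test_mysql:3306/librarydb",
    }
]
connector_config = {"connection.url": "${env:MYSQL_CONNECTION_URL}"}
resolved = resolve_placeholders(connector_config, provided)
print(resolved["connection.url"])
```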

Config details

Note that a . is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| connect_uri | | "http://localhost:8083/" | URI to connect to. |
| username | | | Kafka Connect username. |
| password | | | Kafka Connect password. |
| cluster_name | | "connect-cluster" | Cluster to ingest from. |
| provided_configs | | | Provided configurations: a list of provider, path_key, and value entries, as in the recipe above. |
| construct_lineage_workunits | | True | Whether to create the input and output Dataset entities. |
| connector_patterns.deny | | | List of regex patterns for connectors to exclude from ingestion. |
| connector_patterns.allow | | | List of regex patterns for connectors to include in ingestion. |
| connector_patterns.ignoreCase | | True | Whether to ignore case during pattern matching. |
| env | | "PROD" | Environment to use in namespace when constructing URNs. |
| platform_instance_map | | | Platform instance mapping to use when constructing URNs, e.g. platform_instance_map: { "hive": "warehouse" } |
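To make the `connector_patterns` fields concrete, here is a small sketch of how allow/deny filtering typically behaves. A connector is kept if it matches at least one allow pattern and no deny pattern, and case is ignored when `ignoreCase` is true. The function `is_allowed`, the allow-everything default, the use of `re.match`, and the connector names are assumptions for illustration, not the plugin's exact implementation.

```python
import re
from typing import List, Optional


def is_allowed(
    name: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
    ignore_case: bool = True,
) -> bool:
    """Return True if name matches an allow pattern and no deny pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    allow = allow if allow is not None else [".*"]  # assumed default: allow all
    deny = deny or []
    # Deny patterns take precedence over allow patterns.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


# Hypothetical connector names.
connectors = ["mysql_source", "Debezium_Orders", "bigquery_sink_test"]
kept = [
    c
    for c in connectors
    if is_allowed(c, allow=["mysql_.*", "debezium_.*"], deny=[".*_test"])
]
print(kept)
```

With `ignore_case=False`, `Debezium_Orders` would no longer match the lowercase `debezium_.*` pattern and would be dropped as well.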

Compatibility

Coming soon!

Questions

If you've got any questions on configuring this source, feel free to ping us on our Slack!