---
title: Run the Clickhouse Connector Externally
slug: /connectors/database/clickhouse/yaml
---

{% connectorDetailsHeader name="Clickhouse" stage="PROD" platform="OpenMetadata" availableFeatures=["Metadata", "Query Usage", "Lineage", "Column-level Lineage", "Data Profiler", "Data Quality", "dbt", "Sample Data", "Reverse Metadata (Collate Only)"] unavailableFeatures=["Owners", "Tags", "Stored Procedures"] /%}

In this section, we provide guides and references to use the Clickhouse connector.

Configure and schedule Clickhouse metadata and profiler workflows from the OpenMetadata UI:

{% partial file="/v1.8/connectors/external-ingestion-deployment.md" /%}

## Requirements

The ClickHouse user must be granted the SELECT privilege on `system.*` and on the target schemas/tables in order to fetch the metadata of tables and views.

```sql
-- Create the user for ingestion
CREATE USER <username> IDENTIFIED WITH sha256_password BY '<password>';
-- Grant SELECT and SHOW to that user
-- More details on permissions: https://clickhouse.com/docs/en/sql-reference/statements/grant
GRANT SELECT, SHOW ON system.* TO <username>;
GRANT SELECT ON <schema_name>.* TO <username>;
```
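
To confirm the privileges took effect, you can inspect the grants for the new user (an optional sanity check, not required by the connector):

```sql
-- List the privileges granted to the ingestion user
SHOW GRANTS FOR <username>;
```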

### Profiler & Data Quality

Executing the profiler workflow or data quality tests requires the user to have SELECT permission on the tables and schemas where the profiler or tests will be executed. More information on the profiler workflow setup can be found here and on data quality tests here.
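
For example, to let the same ingestion user profile every table in one schema (the schema name below is hypothetical):

```sql
-- Grant read access on the schema to be profiled (hypothetical schema name)
GRANT SELECT ON profiling_schema.* TO <username>;
```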

### Usage & Lineage

For the usage and lineage workflow, the user will need SELECT privilege. You can find more information on the usage workflow here and the lineage workflow here.
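
Usage extraction reads ClickHouse's query history, which (an assumption about how history is sourced) comes from `system.query_log`; the `GRANT SELECT, SHOW ON system.*` statement above already covers it:

```sql
-- Explicit form of the grant needed for query history
-- (already implied by GRANT SELECT, SHOW ON system.*)
GRANT SELECT ON system.query_log TO <username>;
```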

### Python Requirements

{% partial file="/v1.8/connectors/python-requirements.md" /%}

To run the Clickhouse ingestion, you will need to install:

pip3 install "openmetadata-ingestion[clickhouse]"

If you want to run the Usage Connector, you'll also need to install:

pip3 install "openmetadata-ingestion[clickhouse-usage]"

## Metadata Ingestion

All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Clickhouse.

In order to create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following JSON Schema.

### 1. Define the YAML Config

This is a sample config for Clickhouse:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=1 %}

**username**: Specify the user to connect to Clickhouse. It should have enough privileges to read all the metadata.

{% /codeInfo %}

{% codeInfo srNumber=2 %}

**password**: Password to connect to Clickhouse.

{% /codeInfo %}

{% codeInfo srNumber=3 %}

**hostPort**: Enter the fully qualified hostname and port number of your Clickhouse deployment.

{% /codeInfo %}

{% codeInfo srNumber=4 %}

**databaseSchema**: databaseSchema of the data source. This is an optional parameter; set it if you would like to restrict metadata reading to a single databaseSchema. When left blank, OpenMetadata ingestion attempts to scan all databaseSchemas.

{% /codeInfo %}

{% codeInfo srNumber=5 %}

**duration**: The duration of a SQL connection in ClickHouse depends on the configuration of the connection and the workload being processed. Connections are kept open for as long as needed to complete a query, but they can also be closed based on the configured duration.

{% /codeInfo %}

{% codeInfo srNumber=6 %}

**scheme**: There are two schemes to choose from:

- `clickhouse+http`: Uses ClickHouse's HTTP interface for communication. Widely supported, but slower than native.
- `clickhouse+native`: Uses the native ClickHouse TCP protocol for communication. Faster than HTTP, but may require additional server-side configuration. Recommended for performance-critical applications.

{% /codeInfo %}

{% codeInfo srNumber=35 %}

**https**: Enable this flag when the Clickhouse instance is hosted via the HTTPS protocol. This flag is relevant when you are using the `clickhouse+http` connection scheme.

{% /codeInfo %}

{% codeInfo srNumber=36 %}

**secure**: Establish a secure connection with ClickHouse. ClickHouse supports secure communication over SSL/TLS to protect data in transit; enabling this option establishes a secure connection. This flag is relevant when you are using the `clickhouse+native` connection scheme.

{% /codeInfo %}

{% codeInfo srNumber=37 %}

**keyfile**: The key file path is the location where ClickHouse looks for a file containing the private key needed for secure communication over SSL/TLS. By default, ClickHouse looks for the key file in the `/etc/clickhouse-server` directory with the file name `server.key`, but this can be customized in the ClickHouse configuration file (`config.xml`). This flag is relevant when you are using the `clickhouse+native` connection scheme with the `secure` flag enabled.

{% /codeInfo %}

{% partial file="/v1.8/connectors/yaml/database/source-config-def.md" /%}

{% partial file="/v1.8/connectors/yaml/ingestion-sink-def.md" /%}

{% partial file="/v1.8/connectors/yaml/workflow-config-def.md" /%}

#### Advanced Configuration

{% codeInfo srNumber=7 %}

**Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to the database during the connection. These details must be added as Key-Value pairs.

{% /codeInfo %}

{% codeInfo srNumber=8 %}

**Connection Arguments (Optional)**: Enter the details for any additional connection arguments, such as security or protocol configs, that can be sent to the database during the connection. These details must be added as Key-Value pairs.

- In case you are using Single-Sign-On (SSO) for authentication, add the authenticator details in the Connection Arguments as a Key-Value pair as follows: `"authenticator": "sso_login_url"`

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

```yaml
source:
  type: clickhouse
  serviceName: local_clickhouse
  serviceConnection:
    config:
      type: Clickhouse
      username: <username>
      password: <password>
      hostPort: <hostPort>
      # databaseSchema: schema
      # duration: 3600
      # scheme: clickhouse+http (default), or clickhouse+native
      # https: false
      # secure: true
      # keyfile: /etc/clickhouse-server/server.key
      # connectionOptions:
      #   key: value
      # connectionArguments:
      #   key: value
```

{% partial file="/v1.8/connectors/yaml/database/source-config.md" /%}

{% partial file="/v1.8/connectors/yaml/ingestion-sink.md" /%}

{% partial file="/v1.8/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}
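
As a variation on the sample above, here is a minimal sketch of a connection over the native protocol with TLS enabled. The hostname and port are assumptions for illustration (9440 is ClickHouse's default secure native port, `tcp_port_secure`):

```yaml
source:
  type: clickhouse
  serviceName: secure_clickhouse   # hypothetical service name
  serviceConnection:
    config:
      type: Clickhouse
      scheme: clickhouse+native
      username: <username>
      password: <password>
      hostPort: clickhouse.example.com:9440   # assumed TLS-enabled native port
      secure: true
      keyfile: /etc/clickhouse-server/server.key
```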

{% partial file="/v1.8/connectors/yaml/ingestion-cli.md" /%}

{% partial file="/v1.8/connectors/yaml/query-usage.md" variables={connector: "clickhouse"} /%}

{% partial file="/v1.8/connectors/yaml/lineage.md" variables={connector: "clickhouse"} /%}

{% partial file="/v1.8/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%}

{% partial file="/v1.8/connectors/yaml/auto-classification.md" variables={connector: "clickhouse"} /%}

{% partial file="/v1.8/connectors/yaml/data-quality.md" /%}

## dbt Integration

You can learn more about how to ingest dbt models' definitions and their lineage here.