
---
title: Run the Clickhouse Connector Externally
slug: /connectors/database/clickhouse/yaml
---
{% connectorDetailsHeader name="Clickhouse" stage="PROD" platform="OpenMetadata" availableFeatures=["Metadata", "Query Usage", "Lineage", "Column-level Lineage", "Data Profiler", "Data Quality", "dbt", "Sample Data", "Reverse Metadata (Collate Only)"] unavailableFeatures=["Owners", "Tags", "Stored Procedures"] / %}
In this section, we provide guides and references to use the Clickhouse connector.
Configure and schedule Clickhouse metadata and profiler workflows from the OpenMetadata UI:
- Requirements
- Metadata Ingestion
- Query Usage
- Lineage
- Data Profiler
- Data Quality
- dbt Integration
{% collateContent %}
- Reverse Metadata
{% /collateContent %}
{% partial file="/v1.8/connectors/external-ingestion-deployment.md" /%}
## Requirements

The ClickHouse user must be granted the `SELECT` privilege on `system.*` and on the target schemas/tables to fetch the metadata of tables and views.
- Create a new user
- More details: https://clickhouse.com/docs/en/sql-reference/statements/create/user

```sql
CREATE USER <username> IDENTIFIED WITH sha256_password BY <password>
```
- Grant Permissions
- More details on permissions can be found at https://clickhouse.com/docs/en/sql-reference/statements/grant

```sql
-- Grant SELECT and SHOW to that user
GRANT SELECT, SHOW ON system.* TO <username>;
GRANT SELECT ON <schema_name>.* TO <username>;
```
### Profiler & Data Quality

Executing the profiler workflow or data quality tests will require the user to have `SELECT` permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found here and data quality tests here.
### Usage & Lineage

For the usage and lineage workflow, the user will need the `SELECT` privilege. You can find more information on the usage workflow here and the lineage workflow here.
### Python Requirements
{% partial file="/v1.8/connectors/python-requirements.md" /%}
To run the Clickhouse ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[clickhouse]"
```
If you want to run the Usage Connector, you'll also need to install:

```bash
pip3 install "openmetadata-ingestion[clickhouse-usage]"
```
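A quick way to confirm that the driver pulled in by these extras is importable is a small stdlib check. This is an illustrative sketch, not part of the package; `missing_modules` is a hypothetical helper, and `clickhouse_driver` is an assumed module name that may differ in your installed version:

```python
import importlib.util

def missing_modules(modules):
    """Return the names from `modules` that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# "clickhouse_driver" is an assumed driver module name; check whatever
# driver your installed version of the extra actually provides.
print(missing_modules(["clickhouse_driver"]))
```

An empty list means the driver is available in the current environment.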
## Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Clickhouse.
In order to create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following JSON Schema.
### 1. Define the YAML Config
This is a sample config for Clickhouse:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
username: Specify the User to connect to Clickhouse. It should have enough privileges to read all the metadata.
{% /codeInfo %}
{% codeInfo srNumber=2 %}
password: Password to connect to Clickhouse.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
hostPort: Enter the fully qualified hostname and port number for your Clickhouse deployment in the Host and Port field.
{% /codeInfo %}
{% codeInfo srNumber=4 %}
databaseSchema: databaseSchema of the data source. This is an optional parameter; use it if you would like to restrict the metadata reading to a single databaseSchema. When left blank, OpenMetadata Ingestion attempts to scan all databaseSchemas.
{% /codeInfo %}
{% codeInfo srNumber=5 %}
duration: The duration of a SQL connection in ClickHouse depends on the connection configuration and the workload being processed. Connections are kept open for as long as needed to complete a query, but they can also be closed based on the duration set here.
{% /codeInfo %}
{% codeInfo srNumber=6 %}
scheme: There are 2 types of schemes that the user can choose from.
- clickhouse+http: Uses ClickHouse's HTTP interface for communication. Widely supported, but slower than native.
- clickhouse+native: Uses the native ClickHouse TCP protocol for communication. Faster than http, but may require additional server-side configuration. Recommended for performance-critical applications.
{% /codeInfo %}
{% codeInfo srNumber=35 %}
https: Enable this flag when the Clickhouse instance is hosted via the HTTPS protocol. This flag is useful when you are using the `clickhouse+http` connection scheme.
{% /codeInfo %}
{% codeInfo srNumber=36 %}
secure: Establish a secure connection with ClickHouse. ClickHouse supports secure communication over SSL/TLS to protect data in transit; enabling this option establishes a secure connection. This flag is useful when you are using the `clickhouse+native` connection scheme.
{% /codeInfo %}
{% codeInfo srNumber=37 %}
keyfile: The key file path is the location where ClickHouse looks for the file containing the private key needed for secure communication over SSL/TLS. By default, ClickHouse looks for the key file in the `/etc/clickhouse-server` directory with the file name `server.key`; however, this can be customized in the ClickHouse configuration file (`config.xml`). This flag is useful when you are using the `clickhouse+native` connection scheme and the secure connection flag is enabled.
{% /codeInfo %}
{% partial file="/v1.8/connectors/yaml/database/source-config-def.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config-def.md" /%}
#### Advanced Configuration
{% codeInfo srNumber=7 %}
Connection Options (Optional): Enter the details for any additional connection options that can be sent to the database during the connection. These details must be added as Key-Value pairs.
{% /codeInfo %}
{% codeInfo srNumber=8 %}
Connection Arguments (Optional): Enter the details for any additional connection arguments such as security or protocol configs that can be sent to the database during the connection. These details must be added as Key-Value pairs.

- In case you are using Single-Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator": "sso_login_url"`
{% /codeInfo %}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml
source:
  type: clickhouse
  serviceName: local_clickhouse
  serviceConnection:
    config:
      type: Clickhouse
      username: <username>
      password: <password>
      hostPort: <hostPort>
      # databaseSchema: schema
      # duration: 3600
      # scheme: clickhouse+http (default), or clickhouse+native
      # https: false
      # secure: true
      # keyfile: /etc/clickhouse-server/server.key
      # connectionOptions:
      #   key: value
      # connectionArguments:
      #   key: value
```
{% partial file="/v1.8/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
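To make the scheme and host fields above concrete, here is a rough sketch of the SQLAlchemy-style URL they combine into. This is a simplified illustration, not the connector's actual code; `build_clickhouse_url` is a hypothetical helper:

```python
def build_clickhouse_url(username, password, host_port,
                         scheme="clickhouse+http", database_schema=None):
    # Simplified sketch: the real connector also URL-escapes credentials
    # and applies the https/secure/keyfile flags and extra options.
    url = f"{scheme}://{username}:{password}@{host_port}"
    if database_schema:
        url += f"/{database_schema}"
    return url

print(build_clickhouse_url("om_user", "secret", "localhost:8123",
                           database_schema="default"))
# clickhouse+http://om_user:secret@localhost:8123/default
```

Switching `scheme` to `clickhouse+native` (typically with port 9000) selects the TCP protocol described above.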
{% partial file="/v1.8/connectors/yaml/ingestion-cli.md" /%}
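Before triggering the workflow, it can save a failed run to sanity-check the parsed YAML for the minimal fields shown in the sample above. A stdlib-only sketch: `validate_clickhouse_config` is a hypothetical helper, and the set of required fields here is an assumption for illustration, not the authoritative JSON Schema:

```python
def validate_clickhouse_config(config):
    """Return a list of missing dotted paths in a parsed workflow config."""
    missing = []
    source = config.get("source", {})
    if source.get("type") != "clickhouse":
        missing.append("source.type == clickhouse")
    conn = source.get("serviceConnection", {}).get("config", {})
    # Assumed minimal fields; consult the JSON Schema for the real rules.
    for key in ("type", "username", "password", "hostPort"):
        if key not in conn:
            missing.append(f"source.serviceConnection.config.{key}")
    return missing

cfg = {
    "source": {
        "type": "clickhouse",
        "serviceName": "local_clickhouse",
        "serviceConnection": {
            "config": {"type": "Clickhouse", "username": "u",
                       "password": "p", "hostPort": "localhost:8123"},
        },
    },
}
print(validate_clickhouse_config(cfg))  # -> []
```

An empty list means the assumed minimal fields are present; anything else names the paths to fill in before running the CLI.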
{% partial file="/v1.8/connectors/yaml/query-usage.md" variables={connector: "clickhouse"} /%}
{% partial file="/v1.8/connectors/yaml/lineage.md" variables={connector: "clickhouse"} /%}
{% partial file="/v1.8/connectors/yaml/data-profiler.md" variables={connector: "clickhouse"} /%}
{% partial file="/v1.8/connectors/yaml/auto-classification.md" variables={connector: "clickhouse"} /%}
{% partial file="/v1.8/connectors/yaml/data-quality.md" /%}
## dbt Integration
You can learn more about how to ingest dbt models' definitions and their lineage here.