
---
title: Run the Hive Connector Externally
slug: /connectors/database/hive/yaml
---

{% connectorDetailsHeader name="Hive" stage="PROD" platform="OpenMetadata" availableFeatures=["Metadata", "Data Profiler", "Data Quality", "View Lineage", "View Column-level Lineage", "dbt", "Sample Data"] unavailableFeatures=["Query Usage", "Owners", "Tags", "Stored Procedures"] / %}

In this section, we provide guides and references to use the Hive connector.

Configure and schedule Hive metadata and profiler workflows from the OpenMetadata UI:

{% partial file="/v1.7/connectors/external-ingestion-deployment.md" /%}

## Requirements

### Python Requirements

{% partial file="/v1.7/connectors/python-requirements.md" /%}

To run the Hive ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[hive]"
```

## Metadata Ingestion

All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Hive.

To create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration that is able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following JSON Schema.

### 1. Define the YAML Config

This is a sample config for Hive:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=1 %}

**username**: Specify the User to connect to Hive. It should have enough privileges to read all the metadata.

{% /codeInfo %}

{% codeInfo srNumber=2 %}

**password**: Password to connect to Hive.

{% /codeInfo %}

{% codeInfo srNumber=3 %}

**hostPort**: Enter the fully qualified hostname and port number for your Hive deployment in the Host and Port field.

{% /codeInfo %}

{% codeInfo srNumber=4 %}

**authOptions**: Enter the auth options string for the Hive connection.

{% /codeInfo %}

{% codeInfo srNumber=22 %}

**For MySQL Metastore Connection:**

You can also ingest the metadata using a MySQL metastore. This step is optional; if metastore details are not provided, we will query the Hive server directly.

- **username**: Specify the User to connect to the MySQL Metastore. It should have enough privileges to read all the metadata.
- **password**: Password to connect to MySQL.
- **hostPort**: Enter the fully qualified hostname and port number for your MySQL Metastore deployment in the Host and Port field, in the format `hostname:port`.
- **databaseSchema**: Enter the database schema associated with the metastore.

{% /codeInfo %}

{% codeInfo srNumber=23 %}

**For Postgres Metastore Connection:**

You can also ingest the metadata using a Postgres metastore. This step is optional; if metastore details are not provided, we will query the Hive server directly.

- **username**: Specify the User to connect to the Postgres Metastore. It should have enough privileges to read all the metadata.
- **password**: Password to connect to Postgres.
- **hostPort**: Enter the fully qualified hostname and port number for your Postgres deployment in the Host and Port field, in the format `hostname:port`.
- **database**: Initial Postgres database to connect to. Specify the name of the database associated with the metastore instance.

{% /codeInfo %}

{% partial file="/v1.7/connectors/yaml/database/source-config-def.md" /%}

{% partial file="/v1.7/connectors/yaml/ingestion-sink-def.md" /%}

{% partial file="/v1.7/connectors/yaml/workflow-config-def.md" /%}

#### Advanced Configuration

{% codeInfo srNumber=5 %}

**Connection Options (Optional)**: Enter the details for any additional connection options that can be sent to the database during the connection. These details must be added as Key-Value pairs.

{% /codeInfo %}

{% codeInfo srNumber=6 %}

**Connection Arguments (Optional)**: Enter the details for any additional connection arguments, such as security or protocol configs, that can be sent to the database during the connection. These details must be added as Key-Value pairs.

- If you are using Single Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows: `"authenticator": "sso_login_url"`

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

```yaml
source:
  type: hive
  serviceName: local_hive
  serviceConnection:
    config:
      type: Hive
      username: <username>
      password: <password>
      authOptions: <auth options>
      hostPort: <hive connection host & port>
      # For MySQL Metastore Connection
      # metastoreConnection:
      #   type: Mysql
      #   username: <username>
      #   authType:
      #     password: <password>
      #   hostPort: <hostPort>
      #   databaseSchema: metastore

      # For Postgres Metastore Connection
      # metastoreConnection:
      #   type: Postgres
      #   username: <username>
      #   authType:
      #     password: <password>
      #   hostPort: <hostPort>
      #   database: metastore
      # connectionOptions:
      #   key: value
      # connectionArguments:
      #   key: value
```

{% partial file="/v1.7/connectors/yaml/database/source-config.md" /%}

{% partial file="/v1.7/connectors/yaml/ingestion-sink.md" /%}

{% partial file="/v1.7/connectors/yaml/workflow-config.md" /%}

{% /codeBlock %}

{% /codePreview %}
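Under the hood, the service connection fields (`username`, `password`, `hostPort`) are combined into a SQLAlchemy-style `hive://` connection URI. The sketch below is purely illustrative of that shape — the helper name and example values are assumptions, not part of the connector's API:

```python
# Illustrative sketch (assumption: not the connector's actual code) of how
# the YAML service connection fields map onto a hive:// connection URI.
from urllib.parse import quote_plus


def hive_uri(username: str, password: str, host_port: str, database: str = "") -> str:
    # quote_plus escapes characters like '@' in credentials so they
    # don't break the URI structure.
    auth = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"hive://{auth}@{host_port}/{database}"


print(hive_uri("analyst", "p@ss", "hive.example.com:10000", "default"))
# → hive://analyst:p%40ss@hive.example.com:10000/default
```

Note that the `@` in the example password is percent-encoded as `%40`; unescaped credentials are a common cause of connection failures.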

{% partial file="/v1.7/connectors/yaml/ingestion-cli.md" /%}

{% partial file="/v1.7/connectors/yaml/lineage.md" variables={connector: "hive"} /%}

{% partial file="/v1.7/connectors/yaml/data-profiler.md" variables={connector: "hive"} /%}

{% partial file="/v1.7/connectors/yaml/auto-classification.md" variables={connector: "hive"} /%}

{% partial file="/v1.7/connectors/yaml/data-quality.md" /%}

## Securing Hive Connection with SSL in OpenMetadata

To configure SSL for secure connections between OpenMetadata and a Hive database, add `ssl_cert` as a key and the path to the CA certificate as its value under `connectionArguments`. Ensure that the certificate is accessible by the server. If you use a Docker or Kubernetes deployment, update the CA certificate in the OpenMetadata server.

```yaml
connectionArguments:
  ssl_cert: /path/to/ca/cert
```
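For context, `connectionArguments` sits under the service connection config in the same YAML shown earlier; a sketch with placeholder values:

```yaml
serviceConnection:
  config:
    type: Hive
    username: <username>
    password: <password>
    hostPort: <hive connection host & port>
    connectionArguments:
      ssl_cert: /path/to/ca/cert
```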

## dbt Integration

{% tilesContainer %}

{% tile icon="mediation" title="dbt Integration" description="Learn more about how to ingest dbt models' definitions and their lineage." link="/connectors/ingestion/workflows/dbt" /%}

{% /tilesContainer %}