mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-07-24 01:40:00 +00:00
Description: This guide will help you install the Hive connector and run it manually.
Hive
{% hint style="info" %}
Prerequisites

- Python 3.7 or above
- Library: `libsasl2-dev`

The Hive connector uses `pyhive` to connect and fetch metadata. PyHive depends on the Python `sasl` package, which requires `libsasl2-dev` to be installed. In some cases, you may need to set `LD_LIBRARY_PATH` to point to where `libsasl2-dev` is installed. Check how to install libsasl2 for your Linux distribution.
{% endhint %}
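A quick way to confirm the prerequisites before installing is a small Python check. This is a minimal sketch, not part of openmetadata-ingestion: it compares the running interpreter against 3.7, asks `ctypes.util.find_library` whether a `sasl2` shared library is visible, and looks for `sasl.h` under `/usr/include/sasl/`, which is where libsasl2-dev places the header on Debian/Ubuntu (the path is an assumption for other distributions). Finding the shared library alone does not prove the development headers are installed, which is why both are reported.

```python
import ctypes.util
import os
import sys

# Minimum interpreter version from the prerequisites.
MIN_PYTHON = (3, 7)
# Header installed by libsasl2-dev on Debian/Ubuntu (assumed path elsewhere).
SASL_HEADER = "/usr/include/sasl/sasl.h"


def check_prerequisites() -> dict:
    """Report whether the Hive connector prerequisites appear to be met."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # e.g. "libsasl2.so.2", or None if the loader cannot find libsasl2
        "libsasl2": ctypes.util.find_library("sasl2"),
        "sasl_headers": os.path.exists(SASL_HEADER),
    }


if __name__ == "__main__":
    report = check_prerequisites()
    print(report)
    if not report["python_ok"]:
        print("Python 3.7 or above is required.")
    if report["libsasl2"] is None or not report["sasl_headers"]:
        print("libsasl2-dev may be missing; install it or set LD_LIBRARY_PATH.")
```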
Install from PyPI or Source
{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
# Install the SASL development library (Debian/Ubuntu), then the Hive connector
sudo apt-get install libsasl2-dev
pip install 'openmetadata-ingestion[hive]'
```
{% endtab %}
{% endtabs %}
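After installation, the sketch below checks that the packages can be imported and, optionally, that PyHive can reach HiveServer2. It assumes the import names `pyhive` (PyHive) and `metadata` (openmetadata-ingestion), and a HiveServer2 at `localhost:10000`, the same `host_port` used in the configuration below. `missing_modules`, `parse_host_port` and `try_connect` are hypothetical helpers for illustration, not connector APIs.

```python
import importlib.util
from typing import List, Tuple

# Assumed top-level import names: "pyhive" from PyHive,
# "metadata" from openmetadata-ingestion.
REQUIRED_MODULES = ["pyhive", "metadata"]


def missing_modules(names: List[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def parse_host_port(host_port: str) -> Tuple[str, int]:
    """Split a "host:port" string such as "localhost:10000"."""
    host, sep, port = host_port.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {host_port!r}")
    return host, int(port)


def try_connect(host_port: str = "localhost:10000") -> None:
    """Open a PyHive connection and list databases (needs a running HiveServer2)."""
    # Imported lazily so the helpers above work even if PyHive is missing.
    from pyhive import hive

    host, port = parse_host_port(host_port)
    conn = hive.Connection(host=host, port=port)
    try:
        cursor = conn.cursor()
        cursor.execute("SHOW DATABASES")
        print(cursor.fetchall())
    finally:
        conn.close()
```

Run `missing_modules()` right after `pip install`; an empty list means both packages were found (note that any package exposing a top-level `metadata` module would also satisfy the check). Call `try_connect()` only once HiveServer2 is running, since it opens a real connection.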
Configuration
{% code title="hive.json" %}
```json
{
  "source": {
    "type": "hive",
    "config": {
      "service_name": "local_hive",
      "host_port": "localhost:10000"
    }
  },
...
```
{% endcode %}
- service_name - Name of the service for this Hive cluster. If you added the Hive cluster through the OpenMetadata UI, make sure this matches the service name you used there.
- host_port - Host and port of the HiveServer2 instance to connect to (10000 is the HiveServer2 default port).
- filter_pattern - Contains includes and excludes options that select which datasets, by name pattern, are ingested into OpenMetadata.
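To illustrate how `filter_pattern` might select datasets, here is a minimal sketch that assumes `includes` and `excludes` are lists of regular expressions matched against names such as `database.table`, with excludes taking precedence and a default include of `.*`. These semantics, the example patterns, and the `is_included` helper are assumptions for illustration, not the connector's implementation.

```python
import re
from typing import List, Optional

# Hypothetical source config mirroring hive.json, with a filter_pattern added.
source_config = {
    "service_name": "local_hive",
    "host_port": "localhost:10000",
    "filter_pattern": {
        "includes": ["sales\\..*", "marketing\\..*"],
        "excludes": [".*_tmp$"],
    },
}


def is_included(name: str, includes: Optional[List[str]] = None,
                excludes: Optional[List[str]] = None) -> bool:
    """Assumed semantics: any exclude match drops the name; otherwise
    the name must match at least one include (default: everything)."""
    includes = includes if includes is not None else [".*"]
    excludes = excludes or []
    if any(re.match(pattern, name) for pattern in excludes):
        return False
    return any(re.match(pattern, name) for pattern in includes)


patterns = source_config["filter_pattern"]
for table in ["sales.orders", "sales.orders_tmp", "hr.salaries"]:
    print(table, is_included(table, patterns["includes"], patterns["excludes"]))
```

With these patterns, the loop prints `sales.orders True`, `sales.orders_tmp False` and `hr.salaries False`: the temporary table is dropped by the exclude, and `hr.salaries` matches no include.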
Publish to OpenMetadata
Below is the configuration to publish Hive data into the OpenMetadata service. Optionally, add the `pii` processor, and add the `metadata-rest-tables` sink along with the `metadata-server` config.
{% code title="hive.json" %}
```json
{
  "source": {
    "type": "hive",
    "config": {
      "service_name": "local_hive",
      "host_port": "localhost:10000"
    }
  },
  "sink": {
    "type": "metadata-rest",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
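Before running the workflow, a short script can sanity-check `hive.json`. This sketch assumes the three top-level sections shown above are required and that a `null` cron field means "every" (rendered as `*`), so the schedule above reads as the crontab expression `*/5 * * * *`, every five minutes. `check_config`, `cron_to_crontab` and `check_config_file` are hypothetical helpers, not part of OpenMetadata.

```python
import json
from typing import Any, Dict

# Field order of a standard five-field crontab expression ("day" = day of month).
CRON_FIELDS = ["minute", "hour", "day", "month", "day_of_week"]
# Assumed required top-level sections, taken from the example hive.json.
REQUIRED_SECTIONS = ["source", "sink", "metadata_server"]


def cron_to_crontab(cron: Dict[str, Any]) -> str:
    """Render the cron block as crontab text, treating null/missing as "*"."""
    parts = []
    for field in CRON_FIELDS:
        value = cron.get(field)
        parts.append("*" if value is None else str(value))
    return " ".join(parts)


def check_config(config: Dict[str, Any]) -> str:
    """Raise if a required section is missing; return the schedule as crontab text."""
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"hive.json is missing sections: {missing}")
    return cron_to_crontab(config.get("cron") or {})


def check_config_file(path: str = "hive.json") -> str:
    """Load a JSON workflow config from disk and check it."""
    with open(path) as fh:
        return check_config(json.load(fh))
```

Calling `check_config_file()` in the directory containing the `hive.json` above should return `*/5 * * * *`.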