GitBook: [main] 3 pages and one asset modified

Ayush Shah 2021-08-20 14:14:14 +00:00 committed by gitbook-bot
parent d818829416
commit 31a6ae4420
4 changed files with 111 additions and 1 deletions

Binary file not shown. Size: 277 KiB

@@ -56,6 +56,7 @@
* [Metadata Ingestion](install/metadata-ingestion/README.md)
* [Ingest Sample Data](install/metadata-ingestion/ingest-sample-data.md)
* [Connectors](install/metadata-ingestion/connectors/README.md)
* [Hive](install/metadata-ingestion/connectors/hive.md)
* [Athena](install/metadata-ingestion/connectors/athena.md)
* [BigQuery](install/metadata-ingestion/connectors/bigquery.md)
* [ElasticSearch](install/metadata-ingestion/connectors/elastic-search.md)


@@ -0,0 +1,105 @@
---
description: This guide will help you install the Hive connector and run it manually
---
# Hive
{% hint style="info" %}
**Prerequisites**
OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.
1. Python 3.7 or above
2. Library: **libsasl2-dev**
{% endhint %}
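The Python requirement above can be checked before installing anything. The snippet below is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of OpenMetadata.

```python
import sys

# Minimum version taken from the prerequisites above.
MINIMUM = (3, 7)


def meets_minimum(version=sys.version_info, minimum=MINIMUM) -> bool:
    """Return True if `version` is at least `minimum` (major, minor)."""
    return tuple(version[:2]) >= minimum


# Warn if the interpreter running this script is too old.
if not meets_minimum():
    found = sys.version.split()[0]
    print(f"Python {MINIMUM[0]}.{MINIMUM[1]}+ is required, found {found}")
```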
### Install from PyPI or Source
{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
# Install the SASL development library required by the Hive connector
sudo apt-get install libsasl2-dev
pip install 'openmetadata-ingestion[hive]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% tab title="Build from source" %}
```bash
# checkout OpenMetadata
git clone https://github.com/open-metadata/OpenMetadata.git
cd OpenMetadata/ingestion
# Install the SASL development library required by the Hive connector
sudo apt-get install libsasl2-dev
python3 -m venv env
source env/bin/activate
pip install '.[hive]'
```
{% endtab %}
{% endtabs %}
### Configuration
{% code title="hive.json" %}
```javascript
{
"source": {
"type": "hive",
"config": {
"service_name": "local_hive",
"service_type": "Hive",
"host_port": "localhost:10000"
}
},
...
```
{% endcode %}
1. **service\_name** - Service name for this Hive cluster. If you added the Hive cluster through the OpenMetadata UI, make sure the service name here matches the one used there.
2. **filter\_pattern** - Contains `includes` and `excludes` options that select which datasets to ingest into OpenMetadata.
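As a rough illustration of how include/exclude filtering of this kind typically behaves, the sketch below applies regular-expression patterns to table names. The function names `is_allowed` and `filter_tables`, the pattern semantics (an exclude match always wins, and an empty include list allows everything), and the sample table names are all assumptions made for illustration. They are not taken from the OpenMetadata code base, so check the connector's actual behavior before relying on them.

```python
import re
from typing import Iterable, List


def is_allowed(name: str,
               includes: Iterable[str] = (),
               excludes: Iterable[str] = ()) -> bool:
    """Return True if `name` passes the patterns (hypothetical semantics)."""
    # Assumption: any matching exclude pattern rejects the name outright.
    if any(re.match(pattern, name) for pattern in excludes):
        return False
    include_list = list(includes)
    # Assumption: with no include patterns, everything not excluded is kept.
    if not include_list:
        return True
    return any(re.match(pattern, name) for pattern in include_list)


def filter_tables(names: Iterable[str],
                  includes: Iterable[str] = (),
                  excludes: Iterable[str] = ()) -> List[str]:
    """Keep only the names that pass the include/exclude patterns."""
    include_list, exclude_list = list(includes), list(excludes)
    return [n for n in names if is_allowed(n, include_list, exclude_list)]


# Made-up table names, used only to exercise the filter.
tables = ["sales.orders", "sales.orders_tmp", "hr.employees"]
selected = filter_tables(tables, includes=[r"sales\..*"], excludes=[r".*_tmp$"])
print(selected)
```

Here only `sales.orders` survives: `sales.orders_tmp` matches the exclude pattern, and `hr.employees` matches no include pattern.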
## Publish to OpenMetadata
Below is the configuration to publish Hive data into the OpenMetadata service.
Optionally add the `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.
{% code title="hive.json" %}
```javascript
{
"source": {
"type": "hive",
"config": {
"service_name": "local_hive",
"service_type": "Hive",
"host_port": "localhost:10000"
}
},
"processor": {
"type": "pii",
"config": {}
},
"sink": {
"type": "metadata-rest-tables",
"config": {}
},
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
},
"cron": {
"minute": "*/5",
"hour": null,
"day": null,
"month": null,
"day_of_week": null
}
}
```
{% endcode %}
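Before running the workflow, it can help to confirm that the configuration parses as JSON and contains the top-level sections shown above. The sketch below is a standalone check, not part of OpenMetadata. The helper name `missing_sections` and the choice to treat `source`, `sink`, and `metadata_server` as required are assumptions based only on the example in this guide.

```python
import json

# Sections that appear in the example above. Treating these three as
# required is an assumption of this sketch, not a documented rule.
REQUIRED_SECTIONS = ("source", "sink", "metadata_server")


def missing_sections(config: dict) -> list:
    """Return the required top-level sections absent from `config`."""
    return [key for key in REQUIRED_SECTIONS if key not in config]


# A trimmed copy of the configuration from this guide, inlined so the
# snippet runs without a file on disk.
example = json.loads("""
{
  "source": {
    "type": "hive",
    "config": {
      "service_name": "local_hive",
      "service_type": "Hive",
      "host_port": "localhost:10000"
    }
  },
  "sink": {"type": "metadata-rest-tables", "config": {}},
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  }
}
""")

print(missing_sections(example))         # complete config
print(missing_sections({"source": {}}))  # config with sections missing
```

To check a file on disk instead, load it with `json.load` from an open file handle and pass the result to `missing_sections`.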


@ -17,15 +17,19 @@ OpenMetadata Github repository can be accessed here [https://github.com/open-met
![./images/fork-github.png](../../.gitbook/assets/fork-github.png)
Create a local clone of your fork
```bash
git clone https://github.com/<username>/OpenMetadata.git
```
Set a new remote repository that points to the OpenMetadata repository, so that you can pull changes from the open source OpenMetadata codebase into your clone
```bash
cd OpenMetadata/
git remote add upstream https://github.com/open-metadata/OpenMetadata.git
git remote -v
```
## Create a branch in your fork
```bash