GitBook: [main] 67 pages and 10 assets modified
# Airflow

We highly recommend using Airflow or a similar scheduler to run the Metadata Connectors. Below is a sample code example you can refer to when integrating with Airflow.

## Airflow Example for Hive

```python
from datetime import timedelta

from airflow import DAG
...
```

Within the DAG, we use a Python method like the one below:

```python
def metadata_ingestion_workflow():
    config = load_config_file("examples/workflows/hive.json")
    workflow = Workflow.create(config)
    ...
    workflow.stop()
```

Create a Workflow instance and pass a Hive configuration, which will read metadata from Hive and ingest it into the OpenMetadata server. You can customize this configuration or add different connectors; please refer to our [examples](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/examples/workflows) and refer to [Metadata Connectors](
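The DAG boilerplate between the two snippets above is elided on this page. As a rough end-to-end sketch of how the pieces fit together (the import paths, the DAG arguments, and the `execute`/`raise_from_status`/`print_status` calls are assumptions based on typical openmetadata-ingestion usage, not taken verbatim from this page):

```python
from datetime import timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

# Assumed import paths for the helpers used above; adjust to your
# openmetadata-ingestion version.
from metadata.config.common import load_config_file
from metadata.ingestion.api.workflow import Workflow


def metadata_ingestion_workflow():
    # Load the Hive connector configuration and run it end to end.
    config = load_config_file("examples/workflows/hive.json")
    workflow = Workflow.create(config)
    workflow.execute()
    workflow.raise_from_status()  # fail the Airflow task if the workflow reported errors
    workflow.print_status()
    workflow.stop()


with DAG(
    "hive_metadata_ingestion",  # illustrative DAG id
    default_args={"owner": "openmetadata", "retries": 3, "retry_delay": timedelta(minutes=2)},
    description="An example DAG which runs the Hive connector",
    start_date=days_ago(1),
    schedule_interval=timedelta(days=1),
    catchup=False,
) as dag:
    PythonOperator(
        task_id="ingest_hive_metadata",
        python_callable=metadata_ingestion_workflow,
    )
```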
# Athena

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[athena]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}
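The run instructions for this page are not included in this change; by analogy with the other connector pages, a manual run would look roughly like the following (the `athena.json` pipeline file is an assumed name, not shown above):

```bash
metadata ingest -c ./pipelines/athena.json
```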
# BigQuery

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[bigquery]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/examples/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
```

### Configuration

{% code title="bigquery-creds.json (boilerplate)" %}
```javascript
{
  "type": "service_account",
  "project_id": "project_id",
  "private_key_id": "private_key_id",
  "private_key": "",
  "client_email": "gcpuser@project_id.iam.gserviceaccount.com",
  "client_id": "",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": ""
}
```
{% endcode %}

{% code title="bigquery.json" %}
```javascript
{
  "source": {
    "type": "bigquery",
    "config": {
      "project_id": "project-id",
      "username": "username",
      "host_port": "https://bigquery.googleapis.com",
      "service_name": "gcp_bigquery",
      "service_type": "BigQuery"
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the BigQuery username.
2. **password** - password for the BigQuery username.
3. **service_name** - Service Name for this BigQuery cluster. If you added the BigQuery cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from which data is to be fetched.
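For instance, a hypothetical `filter_pattern` fragment that ingests one dataset while skipping its temporary tables might look like the following (the dataset names are illustrative, not from this page):

```javascript
"filter_pattern": {
  "includes": ["my_dataset.*"],
  "excludes": ["my_dataset.tmp_.*"]
}
```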
### Publish to OpenMetadata

Below is the configuration to publish BigQuery data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.

{% code title="bigquery.json" %}
```javascript
{
  "source": {
    "type": "bigquery",
    "config": {
      "project_id": "project-id",
      "username": "username",
      "host_port": "https://bigquery.googleapis.com",
      "service_name": "gcp_bigquery",
      "service_type": "BigQuery"
    }
  },
  "processor": {
    "type": "pii",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}

The `cron` block schedules the pipeline; with `minute` set to `*/5` and the other fields left null, it runs every five minutes.
# ElasticSearch

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[elasticsearch]'
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/metadata_to_es.json
```

### Configuration

{% code title="metadata_to_es.json" %}
```javascript
{
  "source": {
    "type": "metadata_es",
    "config": {}
  },
  ...
```
{% endcode %}

### Publish to OpenMetadata

Below is the configuration to publish OpenMetadata table data into Elasticsearch.

Add the optional `file` stage and the `elasticsearch` bulk_sink, along with the `metadata-server` config.

{% code title="metadata_to_es.json" %}
```javascript
{
  "source": {
    "type": "metadata_es",
    "config": {}
  },
  "stage": {
    "type": "file",
    "config": {
      "filename": "/tmp/tables.txt"
    }
  },
  "bulk_sink": {
    "type": "elasticsearch",
    "config": {
      "filename": "/tmp/tables.txt",
      "es_host_port": "localhost",
      "index_name": "table_search_index"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
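Once the pipeline has run, you can sanity-check the index directly against Elasticsearch. A minimal check, assuming Elasticsearch listens on the default port 9200 on the `es_host_port` host configured above:

```bash
curl 'http://localhost:9200/table_search_index/_search?pretty'
```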
# MSSQL

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[mssql]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/mssql.json
```

### Configuration

{% code title="mssql.json" %}
```javascript
{
  "source": {
    ...
      "database": "catalog_test",
      "username": "sa",
      "password": "test!Password",
      "filter_pattern": {
        "includes": ["catalog_test.*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the MSSQL username.
2. **password** - password for the MSSQL username.
3. **service_name** - Service Name for this MSSQL cluster. If you added the MSSQL cluster through the OpenMetadata UI, make sure the service name matches.
4. **host_port** - Hostname and port number where the service is being initialized.
5. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.
6. **database** - Database name from which data is to be fetched.

## Publish to OpenMetadata

Below is the configuration to publish MSSQL data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.

{% code title="mssql.json" %}
```javascript
...
```
{% endcode %}
# MySQL

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[mysql]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/mysql.json
```

### Configuration

{% code title="mysql.json" %}
```javascript
{
  "source": {
    ...
      "username": "openmetadata_user",
      "password": "openmetadata_password",
      "service_name": "local_mysql",
      "filter_pattern": {
        "excludes": ["mysql.*", "information_schema.*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the MySQL username. We recommend creating a user with read-only permissions to all the databases in your MySQL installation.
2. **password** - password for the username.
3. **service_name** - Service Name for this MySQL cluster. If you added the MySQL cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish MySQL data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.

{% code title="mysql.json" %}
```javascript
{
  "source": {
    ...
      "password": "openmetadata_password",
      "service_name": "local_mysql",
      "service_type": "MySQL",
      "filter_pattern": {
        "excludes": ["mysql.*", "information_schema.*"]
      }
    }
  },
  ...
```
{% endcode %}
# Oracle

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[oracle]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}
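As with the Athena page, the run instructions are not included in this change; a manual run would presumably follow the same pattern as the other connectors (the `oracle.json` pipeline file is an assumed name):

```bash
metadata ingest -c ./pipelines/oracle.json
```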
# Postgres

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[postgres]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/postgres.json
```

### Configuration

{% code title="postgres.json" %}
```javascript
{
  "source": {
    ...
      "database": "pagila",
      "service_name": "local_postgres",
      "service_type": "POSTGRES",
      "filter_pattern": {
        "excludes": ["pg_openmetadata.*[a-zA-Z0-9]*", "information_schema.*[a-zA-Z0-9]*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Postgres username.
2. **password** - password for the Postgres username.
3. **service_name** - Service Name for this Postgres cluster. If you added the Postgres cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from which data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Postgres data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.

{% code title="postgres.json" %}
```javascript
{
  "source": {
    ...
  }
}
```
{% endcode %}
# Redshift Usage

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[redshift-usage]'
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/redshift_usage.json
```

### Configuration

{% code title="redshift_usage.json" %}
```javascript
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "service_type": "Redshift",
      "duration": 2
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation.
2. **password** - password for the username.
3. **service_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish Redshift usage data into OpenMetadata.

Add the optional `query-parser` processor, the `table-usage` stage, and the `metadata-usage` bulk_sink, along with the `metadata-server` config.

{% code title="redshift_usage.json" %}
```javascript
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "service_type": "Redshift",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
# Redshift

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[redshift]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/redshift.json
```

### Configuration

{% code title="redshift.json" %}
```javascript
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "service_name": "aws_redshift",
      "service_type": "Redshift"
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation.
2. **password** - password for the username.
3. **service_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish Redshift data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.

{% code title="redshift.json" %}
```javascript
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "service_name": "aws_redshift",
      "service_type": "Redshift"
    }
  },
  "processor": {
    "type": "pii",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
# Snowflake Usage

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[snowflake-usage]'
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/snowflake_usage.json
```

### Configuration

{% code title="snowflake_usage.json" %}
```javascript
{
  "source": {
    "type": "snowflake-usage",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "duration": 2
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Snowflake username.
2. **password** - password for the Snowflake username.
3. **service_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from which data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Snowflake usage data into OpenMetadata.

Add the optional `query-parser` processor, the `table-usage` stage, and the `metadata-usage` bulk_sink, along with the `metadata-server` config.

{% code title="snowflake_usage.json" %}
```javascript
{
  "source": {
    "type": "snowflake-usage",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/snowflake_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/snowflake_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
# Snowflake

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[snowflake]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/snowflake.json
```

### Configuration

{% code title="snowflake.json" %}
```javascript
{
  "source": {
    "type": "snowflake",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "filter_pattern": {
        "includes": [
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf10tcl"
        ]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Snowflake username.
2. **password** - password for the Snowflake username.
3. **service_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from which data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Snowflake data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config.

{% code title="snowflake.json" %}
```javascript
{
  "source": {
    "type": "snowflake",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "filter_pattern": {
        "includes": [
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf10tcl"
        ]
      }
    }
  },
  "processor": {
    "type": "pii",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
# Sample Data

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[sample-tables, elasticsearch]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

### Ingest sample tables, usage and users

```bash
metadata ingest -c ./pipelines/sample_tables.json
metadata ingest -c ./pipelines/sample_usage.json
metadata ingest -c ./pipelines/sample_users.json
```