GitBook: [main] 67 pages and 10 assets modified
# Airflow

We highly recommend using Airflow or a similar scheduler to run Metadata Connectors. Below is a sample code example you can refer to when integrating with Airflow.

## Airflow Example for Hive

```python
from datetime import timedelta

from airflow import DAG
...
```

We use a Python method like the one below:

```python
def metadata_ingestion_workflow():
    config = load_config_file("examples/workflows/hive.json")
    workflow = Workflow.create(config)
    ...
    workflow.stop()
```

Create a Workflow instance and pass a Hive configuration, which will read metadata from Hive and ingest it into the OpenMetadata server. You can customize this configuration or add different connectors; please refer to our [examples](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/examples/workflows) and to [Metadata Connectors](
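The `load_config_file` and `Workflow` names above come from the `openmetadata-ingestion` package. As a minimal, stdlib-only sketch of the same pattern (the real `Workflow` class does the actual ingestion; here we only show loading the JSON config and reading the source type), assuming the config follows the `{"source": {"type": ..., "config": ...}}` shape used throughout these pages:

```python
import json
import tempfile


def load_config_file(path):
    # Parse the JSON workflow configuration from disk, as the framework does.
    with open(path) as config_file:
        return json.load(config_file)


def metadata_ingestion_workflow(config_path):
    # Sketch of the create/run/stop lifecycle shown above; the real Workflow
    # class lives in the openmetadata-ingestion package.
    config = load_config_file(config_path)
    return config["source"]["type"]


# A throwaway config standing in for examples/workflows/hive.json:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"source": {"type": "hive", "config": {}}}, f)

print(metadata_ingestion_workflow(f.name))  # hive
```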
# Athena

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[athena]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}
# BigQuery

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[bigquery]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/examples/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
```

### Configuration

{% code title="bigquery-creds.json \(boilerplate\)" %}
```javascript
{
  "type": "service_account",
  "project_id": "project_id",
  "private_key_id": "private_key_id",
  "private_key": "",
  "client_email": "gcpuser@project_id.iam.gserviceaccount.com",
  "client_id": "",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": ""
}
```
{% endcode %}

{% code title="bigquery.json" %}
```javascript
{
  "source": {
    "type": "bigquery",
    "config": {
      "project_id": "project-id",
      "username": "username",
      "host_port": "https://bigquery.googleapis.com",
      "service_name": "gcp_bigquery",
      "service_type": "BigQuery"
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the BigQuery username.
2. **password** - password for the BigQuery username.
3. **service\_name** - Service Name for this BigQuery cluster. If you added the BigQuery cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish BigQuery data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="bigquery.json" %}
```javascript
{
  "source": {
    "type": "bigquery",
    "config": {
      "project_id": "project-id",
      "username": "username",
      "host_port": "https://bigquery.googleapis.com",
      "service_name": "gcp_bigquery",
      "service_type": "BigQuery"
    }
  },
  "processor": {
    "type": "pii",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
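The `cron` block uses crontab-style fields, with `null` meaning "every". As a small sketch of how those five fields map onto a standard crontab expression (the assumption that `null` maps to `*` is ours, for illustration):

```python
def cron_to_expression(cron):
    # Render the JSON cron block as a five-field crontab string;
    # null/missing fields mean "every", i.e. "*".
    fields = ["minute", "hour", "day", "month", "day_of_week"]
    return " ".join(cron.get(field) or "*" for field in fields)


cron = {"minute": "*/5", "hour": None, "day": None, "month": None, "day_of_week": None}
print(cron_to_expression(cron))  # */5 * * * *
```

So the configuration above runs the ingestion every five minutes.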
# ElasticSearch

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

## Run Manually

```bash
metadata ingest -c ./pipelines/metadata_to_es.json
```

### Configuration

{% code title="metadata\_to\_es.json" %}
```javascript
{
  "source": {
    "type": "metadata_es",
    "config": {}
  },
  ...
```
{% endcode %}

### Publish to OpenMetadata

Below is the configuration to publish ElasticSearch data into OpenMetadata.

Add the optional `file` stage and the `elasticsearch` bulk\_sink along with the `metadata-server` config:

{% code title="metadata\_to\_es.json" %}
```javascript
{
  "source": {
    "type": "metadata_es",
    "config": {}
  },
  "stage": {
    "type": "file",
    "config": {
      "filename": "/tmp/tables.txt"
    }
  },
  "bulk_sink": {
    "type": "elasticsearch",
    "config": {
      "filename": "/tmp/tables.txt",
      "es_host_port": "localhost",
      "index_name": "table_search_index"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
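Note that the `file` stage and the `elasticsearch` bulk\_sink share the same `filename`: the stage writes records out, then the bulk sink reads them back for indexing. A stdlib-only sketch of that handoff (the record shape is illustrative; the real bulk sink pushes the records to Elasticsearch rather than just reading them):

```python
import json
import tempfile


def stage_records(records, filename):
    # The "file" stage side: write one JSON record per line.
    with open(filename, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def bulk_read(filename):
    # The bulk-sink side: read the staged records back for bulk indexing.
    with open(filename) as staged:
        return [json.loads(line) for line in staged]


tmp = tempfile.NamedTemporaryFile(delete=False)
stage_records([{"table": "sales"}, {"table": "users"}], tmp.name)
print(bulk_read(tmp.name))  # [{'table': 'sales'}, {'table': 'users'}]
```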
# MSSQL

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[mssql]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/mssql.json
```

### Configuration

{% code title="mssql.json" %}
```javascript
{
  ...
      "database": "catalog_test",
      "username": "sa",
      "password": "test!Password",
      "filter_pattern": {
        "includes": ["catalog_test.*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the MSSQL username.
2. **password** - password for the MSSQL username.
3. **service\_name** - Service Name for this MSSQL cluster. If you added the MSSQL cluster through the OpenMetadata UI, make sure the service name matches.
4. **host\_port** - Hostname and port number where the service is being initialized.
5. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
6. **database** - Database name from where data is to be fetched.

## Publish to OpenMetadata

Below is the configuration to publish MSSQL data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="mssql.json" %}
```javascript
...
```
{% endcode %}
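The `filter_pattern` entries such as `"catalog_test.*"` are regular expressions. Assuming they are matched against qualified `database.table` names from the start of the string (an assumption for illustration), a quick sketch of how `includes` selects tables:

```python
import re


def is_included(table_name, includes):
    # Keep the table when any "includes" regex matches its qualified name.
    return any(re.match(pattern, table_name) for pattern in includes)


includes = ["catalog_test.*"]
print(is_included("catalog_test.orders", includes))  # True
print(is_included("other_db.orders", includes))      # False
```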
# MySQL

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[mysql]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/mysql.json
```

### Configuration

{% code title="mysql.json" %}
```javascript
{
  ...
      "username": "openmetadata_user",
      "password": "openmetadata_password",
      "service_name": "local_mysql",
      "filter_pattern": {
        "excludes": ["mysql.*", "information_schema.*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the MySQL username. We recommend creating a user with read-only permissions to all the databases in your MySQL installation.
2. **password** - password for the username.
3. **service\_name** - Service Name for this MySQL cluster. If you added the MySQL cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish MySQL data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="mysql.json" %}
```javascript
{
  ...
      "password": "openmetadata_password",
      "service_name": "local_mysql",
      "service_type": "MySQL",
      "filter_pattern": {
        "excludes": ["mysql.*", "information_schema.*"]
      }
    }
  },
  ...
```
{% endcode %}
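Under the hood, connectors like this one typically assemble a SQLAlchemy-style connection URL from the config fields. The exact URL shape and the `host_port` value below are assumptions for illustration (the MySQL snippet above does not show `host_port`):

```python
def mysql_url(config):
    # Assemble a mysql:// connection URL from the config fields;
    # the "mysql+pymysql" driver prefix is an illustrative choice.
    return "mysql+pymysql://{username}:{password}@{host_port}".format(**config)


config = {
    "username": "openmetadata_user",
    "password": "openmetadata_password",
    "host_port": "localhost:3306",  # hypothetical value
}
print(mysql_url(config))
# mysql+pymysql://openmetadata_user:openmetadata_password@localhost:3306
```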
# Oracle

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[oracle]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}
# Postgres

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[postgres]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/postgres.json
```

### Configuration

{% code title="postgres.json" %}
```javascript
{
  ...
      "database": "pagila",
      "service_name": "local_postgres",
      "service_type": "POSTGRES",
      "filter_pattern": {
        "excludes": ["pg_openmetadata.*[a-zA-Z0-9]*", "information_schema.*[a-zA-Z0-9]*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Postgres username.
2. **password** - password for the Postgres username.
3. **service\_name** - Service Name for this Postgres cluster. If you added the Postgres cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Postgres data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="postgres.json" %}
```javascript
{
  "source": {
    ...
  }
}
```
{% endcode %}
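Before running `metadata ingest`, it can save a round trip to sanity-check that the config file carries the keys the source expects. A stdlib sketch (the required-key list here is illustrative, not the connector's actual validation):

```python
import json


def missing_source_keys(raw_json, required=("username", "password", "service_name", "database")):
    # Return the keys absent from the source.config section, in order.
    config = json.loads(raw_json)["source"]["config"]
    return [key for key in required if key not in config]


raw = '{"source": {"type": "postgres", "config": {"username": "u", "database": "pagila"}}}'
print(missing_source_keys(raw))  # ['password', 'service_name']
```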
# Redshift Usage

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[redshift-usage]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/redshift_usage.json
```

### Configuration

{% code title="redshift\_usage.json" %}
```javascript
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "service_type": "Redshift",
      "duration": 2
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation.
2. **password** - password for the username.
3. **service\_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish Redshift usage data into OpenMetadata.

Add the optional `query-parser` processor, the `table-usage` stage, and the `metadata-usage` bulk\_sink along with the `metadata-server` config:

{% code title="redshift\_usage.json" %}
```javascript
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "service_type": "Redshift",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
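The `query-parser` processor turns harvested query text into table references for the usage pipeline. As a deliberately simplified sketch of that idea (the real parser handles full SQL; this regex only catches plain `FROM`/`JOIN` clauses):

```python
import re


def extract_tables(sql):
    # Naive table extraction: qualified names following FROM or JOIN.
    return re.findall(r"(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)


sql = "SELECT * FROM warehouse.sales s JOIN warehouse.customers c ON s.cid = c.id"
print(extract_tables(sql))  # ['warehouse.sales', 'warehouse.customers']
```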
# Redshift

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[redshift]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/redshift.json
```

### Configuration

{% code title="redshift.json" %}
```javascript
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "service_name": "aws_redshift",
      "service_type": "Redshift"
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation.
2. **password** - password for the username.
3. **service\_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish Redshift data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="redshift.json" %}
```javascript
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "service_name": "aws_redshift",
      "service_type": "Redshift"
    }
  },
  "processor": {
    "type": "pii",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
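The `host_port` field packs the cluster endpoint and port into one string. A small sketch of splitting it, assuming the connector falls back to Redshift's default port 5439 when none is given:

```python
def split_host_port(host_port, default_port=5439):
    # Split "host:port"; fall back to the default port when absent.
    host, sep, port = host_port.rpartition(":")
    if not sep:
        return host_port, default_port
    return host, int(port)


print(split_host_port("cluster.user.region.redshift.amazonaws.com:5439"))
# ('cluster.user.region.redshift.amazonaws.com', 5439)
```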
# Snowflake Usage

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[snowflake-usage]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/snowflake_usage.json
```

### Configuration

{% code title="snowflake\_usage.json" %}
```javascript
{
  "source": {
    "type": "snowflake-usage",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "duration": 2
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Snowflake username.
2. **password** - password for the Snowflake username.
3. **service\_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Snowflake usage data into OpenMetadata.

Add the optional `query-parser` processor, the `table-usage` stage, and the `metadata-usage` bulk\_sink along with the `metadata-server` config:

{% code title="snowflake\_usage.json" %}
```javascript
{
  "source": {
    "type": "snowflake-usage",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/snowflake_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/snowflake_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
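In the usage pipeline, the `table-usage` stage condenses the parsed queries into per-table counts before the bulk sink publishes them. A sketch of that aggregation step (the per-query table lists are illustrative):

```python
from collections import Counter


def aggregate_usage(parsed_queries):
    # Count how often each table appears across the parsed query log.
    counts = Counter()
    for tables in parsed_queries:
        counts.update(tables)
    return dict(counts)


parsed = [["sales"], ["sales", "customers"], ["customers"]]
print(aggregate_usage(parsed))  # {'sales': 2, 'customers': 2}
```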
# Snowflake

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[snowflake]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/snowflake.json
```

### Configuration

{% code title="snowflake.json" %}
```javascript
{
  "source": {
    "type": "snowflake",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "filter_pattern": {
        "includes": [
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf10tcl"
        ]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Snowflake username.
2. **password** - password for the Snowflake username.
3. **service\_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Snowflake data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="snowflake.json" %}
```javascript
{
  "source": {
    "type": "snowflake",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "filter_pattern": {
        "includes": [
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf10tcl"
        ]
      }
    }
  },
  "processor": {
    "type": "pii",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
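The `includes` patterns are regexes with JSON-escaped backslashes, so `"(\\w)*tpcds_sf10tcl"` reaches the matcher as `(\w)*tpcds_sf10tcl`. A quick check of what such patterns match (matching from the start of the name, as an illustrative assumption):

```python
import re

includes = [r"(\w)*tpcds_sf100tcl", r"(\w)*tpcds_sf10tcl"]


def matches_any(name, patterns):
    # A schema or table is ingested when any includes pattern matches it.
    return any(re.match(pattern, name) for pattern in patterns)


print(matches_any("tpcds_sf10tcl", includes))  # True
print(matches_any("tpch_sf1", includes))       # False
```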
OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[sample-tables, elasticsearch]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

### Ingest sample tables, usage and users

```bash
metadata ingest -c ./pipelines/sample_tables.json
metadata ingest -c ./pipelines/sample_usage.json
metadata ingest -c ./pipelines/sample_users.json
```