diff --git a/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (2).png b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (1).png similarity index 100% rename from docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (2).png rename to docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (1).png diff --git a/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (4).png b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (2).png similarity index 100% rename from docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (4).png rename to docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (2).png diff --git a/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (5).png b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (3).png similarity index 100% rename from docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (5).png rename to docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (3).png diff --git a/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3).png b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (4).png similarity index 100% rename from docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3).png rename to docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (4).png diff --git a/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (5).png 
b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (5).png new file mode 100644 index 00000000000..6032ddb871c Binary files /dev/null and b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (5).png differ diff --git a/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (6).png b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (6).png new file mode 100644 index 00000000000..6032ddb871c Binary files /dev/null and b/docs/.gitbook/assets/screenshot-from-2021-07-26-21-08-17 (1) (2) (2) (2) (3) (4) (4) (5) (3) (1) (6).png differ diff --git a/docs/install/metadata-ingestion/airflow.md b/docs/install/metadata-ingestion/airflow.md index 7c631bc01c0..fd5bb7d467d 100644 --- a/docs/install/metadata-ingestion/airflow.md +++ b/docs/install/metadata-ingestion/airflow.md @@ -1,12 +1,10 @@ # Airflow -We highly recommend using Airflow or similar schedulers to run Metadata Connectors. -Below is the sample code example you can refer to integrate with Airflow - +We highly recommend using Airflow or similar schedulers to run Metadata Connectors. 
Below is a sample code example you can refer to when integrating with Airflow. ## Airflow Example for Hive -```py +```python from datetime import timedelta from airflow import DAG @@ -53,7 +51,7 @@ with DAG( we are using a Python method like below -```py +```python def metadata_ingestion_workflow(): config = load_config_file("examples/workflows/hive.json") workflow = Workflow.create(config) diff --git a/docs/install/metadata-ingestion/connectors/athena.md b/docs/install/metadata-ingestion/connectors/athena.md index 0cc278532af..10456e19b98 100644 --- a/docs/install/metadata-ingestion/connectors/athena.md +++ b/docs/install/metadata-ingestion/connectors/athena.md @@ -5,17 +5,20 @@ description: This guide will help install Athena connector and run manually # Athena {% hint style="info" %} +**Prerequisites** + OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1. Python 3.7 or above {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[athena]' +python -m spacy download en_core_web_sm ``` {% endtab %} diff --git a/docs/install/metadata-ingestion/connectors/bigquery.md b/docs/install/metadata-ingestion/connectors/bigquery.md index e830146a8ea..792dc1e23c2 100644 --- a/docs/install/metadata-ingestion/connectors/bigquery.md +++ b/docs/install/metadata-ingestion/connectors/bigquery.md @@ -4,8 -4,6 @@ description: This guide will help install BigQuery connector and run manually # BigQuery -## BigQuery - {% hint style="info" %} **Prerequisites** @@ -20,6 +18,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.
{% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[bigquery]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -38,9 +37,99 @@ pip install '.[bigquery]' ## Run Manually ```bash -export GOOGLE_APPLICATION_CREDENTIALS="$PWD/pipelines/creds/bigquery-cred.json" +export GOOGLE_APPLICATION_CREDENTIALS="$PWD/examples/creds/bigquery-cred.json" metadata ingest -c ./pipelines/bigquery.json ``` -## Configuration +### Configuration + +{% code title="bigquery-creds.json \(boilerplate\)" %} +```javascript +{ + "type": "service_account", + "project_id": "project_id", + "private_key_id": "private_key_id", + "private_key": "", + "client_email": "gcpuser@project_id.iam.gserviceaccount.com", + "client_id": "", + "auth_uri": "https://accounts.google.com/o/oauth2/auth", + "token_uri": "https://oauth2.googleapis.com/token", + "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs", + "client_x509_cert_url": "" +} + +``` +{% endcode %} + +{% code title="bigquery.json" %} +```javascript +{ + "source": { + "type": "bigquery", + "config": { + "project_id": "project-id", + "username": "username", + "host_port": "https://bigquery.googleapis.com", + "service_name": "gcp_bigquery", + "service_type": "BigQuery" + } + }, +``` +{% endcode %} + +1. **username** - pass the BigQuery username. +2. **password** - password for the BigQuery username. +3. **service\_name** - Service Name for this BigQuery cluster. If you added the BigQuery cluster through the OpenMetadata UI, make sure the service name matches it. +4. **filter\_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata. +5. **database** - Database name from which data is to be fetched.
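To see how the credentials file and the pipeline config fit together, here is a small illustrative Python sketch (the config values simply mirror the `bigquery.json` example above; writing the file from Python is optional and shown only to make the expected shape concrete):

```python
import json
import os

# Point the Google SDK at the service-account file, mirroring the
# `export GOOGLE_APPLICATION_CREDENTIALS=...` step above.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "examples/creds/bigquery-cred.json"

# Same shape as the bigquery.json source block documented above.
config = {
    "source": {
        "type": "bigquery",
        "config": {
            "project_id": "project-id",
            "username": "username",
            "host_port": "https://bigquery.googleapis.com",
            "service_name": "gcp_bigquery",
            "service_type": "BigQuery",
        },
    }
}

# This produces the file that `metadata ingest -c ./pipelines/bigquery.json` reads.
with open("bigquery.json", "w") as f:
    json.dump(config, f, indent=2)
```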
+ +### Publish to OpenMetadata + +Below is the configuration to publish BigQuery data into OpenMetadata. + +Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config. + +{% code title="bigquery.json" %} +```javascript +{ + "source": { + "type": "bigquery", + "config": { + "project_id": "project-id", + "username": "username", + "host_port": "https://bigquery.googleapis.com", + "service_name": "gcp_bigquery", + "service_type": "BigQuery" + } + }, + "processor": { + "type": "pii", + "config": { + "api_endpoint": "http://localhost:8585/api" + } + }, + "sink": { + "type": "metadata-rest-tables", + "config": { + "api_endpoint": "http://localhost:8585/api" + } + }, + "metadata_server": { + "type": "metadata-server", + "config": { + "api_endpoint": "http://localhost:8585/api", + "auth_provider_type": "no-auth" + } + }, + "cron": { + "minute": "*/5", + "hour": null, + "day": null, + "month": null, + "day_of_week": null + } +} + +``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/connectors/elastic-search.md b/docs/install/metadata-ingestion/connectors/elastic-search.md index 5400580cef4..45071285797 100644 --- a/docs/install/metadata-ingestion/connectors/elastic-search.md +++ b/docs/install/metadata-ingestion/connectors/elastic-search.md @@ -4,8 -4,6 @@ description: This guide will help install ElasticSearch connector and run manual # ElasticSearch -## ElasticSearch - {% hint style="info" %} **Prerequisites** @@ -41,5 +39,62 @@ pip install '.[elasticsearch]' metadata ingest -c ./pipelines/metadata_to_es.json ``` -## Configuration +### Configuration + +{% code title="metadata\_to\_es.json" %} +```javascript +{ + "source": { + "type": "metadata_es", + "config": {} + }, +...
+``` {% endcode %} + +### Publish to OpenMetadata + +Below is the configuration to publish ElasticSearch data into OpenMetadata. + +Add the optional `file` stage and the `elasticsearch` bulk\_sink, along with the `metadata-server` config. + +{% code title="metadata\_to\_es.json" %} +```javascript +{ + "source": { + "type": "metadata_es", + "config": {} + }, + "stage": { + "type": "file", + "config": { + "filename": "/tmp/tables.txt" + } + }, + "bulk_sink": { + "type": "elasticsearch", + "config": { + "filename": "/tmp/tables.txt", + "es_host_port": "localhost", + "index_name": "table_search_index" + } + }, + "metadata_server": { + "type": "metadata-server", + "config": { + "api_endpoint": "http://localhost:8585/api", + "auth_provider_type": "no-auth" + } + }, + "cron": { + "minute": "*/5", + "hour": null, + "day": null, + "month": null, + "day_of_week": null + } +} + +``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/connectors/mssql.md b/docs/install/metadata-ingestion/connectors/mssql.md index 9f338857223..b085eaa4a84 100644 --- a/docs/install/metadata-ingestion/connectors/mssql.md +++ b/docs/install/metadata-ingestion/connectors/mssql.md @@ -4,8 -4,6 @@ description: This guide will help install MsSQL connector and run manually # MSSQL -## MSSQL - {% hint style="info" %} **Prerequisites** @@ -20,6 +18,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.
{% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[mssql]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -41,7 +40,7 @@ pip install '.[mssql]' metadata ingest -c ./pipelines/mssql.json ``` -## Configuration +### Configuration {% code title="mssql.json" %} ```javascript @@ -55,8 +54,8 @@ metadata ingest -c ./pipelines/mssql.json "database":"catalog_test", "username": "sa", "password": "test!Password", - "include_pattern": { - "allow": ["catalog_test.*"] + "filter_pattern": { + "includes": ["catalog_test.*"] } } }, @@ -68,14 +67,14 @@ metadata ingest -c ./pipelines/mssql.json 2. **password** - password for the MSSQL username. 3. **service\_name** - Service Name for this MSSQL cluster. If you added the MSSQL cluster through the OpenMetadata UI, make sure the service name matches it. 4. **host\_port** - Hostname and port number where the service is running. -5. **table\_pattern** - It contains allow, deny options to choose which pattern of datasets you want to ingest into OpenMetadata. -6. **database** - \_\*\*\_Database name from where data is to be fetched from. +5. **filter\_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata. +6. **database** - Database name from which data is to be fetched.
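As a hedged illustration of how a `filter_pattern` like the one above is commonly applied — assuming the `includes`/`excludes` entries are regular expressions matched against `schema.table` names, which you should verify against the connector source — the logic can be sketched as:

```python
import re

# Hypothetical filter: a table is ingested when it matches some "includes"
# pattern (if any are given) and matches no "excludes" pattern.
def should_ingest(name, filter_pattern):
    includes = filter_pattern.get("includes")
    excludes = filter_pattern.get("excludes", [])
    if includes and not any(re.match(p, name) for p in includes):
        return False
    return not any(re.match(p, name) for p in excludes)

# Using the mssql.json pattern above: only catalog_test tables pass.
mssql_filter = {"includes": ["catalog_test.*"]}
print(should_ingest("catalog_test.orders", mssql_filter))   # True
print(should_ingest("master.spt_values", mssql_filter))     # False
```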
## Publish to OpenMetadata Below is the configuration to publish MSSQL data into OpenMetadata. -Add Optional `pii-tags` processor and `metadata-rest-tables` sink along with `metadata-server` config +Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config {% code title="mssql.json" %} ```javascript diff --git a/docs/install/metadata-ingestion/connectors/mysql.md b/docs/install/metadata-ingestion/connectors/mysql.md index e78aedba7e0..e14b724002b 100644 --- a/docs/install/metadata-ingestion/connectors/mysql.md +++ b/docs/install/metadata-ingestion/connectors/mysql.md @@ -4,28 +4,21 @@ description: This guide will help install MySQL connector and run manually # MySQL -## MySQL - {% hint style="info" %} **Prerequisites** OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1. Python 3.7 or above -2. Create and activate python env - - ```bash - python3 -m venv env - source env/bin/activate - ``` {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[mysql]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -58,7 +51,7 @@ metadata ingest -c ./pipelines/mysql.json "username": "openmetadata_user", "password": "openmetadata_password", "service_name": "local_mysql", - "include_pattern": { + "filter_pattern": { "deny": ["mysql.*", "information_schema.*"] } } @@ -70,13 +63,13 @@ metadata ingest -c ./pipelines/mysql.json 1. **username** - pass the MySQL username. We recommend creating a user with read-only permissions to all the databases in your MySQL installation. 2. **password** - password for the username. 3. **service\_name** - Service Name for this MySQL cluster. If you added the MySQL cluster through the OpenMetadata UI, make sure the service name matches it. -4. **table\_pattern** - It contains allow, deny options to choose which pattern of datasets you want to ingest into OpenMetadata +4.
**filter\_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata. ## Publish to OpenMetadata Below is the configuration to publish MySQL data into OpenMetadata. -Add optional `pii-tags` processor and `metadata-rest-tables` sink along with `metadata-server` config +Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config {% code title="mysql.json" %} ```javascript @@ -88,7 +81,7 @@ Add optional `pii-tags` processor and `metadata-rest-tables` sink along with `me "password": "openmetadata_password", "service_name": "local_mysql", "service_type": "MySQL", - "include_pattern": { + "filter_pattern": { "excludes": ["mysql.*", "information_schema.*"] } } diff --git a/docs/install/metadata-ingestion/connectors/oracle.md b/docs/install/metadata-ingestion/connectors/oracle.md index b2c1279843b..14da9099e4c 100644 --- a/docs/install/metadata-ingestion/connectors/oracle.md +++ b/docs/install/metadata-ingestion/connectors/oracle.md @@ -5,17 +5,20 @@ description: This guide will help install Oracle connector and run manually # Oracle {% hint style="info" %} +**Prerequisites** + OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1.
Python 3.7 or above {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[oracle]' +python -m spacy download en_core_web_sm ``` {% endtab %} diff --git a/docs/install/metadata-ingestion/connectors/postgres.md b/docs/install/metadata-ingestion/connectors/postgres.md index 91cd4207359..1f40202c38d 100644 --- a/docs/install/metadata-ingestion/connectors/postgres.md +++ b/docs/install/metadata-ingestion/connectors/postgres.md @@ -4,8 +4,6 @@ description: This guide will help install Postgres connector and run manually # Postgres -## Postgres - {% hint style="info" %} **Prerequisites** @@ -20,6 +18,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. {% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[postgres]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -55,8 +54,8 @@ metadata ingest -c ./pipelines/postgres.json "database": "pagila", "service_name": "local_postgres", "service_type": "POSTGRES", - "include_pattern": { - "deny": ["pg_openmetadata.*[a-zA-Z0-9]*","information_schema.*[a-zA-Z0-9]*"] } + "filter_pattern": { + "excludes": ["pg_openmetadata.*[a-zA-Z0-9]*","information_schema.*[a-zA-Z0-9]*"] } } }, ... @@ -66,21 +65,16 @@ metadata ingest -c ./pipelines/postgres.json 1. **username** - pass the Postgres username. 2. **password** - password for the Postgres username. 3. **service\_name** - Service Name for this Postgres cluster. If you added the Postgres cluster through OpenMetadata UI, make sure the service name matches the same. -4. **table\_pattern** - It contains allow, deny options to choose which pattern of datasets you want to ingest into OpenMetadata. +4. **filter\_pattern** - It contains includes, excludes options to choose which pattern of datasets you want to ingest into OpenMetadata. 5. **database -** Database name from where data is to be fetched. 
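For reference, the connection parameters above typically combine into a standard SQLAlchemy-style URL. The scheme, driver, and `host_port` value below are assumptions for illustration only, not taken from these docs:

```python
# Hypothetical: how username/password/host_port/database might combine into a
# Postgres connection URL (the "postgresql+psycopg2" scheme and the
# "localhost:5432" host_port are assumed, not documented here).
def postgres_url(username, password, host_port, database):
    return f"postgresql+psycopg2://{username}:{password}@{host_port}/{database}"

url = postgres_url("openmetadata_user", "openmetadata_password",
                   "localhost:5432", "pagila")
print(url)
```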
### Publish to OpenMetadata -Below is the configuration to publish postgres data into openmetadata +Below is the configuration to publish Postgres data into OpenMetadata. -Add Optional `pii-tags` processor and `metadata-rest-tables` sink along with `metadata-server` config +Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config {% code title="postgres.json" %} -```text - -``` -{% endcode %} - ```javascript { "source": { @@ -118,4 +112,5 @@ Add Optional `pii-tags` processor and `metadata-rest-tables` sink along with `me } } ``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/connectors/redshift-usage.md b/docs/install/metadata-ingestion/connectors/redshift-usage.md index 350c12e20bc..341367bacf6 100644 --- a/docs/install/metadata-ingestion/connectors/redshift-usage.md +++ b/docs/install/metadata-ingestion/connectors/redshift-usage.md @@ -4,8 -4,6 @@ description: This guide will help install Redshift Usage connector and run manua # Redshift Usage -## Redshift Usage - {% hint style="info" %} **Prerequisites** @@ -14,7 +12,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1. Python 3.7 or above {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} @@ -41,5 +39,89 @@ pip install '.[redshift-usage]' metadata ingest -c ./pipelines/redshift_usage.json ``` -## Configuration +### Configuration + +{% code title="redshift\_usage.json" %} +```javascript +{ + "source": { + "type": "redshift-usage", + "config": { + "host_port": "cluster.user.region.redshift.amazonaws.com:5439", + "username": "username", + "password": "password", + "database": "warehouse", + "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'", + "service_name": "aws_redshift", + "service_type": "Redshift", + "duration": 2 + } + }, + ... +``` +{% endcode %} + +1. **username** - pass the Redshift username.
We recommend creating a user with read-only permissions to all the databases in your Redshift installation. +2. **password** - password for the username. +3. **service\_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches it. +4. **filter\_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata. + +## Publish to OpenMetadata + +Below is the configuration to publish Redshift Usage data into OpenMetadata. + +Add the optional `query-parser` processor, `table-usage` stage, and `metadata-usage` bulk\_sink, along with the `metadata-server` config + +{% code title="redshift\_usage.json" %} +```javascript +{ + "source": { + "type": "redshift-usage", + "config": { + "host_port": "cluster.user.region.redshift.amazonaws.com:5439", + "username": "username", + "password": "password", + "database": "warehouse", + "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'", + "service_name": "aws_redshift", + "service_type": "Redshift", + "duration": 2 + } + }, + "processor": { + "type": "query-parser", + "config": { + "filter": "" + } + }, + "stage": { + "type": "table-usage", + "config": { + "filename": "/tmp/redshift_usage" + } + }, + "bulk_sink": { + "type": "metadata-usage", + "config": { + "filename": "/tmp/redshift_usage" + } + }, + "metadata_server": { + "type": "metadata-server", + "config": { + "api_endpoint": "http://localhost:8585/api", + "auth_provider_type": "no-auth" + } + }, + "cron": { + "minute": "*/5", + "hour": null, + "day": null, + "month": null, + "day_of_week": null + } +} + +``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/connectors/redshift.md b/docs/install/metadata-ingestion/connectors/redshift.md index ce733ff0dc5..ce901bb7243 100644 --- a/docs/install/metadata-ingestion/connectors/redshift.md +++ b/docs/install/metadata-ingestion/connectors/redshift.md @@ -4,8
+4,6 @@ description: This guide will help install Redshift connector and run manually # Redshift -## Redshift - {% hint style="info" %} **Prerequisites** @@ -14,12 +12,13 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1. Python 3.7 or above {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[redshift]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -41,5 +40,75 @@ pip install '.[redshift]' metadata ingest -c ./pipelines/redshift.json ``` -## Configuration +### Configuration + +{% code title="redshift.json" %} +```javascript +{ + "source": { + "type": "redshift", + "config": { + "host_port": "cluster.user.region.redshift.amazonaws.com:5439", + "username": "username", + "password": "password", + "database": "warehouse", + "service_name": "aws_redshift", + "service_type": "Redshift" + } + }, + ... +``` +{% endcode %} + +1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation +2. **password** - password for the username +3. **service\_name** - Service Name for this Redshift cluster. If you added Redshift cluster through OpenMetadata UI, make sure the service name matches the same. +4. 
**filter\_pattern** - It contains `includes` and `excludes` options to choose which patterns of datasets you want to ingest into OpenMetadata. + +## Publish to OpenMetadata + +Below is the configuration to publish Redshift data into OpenMetadata. + +Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config + +{% code title="redshift.json" %} +```javascript +{ + "source": { + "type": "redshift", + "config": { + "host_port": "cluster.user.region.redshift.amazonaws.com:5439", + "username": "username", + "password": "password", + "database": "warehouse", + "service_name": "aws_redshift", + "service_type": "Redshift" + } + }, + "processor": { + "type": "pii", + "config": {} + }, + "sink": { + "type": "metadata-rest-tables", + "config": {} + }, + "metadata_server": { + "type": "metadata-server", + "config": { + "api_endpoint": "http://localhost:8585/api", + "auth_provider_type": "no-auth" + } + }, + "cron": { + "minute": "*/5", + "hour": null, + "day": null, + "month": null, + "day_of_week": null + } +} + +``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/connectors/snowflake-usage.md b/docs/install/metadata-ingestion/connectors/snowflake-usage.md index 19237b99e4c..36639d95750 100644 --- a/docs/install/metadata-ingestion/connectors/snowflake-usage.md +++ b/docs/install/metadata-ingestion/connectors/snowflake-usage.md @@ -4,8 +4,6 @@ description: This guide will help install Snowflake Usage connector and run manu # Snowflake Usage -## Snowflake Usage - {% hint style="info" %} **Prerequisites** @@ -14,7 +12,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1.
Python 3.7 or above {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} @@ -41,5 +39,89 @@ pip install '.[snowflake-usage]' metadata ingest -c ./pipelines/snowflake_usage.json ``` -## Configuration +### Configuration + +{% code title="snowflake\_usage.json" %} +```javascript +{ + "source": { + "type": "snowflake-usage", + "config": { + "host_port": "account.region.service.snowflakecomputing.com", + "username": "username", + "password": "strong_password", + "database": "SNOWFLAKE_SAMPLE_DATA", + "account": "account_name", + "service_name": "snowflake", + "service_type": "Snowflake", + "duration": 2 + } + }, +``` +{% endcode %} + +1. **username** - pass the Snowflake username. +2. **password** - password for the Snowflake username. +3. **service\_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through OpenMetadata UI, make sure the service name matches the same. +4. **filter\_pattern** - It contains includes, excludes options to choose which pattern of datasets you want to ingest into OpenMetadata. +5. **database -** Database name from where data is to be fetched. 
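One plausible reading of the `duration: 2` setting in the usage config above — an assumption based on the field name, so confirm against the connector source — is the number of days of query history to analyze, ending at the current time:

```python
from datetime import datetime, timedelta

# Hypothetical reading of "duration": the usage connector scans query logs
# from the last N days up to "now".
def usage_window(duration_days, end=None):
    end = end or datetime.utcnow()
    start = end - timedelta(days=duration_days)
    return start, end

# Pin "now" for a reproducible example.
start, end = usage_window(2, end=datetime(2021, 8, 1))
print(start.isoformat(), "->", end.isoformat())
```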
+ +### Publish to OpenMetadata + +Below is the configuration to publish Snowflake Usage data into OpenMetadata. + +Add the optional `query-parser` processor, `table-usage` stage, and `metadata-usage` bulk\_sink, along with the `metadata-server` config + +{% code title="snowflake\_usage.json" %} +```javascript +{ + "source": { + "type": "snowflake-usage", + "config": { + "host_port": "account.region.service.snowflakecomputing.com", + "username": "username", + "password": "strong_password", + "database": "SNOWFLAKE_SAMPLE_DATA", + "account": "account_name", + "service_name": "snowflake", + "service_type": "Snowflake", + "duration": 2 + } + }, + "processor": { + "type": "query-parser", + "config": { + "filter": "" + } + }, + "stage": { + "type": "table-usage", + "config": { + "filename": "/tmp/snowflake_usage" + } + }, + "bulk_sink": { + "type": "metadata-usage", + "config": { + "filename": "/tmp/snowflake_usage" + } + }, + "metadata_server": { + "type": "metadata-server", + "config": { + "api_endpoint": "http://localhost:8585/api", + "auth_provider_type": "no-auth" + } + }, + "cron": { + "minute": "*/5", + "hour": null, + "day": null, + "month": null, + "day_of_week": null + } +} + +``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/connectors/snowflake.md b/docs/install/metadata-ingestion/connectors/snowflake.md index 8ac07ec50c9..18a0dc10b23 100644 --- a/docs/install/metadata-ingestion/connectors/snowflake.md +++ b/docs/install/metadata-ingestion/connectors/snowflake.md @@ -4,8 +4,6 @@ description: This guide will help install Snowflake connector and run manually # Snowflake -## Snowflake - {% hint style="info" %} **Prerequisites** @@ -14,12 +12,13 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1.
Python 3.7 or above {% endhint %} -## Install from PyPI or Source +### Install from PyPI or Source {% tabs %} {% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[snowflake]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -41,5 +40,91 @@ pip install '.[snowflake]' metadata ingest -c ./pipelines/snowflake.json ``` -## Configuration +### Configuration + +{% code title="snowflake.json" %} +```javascript +{ + "source": { + "type": "snowflake", + "config": { + "host_port": "account.region.service.snowflakecomputing.com", + "username": "username", + "password": "strong_password", + "database": "SNOWFLAKE_SAMPLE_DATA", + "account": "account_name", + "service_name": "snowflake", + "service_type": "Snowflake", + "filter_pattern": { + "includes": [ + "(\\w)*tpcds_sf100tcl", + "(\\w)*tpcds_sf100tcl", + "(\\w)*tpcds_sf10tcl" + ] + } + } + }, +``` +{% endcode %} + +1. **username** - pass the Snowflake username. +2. **password** - password for the Snowflake username. +3. **service\_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through OpenMetadata UI, make sure the service name matches the same. +4. **filter\_pattern** - It contains includes, excludes options to choose which pattern of datasets you want to ingest into OpenMetadata. +5. **database -** Database name from where data is to be fetched. 
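The `cron` blocks used throughout these pipeline configs follow standard cron field syntax. A minimal sketch of how a `"*/5"` minute field with the other fields `null` can be interpreted (every five minutes, at any hour or day; a real scheduler would use a full cron parser):

```python
# Minimal cron-minute-field reader, covering only the forms that appear in
# these configs: "*", "*/N", and a literal minute. Illustrative only.
def matching_minutes(expr):
    if expr == "*":
        return list(range(60))
    if expr.startswith("*/"):
        step = int(expr[2:])
        return [m for m in range(60) if m % step == 0]
    return [int(expr)]

# "*/5" fires at minutes 0, 5, 10, ..., 55 of every hour.
print(matching_minutes("*/5"))
```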
+ +### Publish to OpenMetadata + +Below is the configuration to publish Snowflake data into OpenMetadata. + +Add the optional `pii` processor and the `metadata-rest-tables` sink, along with the `metadata-server` config + +{% code title="snowflake.json" %} +```javascript +{ + "source": { + "type": "snowflake", + "config": { + "host_port": "account.region.service.snowflakecomputing.com", + "username": "username", + "password": "strong_password", + "database": "SNOWFLAKE_SAMPLE_DATA", + "account": "account_name", + "service_name": "snowflake", + "service_type": "Snowflake", + "filter_pattern": { + "includes": [ + "(\\w)*tpcds_sf100tcl", + "(\\w)*tpcds_sf100tcl", + "(\\w)*tpcds_sf10tcl" + ] + } + } + }, + "processor": { + "type": "pii", + "config": {} + }, + "sink": { + "type": "metadata-rest-tables", + "config": {} + }, + "metadata_server": { + "type": "metadata-server", + "config": { + "api_endpoint": "http://localhost:8585/api", + "auth_provider_type": "no-auth" + } + }, + "cron": { + "minute": "*/5", + "hour": null, + "day": null, + "month": null, + "day_of_week": null + } +} + +``` +{% endcode %} diff --git a/docs/install/metadata-ingestion/ingest-sample-data.md b/docs/install/metadata-ingestion/ingest-sample-data.md index e215afe95c6..933c2f83f40 100644 --- a/docs/install/metadata-ingestion/ingest-sample-data.md +++ b/docs/install/metadata-ingestion/ingest-sample-data.md @@ -12,11 +12,6 @@ description: This guide will help you to ingest sample data OpenMetadata is built using Java, DropWizard, Jetty, and MySQL. 1. Python 3.7 or above -2. Create and activate python env - - ```bash - - ``` {% endhint %} ### Install from PyPI or Source @@ -25,6 +20,7 @@ OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.
{% tab title="Install Using PyPI" %} ```bash pip install 'openmetadata-ingestion[sample-tables, elasticsearch]' +python -m spacy download en_core_web_sm ``` {% endtab %} @@ -40,10 +36,11 @@ pip install '.[sample-tables, elasticsearch]' {% endtab %} {% endtabs %} -### Ingest sample tables and users +### Ingest sample tables, usage, and users ```bash metadata ingest -c ./pipelines/sample_tables.json +metadata ingest -c ./pipelines/sample_usage.json metadata ingest -c ./pipelines/sample_users.json ```
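If you prefer to drive the three sample pipelines from a script instead of running them one by one, a small wrapper around the `metadata` CLI could look like this (assumes `metadata` is on your PATH; the actual run call is left commented out):

```python
import subprocess

# The three sample pipelines listed above.
PIPELINES = [
    "./pipelines/sample_tables.json",
    "./pipelines/sample_usage.json",
    "./pipelines/sample_users.json",
]

def ingest_command(pipeline):
    # Same invocation as the `metadata ingest -c <config>` shell commands.
    return ["metadata", "ingest", "-c", pipeline]

for pipeline in PIPELINES:
    cmd = ingest_command(pipeline)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually ingest
```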