GitBook: [main] 67 pages and 10 assets modified
# Airflow

We highly recommend using Airflow or a similar scheduler to run Metadata Connectors. Below is a sample code example you can refer to when integrating with Airflow.

## Airflow Example for Hive

```python
from datetime import timedelta

from airflow import DAG
...
```

We use a Python method like the one below:

```python
def metadata_ingestion_workflow():
    config = load_config_file("examples/workflows/hive.json")
    workflow = Workflow.create(config)
    ...
    workflow.stop()
```

Create a Workflow instance and pass a Hive configuration, which will read metadata from Hive and ingest it into the OpenMetadata server. You can customize this configuration or add different connectors; please refer to our [examples](https://github.com/open-metadata/OpenMetadata/tree/main/ingestion/examples/workflows) and to [Metadata Connectors](
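The `load_config_file` and `Workflow` names above come from the `openmetadata-ingestion` package. As a minimal, stdlib-only sketch of the same pattern (the real `Workflow` class does the actual ingestion; here we only show loading the JSON config and reading the source type), assuming the config follows the `{"source": {"type": ..., "config": ...}}` shape used throughout these pages:

```python
import json
import tempfile


def load_config_file(path):
    # Parse the JSON workflow configuration from disk, as the framework does.
    with open(path) as config_file:
        return json.load(config_file)


def metadata_ingestion_workflow(config_path):
    # Sketch of the create/run/stop lifecycle shown above; the real Workflow
    # class lives in the openmetadata-ingestion package.
    config = load_config_file(config_path)
    return config["source"]["type"]


# A throwaway config standing in for examples/workflows/hive.json:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"source": {"type": "hive", "config": {}}}, f)

print(metadata_ingestion_workflow(f.name))  # hive
```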
# Athena

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[athena]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}
# BigQuery

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[bigquery]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$PWD/examples/creds/bigquery-cred.json"
metadata ingest -c ./pipelines/bigquery.json
```

### Configuration

{% code title="bigquery-creds.json \(boilerplate\)" %}
```javascript
{
  "type": "service_account",
  "project_id": "project_id",
  "private_key_id": "private_key_id",
  "private_key": "",
  "client_email": "gcpuser@project_id.iam.gserviceaccount.com",
  "client_id": "",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": ""
}
```
{% endcode %}

{% code title="bigquery.json" %}
```javascript
{
  "source": {
    "type": "bigquery",
    "config": {
      "project_id": "project-id",
      "username": "username",
      "host_port": "https://bigquery.googleapis.com",
      "service_name": "gcp_bigquery",
      "service_type": "BigQuery"
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the BigQuery username.
2. **password** - password for the BigQuery username.
3. **service\_name** - Service Name for this BigQuery cluster. If you added the BigQuery cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish BigQuery data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="bigquery.json" %}
```javascript
{
  "source": {
    "type": "bigquery",
    "config": {
      "project_id": "project-id",
      "username": "username",
      "host_port": "https://bigquery.googleapis.com",
      "service_name": "gcp_bigquery",
      "service_type": "BigQuery"
    }
  },
  "processor": {
    "type": "pii",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {
      "api_endpoint": "http://localhost:8585/api"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
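The `cron` block uses crontab-style fields, with `null` meaning "every". As a small sketch of how those five fields map onto a standard crontab expression (the assumption that `null` maps to `*` is ours, for illustration):

```python
def cron_to_expression(cron):
    # Render the JSON cron block as a five-field crontab string;
    # null/missing fields mean "every", i.e. "*".
    fields = ["minute", "hour", "day", "month", "day_of_week"]
    return " ".join(cron.get(field) or "*" for field in fields)


cron = {"minute": "*/5", "hour": None, "day": None, "month": None, "day_of_week": None}
print(cron_to_expression(cron))  # */5 * * * *
```

So the configuration above runs the ingestion every five minutes.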
# ElasticSearch

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

## Run Manually

```bash
metadata ingest -c ./pipelines/metadata_to_es.json
```

### Configuration

{% code title="metadata\_to\_es.json" %}
```javascript
{
  "source": {
    "type": "metadata_es",
    "config": {}
  },
  ...
```
{% endcode %}

### Publish to OpenMetadata

Below is the configuration to publish ElasticSearch data into OpenMetadata.

Add the optional `file` stage and the `elasticsearch` bulk\_sink along with the `metadata-server` config:

{% code title="metadata\_to\_es.json" %}
```javascript
{
  "source": {
    "type": "metadata_es",
    "config": {}
  },
  "stage": {
    "type": "file",
    "config": {
      "filename": "/tmp/tables.txt"
    }
  },
  "bulk_sink": {
    "type": "elasticsearch",
    "config": {
      "filename": "/tmp/tables.txt",
      "es_host_port": "localhost",
      "index_name": "table_search_index"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
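Note that the `file` stage and the `elasticsearch` bulk\_sink share the same `filename`: the stage writes records out, then the bulk sink reads them back for indexing. A stdlib-only sketch of that handoff (the record shape is illustrative; the real bulk sink pushes the records to Elasticsearch rather than just reading them):

```python
import json
import tempfile


def stage_records(records, filename):
    # The "file" stage side: write one JSON record per line.
    with open(filename, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def bulk_read(filename):
    # The bulk-sink side: read the staged records back for bulk indexing.
    with open(filename) as staged:
        return [json.loads(line) for line in staged]


tmp = tempfile.NamedTemporaryFile(delete=False)
stage_records([{"table": "sales"}, {"table": "users"}], tmp.name)
print(bulk_read(tmp.name))  # [{'table': 'sales'}, {'table': 'users'}]
```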
# MSSQL

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[mssql]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/mssql.json
```

### Configuration

{% code title="mssql.json" %}
```javascript
{
  ...
      "database": "catalog_test",
      "username": "sa",
      "password": "test!Password",
      "filter_pattern": {
        "includes": ["catalog_test.*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the MSSQL username.
2. **password** - password for the MSSQL username.
3. **service\_name** - Service Name for this MSSQL cluster. If you added the MSSQL cluster through the OpenMetadata UI, make sure the service name matches.
4. **host\_port** - Hostname and port number where the service is being initialized.
5. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
6. **database** - Database name from where data is to be fetched.

## Publish to OpenMetadata

Below is the configuration to publish MSSQL data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="mssql.json" %}
```javascript
...
```
{% endcode %}
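The `filter_pattern` entries such as `"catalog_test.*"` are regular expressions. Assuming they are matched against qualified `database.table` names from the start of the string (an assumption for illustration), a quick sketch of how `includes` selects tables:

```python
import re


def is_included(table_name, includes):
    # Keep the table when any "includes" regex matches its qualified name.
    return any(re.match(pattern, table_name) for pattern in includes)


includes = ["catalog_test.*"]
print(is_included("catalog_test.orders", includes))  # True
print(is_included("other_db.orders", includes))      # False
```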
# MySQL

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[mysql]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/mysql.json
```

### Configuration

{% code title="mysql.json" %}
```javascript
{
  ...
      "username": "openmetadata_user",
      "password": "openmetadata_password",
      "service_name": "local_mysql",
      "filter_pattern": {
        "excludes": ["mysql.*", "information_schema.*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the MySQL username. We recommend creating a user with read-only permissions to all the databases in your MySQL installation.
2. **password** - password for the username.
3. **service\_name** - Service Name for this MySQL cluster. If you added the MySQL cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish MySQL data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="mysql.json" %}
```javascript
{
  ...
      "password": "openmetadata_password",
      "service_name": "local_mysql",
      "service_type": "MySQL",
      "filter_pattern": {
        "excludes": ["mysql.*", "information_schema.*"]
      }
    }
  },
  ...
```
{% endcode %}
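Under the hood, connectors like this one typically assemble a SQLAlchemy-style connection URL from the config fields. The exact URL shape and the `host_port` value below are assumptions for illustration (the MySQL snippet above does not show `host_port`):

```python
def mysql_url(config):
    # Assemble a mysql:// connection URL from the config fields;
    # the "mysql+pymysql" driver prefix is an illustrative choice.
    return "mysql+pymysql://{username}:{password}@{host_port}".format(**config)


config = {
    "username": "openmetadata_user",
    "password": "openmetadata_password",
    "host_port": "localhost:3306",  # hypothetical value
}
print(mysql_url(config))
# mysql+pymysql://openmetadata_user:openmetadata_password@localhost:3306
```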
# Oracle

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[oracle]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}
# Postgres

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[postgres]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/postgres.json
```

### Configuration

{% code title="postgres.json" %}
```javascript
{
  ...
      "database": "pagila",
      "service_name": "local_postgres",
      "service_type": "POSTGRES",
      "filter_pattern": {
        "excludes": ["pg_openmetadata.*[a-zA-Z0-9]*", "information_schema.*[a-zA-Z0-9]*"]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Postgres username.
2. **password** - password for the Postgres username.
3. **service\_name** - Service Name for this Postgres cluster. If you added the Postgres cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Postgres data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="postgres.json" %}
```javascript
{
  "source": {
    ...
  }
}
```
{% endcode %}
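Before running `metadata ingest`, it can save a round trip to sanity-check that the config file carries the keys the source expects. A stdlib sketch (the required-key list here is illustrative, not the connector's actual validation):

```python
import json


def missing_source_keys(raw_json, required=("username", "password", "service_name", "database")):
    # Return the keys absent from the source.config section, in order.
    config = json.loads(raw_json)["source"]["config"]
    return [key for key in required if key not in config]


raw = '{"source": {"type": "postgres", "config": {"username": "u", "database": "pagila"}}}'
print(missing_source_keys(raw))  # ['password', 'service_name']
```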
# Redshift Usage

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[redshift-usage]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/redshift_usage.json
```

### Configuration

{% code title="redshift\_usage.json" %}
```javascript
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "service_type": "Redshift",
      "duration": 2
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation.
2. **password** - password for the username.
3. **service\_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish Redshift usage data into OpenMetadata.

Add the optional `query-parser` processor, the `table-usage` stage, and the `metadata-usage` bulk\_sink along with the `metadata-server` config:

{% code title="redshift\_usage.json" %}
```javascript
{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "service_type": "Redshift",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
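The `query-parser` processor turns harvested query text into table references for the usage pipeline. As a deliberately simplified sketch of that idea (the real parser handles full SQL; this regex only catches plain `FROM`/`JOIN` clauses):

```python
import re


def extract_tables(sql):
    # Naive table extraction: qualified names following FROM or JOIN.
    return re.findall(r"(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)


sql = "SELECT * FROM warehouse.sales s JOIN warehouse.customers c ON s.cid = c.id"
print(extract_tables(sql))  # ['warehouse.sales', 'warehouse.customers']
```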
# Redshift

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[redshift]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/redshift.json
```

### Configuration

{% code title="redshift.json" %}
```javascript
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "service_name": "aws_redshift",
      "service_type": "Redshift"
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Redshift username. We recommend creating a user with read-only permissions to all the databases in your Redshift installation.
2. **password** - password for the username.
3. **service\_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.

## Publish to OpenMetadata

Below is the configuration to publish Redshift data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="redshift.json" %}
```javascript
{
  "source": {
    "type": "redshift",
    "config": {
      "host_port": "cluster.user.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "password",
      "database": "warehouse",
      "service_name": "aws_redshift",
      "service_type": "Redshift"
    }
  },
  "processor": {
    "type": "pii",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
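The `host_port` field packs the cluster endpoint and port into one string. A small sketch of splitting it, assuming the connector falls back to Redshift's default port 5439 when none is given:

```python
def split_host_port(host_port, default_port=5439):
    # Split "host:port"; fall back to the default port when absent.
    host, sep, port = host_port.rpartition(":")
    if not sep:
        return host_port, default_port
    return host, int(port)


print(split_host_port("cluster.user.region.redshift.amazonaws.com:5439"))
# ('cluster.user.region.redshift.amazonaws.com', 5439)
```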
# Snowflake Usage

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[snowflake-usage]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/snowflake_usage.json
```

### Configuration

{% code title="snowflake\_usage.json" %}
```javascript
{
  "source": {
    "type": "snowflake-usage",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "duration": 2
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Snowflake username.
2. **password** - password for the Snowflake username.
3. **service\_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Snowflake usage data into OpenMetadata.

Add the optional `query-parser` processor, the `table-usage` stage, and the `metadata-usage` bulk\_sink along with the `metadata-server` config:

{% code title="snowflake\_usage.json" %}
```javascript
{
  "source": {
    "type": "snowflake-usage",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/snowflake_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/snowflake_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
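In the usage pipeline, the `table-usage` stage condenses the parsed queries into per-table counts before the bulk sink publishes them. A sketch of that aggregation step (the per-query table lists are illustrative):

```python
from collections import Counter


def aggregate_usage(parsed_queries):
    # Count how often each table appears across the parsed query log.
    counts = Counter()
    for tables in parsed_queries:
        counts.update(tables)
    return dict(counts)


parsed = [["sales"], ["sales", "customers"], ["customers"]]
print(aggregate_usage(parsed))  # {'sales': 2, 'customers': 2}
```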
# Snowflake

{% hint style="info" %}
**Prerequisites**

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[snowflake]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

## Run Manually

```bash
metadata ingest -c ./pipelines/snowflake.json
```

### Configuration

{% code title="snowflake.json" %}
```javascript
{
  "source": {
    "type": "snowflake",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "filter_pattern": {
        "includes": [
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf10tcl"
        ]
      }
    }
  },
  ...
```
{% endcode %}

1. **username** - pass the Snowflake username.
2. **password** - password for the Snowflake username.
3. **service\_name** - Service Name for this Snowflake cluster. If you added the Snowflake cluster through the OpenMetadata UI, make sure the service name matches.
4. **filter\_pattern** - It contains includes and excludes options to choose which pattern of datasets you want to ingest into OpenMetadata.
5. **database** - Database name from where data is to be fetched.

### Publish to OpenMetadata

Below is the configuration to publish Snowflake data into OpenMetadata.

Add the optional `pii` processor and the `metadata-rest-tables` sink along with the `metadata-server` config:

{% code title="snowflake.json" %}
```javascript
{
  "source": {
    "type": "snowflake",
    "config": {
      "host_port": "account.region.service.snowflakecomputing.com",
      "username": "username",
      "password": "strong_password",
      "database": "SNOWFLAKE_SAMPLE_DATA",
      "account": "account_name",
      "service_name": "snowflake",
      "service_type": "Snowflake",
      "filter_pattern": {
        "includes": [
          "(\\w)*tpcds_sf100tcl",
          "(\\w)*tpcds_sf10tcl"
        ]
      }
    }
  },
  "processor": {
    "type": "pii",
    "config": {}
  },
  "sink": {
    "type": "metadata-rest-tables",
    "config": {}
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  },
  "cron": {
    "minute": "*/5",
    "hour": null,
    "day": null,
    "month": null,
    "day_of_week": null
  }
}
```
{% endcode %}
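The `includes` patterns are regexes with JSON-escaped backslashes, so `"(\\w)*tpcds_sf10tcl"` reaches the matcher as `(\w)*tpcds_sf10tcl`. A quick check of what such patterns match (matching from the start of the name, as an illustrative assumption):

```python
import re

includes = [r"(\w)*tpcds_sf100tcl", r"(\w)*tpcds_sf10tcl"]


def matches_any(name, patterns):
    # A schema or table is ingested when any includes pattern matches it.
    return any(re.match(pattern, name) for pattern in patterns)


print(matches_any("tpcds_sf10tcl", includes))  # True
print(matches_any("tpch_sf1", includes))       # False
```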
OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

1. Python 3.7 or above
{% endhint %}

### Install from PyPI or Source

{% tabs %}
{% tab title="Install Using PyPI" %}
```bash
pip install 'openmetadata-ingestion[sample-tables, elasticsearch]'
python -m spacy download en_core_web_sm
```
{% endtab %}
{% endtabs %}

### Ingest sample tables, usage and users

```bash
metadata ingest -c ./pipelines/sample_tables.json
metadata ingest -c ./pipelines/sample_usage.json
metadata ingest -c ./pipelines/sample_users.json
```