diff --git a/docs/.gitbook/assets/dbt-tab.png b/docs/.gitbook/assets/dbt-tab.png
new file mode 100644
index 00000000000..09dfda20801
Binary files /dev/null and b/docs/.gitbook/assets/dbt-tab.png differ
diff --git a/docs/.gitbook/assets/redshift-data.png b/docs/.gitbook/assets/redshift-data.png
new file mode 100644
index 00000000000..a6c8848da0a
Binary files /dev/null and b/docs/.gitbook/assets/redshift-data.png differ
diff --git a/docs/.gitbook/assets/sample-data.png b/docs/.gitbook/assets/sample-data.png
index d647120d7dc..a681f6ad8ba 100644
Binary files a/docs/.gitbook/assets/sample-data.png and b/docs/.gitbook/assets/sample-data.png differ
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 3e5a14dfbd4..4c619177116 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -26,6 +26,7 @@
* [Presto](openmetadata/connectors/presto.md)
* [Redash](openmetadata/connectors/redash.md)
* [Redshift](openmetadata/connectors/redshift.md)
+ * [Redshift (Revised)](connectors/redshift-revised.md)
* [Redshift Usage](openmetadata/connectors/redshift-usage.md)
* [Salesforce](openmetadata/connectors/salesforce.md)
* [Snowflake](openmetadata/connectors/snowflake.md)
@@ -96,7 +97,8 @@
* [Run OpenMetadata](install/run-openmetadata.md)
* [Run in Production](install/run-in-production.md)
* [Run in Kubernetes](install/run-in-kubernetes.md)
-* [Configuration](install/configuration.md)
+* [Server Configuration](install/configuration.md)
+* [Connector Configuration](install/connector-configuration.md)
* [Enable Security](install/enable-security/README.md)
* [Google SSO](install/enable-security/google-sso/README.md)
* [Create Server Credentials](install/enable-security/google-sso/google-server-creds.md)
diff --git a/docs/connectors/redshift-revised.md b/docs/connectors/redshift-revised.md
new file mode 100644
index 00000000000..9e43825e500
--- /dev/null
+++ b/docs/connectors/redshift-revised.md
@@ -0,0 +1,310 @@
+---
+description: >-
+ This guide will help you install and configure the Redshift connector and run
+ metadata ingestion workflows manually.
+---
+
+# Redshift (Revised)
+
+## Requirements
+
+Using the OpenMetadata Redshift connector requires supporting services and software. Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing and configuring this connector.
+
+### OpenMetadata (version 0.7.0 or greater)
+
+To use this guide you must have a running deployment of OpenMetadata. OpenMetadata includes the following services.
+
+* The OpenMetadata server supporting the metadata APIs and user interface
+* Elasticsearch for metadata search and discovery
+* MySQL as the backing store for all metadata
+* Airflow for metadata ingestion workflows
+
+If you have not already deployed OpenMetadata, please follow the guide, [Run OpenMetadata](../install/run-openmetadata.md) to get up and running.
+
+### Python (version 3.8.0 or greater)
+
+To check what version of Python you have, please use the following command.
+
+```bash
+python3 --version
+```
+
+### PostgreSQL (version 14.1 or greater)
+
+To check what version of PostgreSQL you have, please use the following command.
+
+```bash
+postgres --version
+```
+
+## Procedure
+
+The following is an overview of the steps in this procedure. Please follow all steps relevant to your use case.
+
+1. [Prepare a Python virtual environment](redshift-revised.md#1.-prepare-a-python-virtual-environment)
+2. [Install the Python module for this connector](redshift-revised.md#2.-install-the-python-module-for-this-connector)
+3. [Create a configuration file using template JSON](redshift-revised.md#3.-create-a-configuration-file-using-template-json)
+4. [Configure service settings](redshift-revised.md#4.-configure-service-settings)
+5. [Enable the data profiler (optional)](redshift-revised.md#5.-enable-the-data-profiler-optional)
+6. [Install the data profiler Python module (optional)](redshift-revised.md#6.-install-the-data-profiler-python-module-optional)
+7. [Configure data filters (optional)](redshift-revised.md#7.-configure-data-filters-optional)
+8. [Configure sample data (optional)](redshift-revised.md#8.-configure-sample-data-optional)
+9. [Configure DBT (optional)](redshift-revised.md#9.-configure-dbt-optional)
+10. [Confirm sink settings](redshift-revised.md#10.-confirm-sink-settings)
+11. [Confirm metadata\_server settings](redshift-revised.md#11.-confirm-metadata\_server-settings)
+12. [Run ingestion workflow](redshift-revised.md#12.-run-ingestion-workflow)
+
+### 1. Prepare a Python virtual environment
+
+In this step we will create a Python virtual environment. This will enable us to avoid conflicts with other Python installations and packages on your host system.
+
+In a later step you will install the Python module for this connector and its dependencies in this virtual environment.
+
+#### 1a. Create a directory for openmetadata
+
+Throughout the docs we use a consistent directory structure for the OpenMetadata server and connector installation. If you have not already done so by following another guide, please create an `openmetadata` directory now and change into that directory in your command line environment.
+
+```bash
+mkdir openmetadata; cd openmetadata
+```
+
+#### 1b. Create a directory for this connector
+
+Run the following command to create a directory for this connector and change into that directory.
+
+```bash
+mkdir redshift; cd redshift
+```
+
+#### 1c. Create the virtual environment
+
+Run the following command to create a Python virtual environment called `redshift-env`.
+
+```bash
+python3 -m venv redshift-env
+```
+
+#### 1d. Activate the virtual environment
+
+Run the following command to activate the virtual environment.
+
+```bash
+source redshift-env/bin/activate
+```
+
+Once activated, you should see your command prompt change to indicate that your commands will now be executed in the environment named `redshift-env`.
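+
+For example, your prompt might look something like the following (the exact appearance varies by shell and configuration; this is just an illustrative sketch).
+
+```
+(redshift-env) $
+```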
+
+#### 1e. Upgrade pip and setuptools to the latest versions
+
+Ensure you have the latest versions of pip and setuptools by running the following command. If you have followed the steps above, this will upgrade pip and setuptools in your virtual environment.
+
+```bash
+pip3 install --upgrade pip setuptools
+```
+
+### 2. Install the Python module for this connector
+
+With the virtual environment set up and activated as described in Step 1, run the following command to install the Python module for the Redshift connector.
+
+```bash
+pip3 install 'openmetadata-ingestion[redshift]'
+```
+
+### 3. Create a configuration file using template JSON
+
+Create a new file called `redshift.json` in the current directory. Note that the current directory should be the `openmetadata/redshift` directory you created in Step 1.
+
+Copy and paste the configuration template below into the `redshift.json` file you created.
+
+{% hint style="info" %}
+Note: The `source.config` field in the configuration JSON will include the majority of the settings for your connector. In the steps below we describe how to customize the key-value pairs in the `source.config` field to meet your needs.
+{% endhint %}
+
+{% code title="redshift.json" %}
+```json
+{
+ "source": {
+ "type": "redshift",
+ "config": {
+ "host_port": "cluster.name.region.redshift.amazonaws.com:5439",
+ "username": "username",
+ "password": "strong_password",
+ "service_name": "aws_redshift",
+ "data_profiler_enabled": "false",
+ "table_filter_pattern": {
+ "excludes": ["[\\w]*event_vw.*"]
+ },
+ "schema_filter_pattern": {
+ "excludes": ["information_schema.*"]
+ }
+ }
+ },
+ "sink": {
+ "type": "metadata-rest",
+ "config": {}
+ },
+ "metadata_server": {
+ "type": "metadata-server",
+ "config": {
+ "api_endpoint": "http://localhost:8585/api",
+ "auth_provider_type": "no-auth"
+ }
+ }
+}
+```
+{% endcode %}
+
+### 4. Configure service settings
+
+In this step, we will configure the Redshift service settings required for this connector. We will set values for the following fields in the `source.config` object in `redshift.json`.
+
+* `host_port`
+* `username`
+* `password`
+* `service_name`
+* `database` (optional)
+
+Review and modify the values for each of the fields above in your `redshift.json` file, following the guidance in the [Connector Service Settings](../install/connector-configuration.md#service-settings) documentation.
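+
+For illustration, a fully configured `source.config` fragment might look like the following. The endpoint, credentials, and database shown are placeholders; substitute the values for your own deployment.
+
+```json
+"host_port": "mycluster.abc123.us-east-1.redshift.amazonaws.com:5439",
+"username": "metadata_reader",
+"password": "strong_password",
+"service_name": "aws_redshift",
+"database": "warehouse"
+```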
+
+### 5. Enable the data profiler (optional)
+
+When enabled, the data profiler runs as part of metadata ingestion. Data profiling increases the amount of time metadata ingestion requires, but enables you to assess the frequency of use, reliability, and other details for data.
+
+**We have disabled the profiler in the configuration template provided.** If you want to enable the data profiler, update your configuration file as follows.
+
+```json
+"data_profiler_enabled": "true"
+```
+
+See the [Connector Data Profiler Settings](../install/connector-configuration.md#data-profiler-settings) documentation for more information.
+
+### 6. Install the data profiler Python module (optional)
+
+If you enabled the data profiler in Step 5, run the following command to install the Python module for the data profiler. You will need this to run the ingestion workflow.
+
+```bash
+pip3 install 'openmetadata-ingestion[data-profiler]'
+```
+
+{% hint style="info" %}
+Installation for the data profiler takes several minutes to complete. While the installation process runs, continue through the remaining steps in this guide.
+{% endhint %}
+
+### 7. Configure data filters (optional)
+
+You may configure your connector to include or exclude views, tables, and databases or schemas using the filtering options below.
+
+* `include_views` - Include or exclude all views
+* `include_tables` - Include or exclude all tables
+* `table_filter_pattern` - Include or exclude tables by name using regular expressions
+* `schema_filter_pattern` - Include or exclude schemas by name using regular expressions
+
+By default, your connector will include all tables and views in metadata ingestion. **You only need to use the settings above if you want to limit the data considered during metadata ingestion.**
+
+To configure data filter settings for your Redshift service, edit the `redshift.json` file following the guidance in the [Connector Data Filter Settings](../install/connector-configuration.md#data-filter-settings) documentation.
+
+### 8. Configure sample data (optional)
+
+By default, your connector will ingest sample data from each table and make it available in the OpenMetadata user interface. The default settings for sample data work well for most use cases. If you would like to disable sample data ingestion or configure how sample data is selected, please edit the `redshift.json` file following the guidance in the [Connector Sample Data Settings](../install/connector-configuration.md#sample-data-settings) documentation.
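+
+For example, to turn off sample data ingestion entirely, you might add the following key-value pair to the `source.config` field of your configuration file.
+
+```json
+"generate_sample_data": "false"
+```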
+
+### 9. Configure DBT (optional)
+
+DBT provides transformation logic that creates tables and views from raw data. OpenMetadata includes an integration for DBT that enables users to see the models used to generate a table from that table's details page in the OpenMetadata user interface.
+
+To include DBT models and metadata in your ingestion workflows, edit your `redshift.json` file to point to your DBT catalog and manifest files following the guidance in the [Connector DBT Settings](../install/connector-configuration.md#dbt-settings) documentation.
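+
+As a sketch, assuming your DBT files live in a `./dbt` directory relative to where you run ingestion, the relevant `source.config` fields might look like this.
+
+```json
+"dbt_manifest_file": "./dbt/manifest.json",
+"dbt_catalog_file": "./dbt/catalog.json"
+```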
+
+### 10. Confirm sink settings
+
+You should not need to make any changes to the fields defined for `sink` in the template code you copied into `redshift.json` in Step 3. This part of your configuration file should be as follows.
+
+```json
+"sink": {
+ "type": "metadata-rest",
+ "config": {}
+},
+```
+
+### 11. Confirm metadata\_server settings
+
+You should not need to make any changes to the fields defined for `metadata_server` in the template code you copied into `redshift.json` in Step 3. This part of your configuration file should be as follows.
+
+```json
+"metadata_server": {
+ "type": "metadata-server",
+ "config": {
+ "api_endpoint": "http://localhost:8585/api",
+ "auth_provider_type": "no-auth"
+ }
+}
+```
+
+### 12. Run ingestion workflow
+
+Your `redshift.json` configuration file should now be fully configured and ready to use in an ingestion workflow.
+
+To run an ingestion workflow, execute the following command from the `openmetadata/redshift` directory you created in Step 1.
+
+```bash
+metadata ingest -c ./redshift.json
+```
+
+## Next Steps
+
+As the ingestion workflow runs, you may observe progress both from the command line and from the OpenMetadata user interface. To view the metadata ingested from Redshift, visit [http://localhost:8585/explore/tables](http://localhost:8585/explore/tables). Select the Redshift service to filter for the data you have ingested using the workflow you configured and ran following this guide. See the figure below for an example.
+
+![](../.gitbook/assets/redshift-data.png)
+
+## Troubleshooting
+
+### Error: pg\_config executable not found
+
+When attempting to install the `openmetadata-ingestion[redshift]` Python package in Step 2, you might encounter the following error.
+
+```
+pg_config is required to build psycopg2 from source. Please add the directory
+containing pg_config to the $PATH or specify the full executable path with the
+option:
+
+ python setup.py build_ext --pg-config /path/to/pg_config build ...
+
+or with the pg_config option in 'setup.cfg'.
+
+If you prefer to avoid building psycopg2 from source, please install the PyPI
+'psycopg2-binary' package instead.
+```
+
+The psycopg2 package is a dependency for the `openmetadata-ingestion[redshift]` Python package. To correct this problem, please install PostgreSQL on your host system.
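+
+How you install PostgreSQL depends on your platform. For example, assuming one of the package managers below is available on your host:
+
+```bash
+# macOS with Homebrew
+brew install postgresql
+
+# Debian or Ubuntu (the libpq-dev package provides pg_config)
+sudo apt-get install libpq-dev
+```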
+
+Then re-run the install command in [Step 2](redshift-revised.md#2.-install-the-python-module-for-this-connector).
+
+### ERROR: Failed building wheel for cryptography
+
+When attempting to install the `openmetadata-ingestion[redshift]` Python package in Step 2, you might encounter the following error. The error might also include mention of a Rust compiler.
+
+```
+Failed to build cryptography
+ERROR: Could not build wheels for cryptography which use PEP 517 and cannot be installed directly
+```
+
+This problem is usually due to running an older version of pip. Try upgrading pip as follows.
+
+```bash
+pip3 install --upgrade pip
+```
+
+Then re-run the install command in [Step 2](redshift-revised.md#2.-install-the-python-module-for-this-connector).
+
+### requests.exceptions.ConnectionError
+
+If you encounter the following error when attempting to run the ingestion workflow in Step 12, this is probably because there is no OpenMetadata server running at http://localhost:8585.
+
+```
+requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8585):
+Max retries exceeded with url: /api/v1/services/databaseServices/name/aws_redshift
+(Caused by NewConnectionError(':
+Failed to establish a new connection: [Errno 61] Connection refused'))
+```
+
+To correct this problem, please follow the steps in the [Run OpenMetadata](../install/run-openmetadata.md) guide to deploy OpenMetadata in Docker on your local machine.
+
+Then re-run the metadata ingestion workflow in [Step 12](redshift-revised.md#12.-run-ingestion-workflow).
diff --git a/docs/install/connector-configuration.md b/docs/install/connector-configuration.md
new file mode 100644
index 00000000000..c22a3b29194
--- /dev/null
+++ b/docs/install/connector-configuration.md
@@ -0,0 +1,279 @@
+---
+description: This page provides details on shared configuration settings for connectors.
+---
+
+# Connector Configuration
+
+OpenMetadata connectors require a configuration file with a number of fields to specify settings for the service, data profiler, data filters, sample data, DBT, and security. See below for a simple example of a connector file.
+
+```json
+{
+ "source": {
+ "type": "redshift",
+ "config": {
+ "host_port": "cluster.name.region.redshift.amazonaws.com:5439",
+ "username": "username",
+ "password": "strong_password",
+ "service_name": "aws_redshift",
+ "data_profiler_enabled": "false",
+ "table_filter_pattern": {
+ "excludes": ["[\\w]*event_vw.*"]
+ },
+ "schema_filter_pattern": {
+ "excludes": ["information_schema.*"]
+ }
+ }
+ },
+ "sink": {
+ "type": "metadata-rest",
+ "config": {}
+ },
+ "metadata_server": {
+ "type": "metadata-server",
+ "config": {
+ "api_endpoint": "http://localhost:8585/api",
+ "auth_provider_type": "no-auth"
+ }
+ }
+}
+```
+
+In the sections below we describe all configuration fields and their settings.
+
+{% hint style="info" %}
+Note: The `source.config` field in the configuration JSON will include the majority of the settings for your connector. We reference this field throughout the service connector documentation.
+{% endhint %}
+
+## Service Settings
+
+Use service settings to configure your connector to read from the desired service and, optionally, database.
+
+#### host\_port
+
+Use `source.config.host_port` to specify the endpoint for your data service. Use the `host:port` format illustrated in the example below.
+
+```json
+"host_port": "cluster.name.region.redshift.amazonaws.com:5439"
+```
+
+Please ensure your service is reachable from the host you are using to run metadata ingestion.
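+
+One quick way to check reachability from your ingestion host (assuming the `nc` utility is installed) is to test the TCP connection to the service endpoint and port.
+
+```bash
+nc -zv cluster.name.region.redshift.amazonaws.com 5439
+```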
+
+#### username
+
+Edit the value for `source.config.username` to identify your service user.
+
+```json
+"username": "username"
+```
+
+{% hint style="danger" %}
+Note: The user specified should be authorized to read all databases you want to include in the metadata ingestion workflow.
+{% endhint %}
+
+#### password
+
+Edit the value for `source.config.password` with the password for your service user.
+
+```json
+"password": "strong_password"
+```
+
+#### service\_name
+
+OpenMetadata uniquely identifies services by their `service_name`. Edit the value for `source.config.service_name` with a name that distinguishes this deployment from other services from which you ingest metadata.
+
+```json
+"service_name": "aws_redshift"
+```
+
+#### database (optional)
+
+If you want to limit metadata ingestion to a single database, include the `source.config.database` field in your configuration file. If this field is not included, the connector will ingest metadata from all databases the specified user is authorized to read.
+
+To specify a single database to ingest metadata from, provide the name of the database as the value for the `source.config.database` key as illustrated in the example below.
+
+```json
+"database": "warehouse"
+```
+
+If you want to ingest metadata from two or more databases in a service, but not all of them, either use the `schema_filter_pattern` described below to match databases by name using regular expressions, or define a separate workflow for each database using its own configuration file, as in the example below.
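+
+The file names in this sketch are hypothetical; each configuration file would set a different value for `source.config.database`.
+
+```bash
+metadata ingest -c ./redshift-warehouse.json   # sets "database": "warehouse"
+metadata ingest -c ./redshift-analytics.json   # sets "database": "analytics"
+```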
+
+## Data Profiler Settings
+
+The data profiler ingests usage information for tables. This enables you to assess frequency of use, reliability, and other details.
+
+#### data\_profiler\_enabled
+
+When enabled, the data profiler will run as part of metadata ingestion. Running the data profiler increases the amount of time metadata ingestion requires, but provides the benefits described above.
+
+You may disable the data profiler by including the following field in the `source.config` object of your configuration file.
+
+```json
+"data_profiler_enabled": "false"
+```
+
+If you want to enable the data profiler, update your configuration file as follows.
+
+```json
+"data_profiler_enabled": "true"
+```
+
+{% hint style="info" %}
+Note: The data profiler is enabled by default if no setting is provided for `data_profiler_enabled`.
+{% endhint %}
+
+#### data\_profiler\_offset (optional)
+
+Use `source.config.data_profiler_offset` to specify the row offset at which the profiler should begin scanning each table. See below for an example.
+
+```json
+"data_profiler_offset": "1000"
+```
+
+{% hint style="info" %}
+Note: `source.config.data_profiler_offset` is set to `"0"` by default.
+{% endhint %}
+
+{% hint style="info" %}
+Note: The `source.config.data_profiler_offset` field will be removed in a future release of OpenMetadata.
+{% endhint %}
+
+#### data\_profiler\_limit (optional)
+
+Use `source.config.data_profiler_limit` to specify the row limit at which the profiler should conclude scanning each table. Include a key-value pair such as the following in the `source.config` field of your configuration file.
+
+```json
+"data_profiler_limit": "50000"
+```
+
+{% hint style="info" %}
+Note: `source.config.data_profiler_limit` is set to `"50000"` by default.
+{% endhint %}
+
+{% hint style="info" %}
+Note: The `source.config.data_profiler_limit` field will be removed in a future release of OpenMetadata.
+{% endhint %}
+
+## Data Filter Settings
+
+#### include\_views (optional)
+
+Use `source.config.include_views` to control whether or not to include views as part of metadata ingestion and data profiling.
+
+Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"include_views": "true"
+```
+
+Exclude views as follows.
+
+```json
+"include_views": "false"
+```
+
+{% hint style="info" %}
+Note: `source.config.include_views` is set to `true` by default.
+{% endhint %}
+
+#### include\_tables (optional)
+
+Use `source.config.include_tables` to control whether or not to include tables as part of metadata ingestion and data profiling.
+
+Explicitly include tables by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"include_tables": "true"
+```
+
+Exclude tables as follows.
+
+```json
+"include_tables": "false"
+```
+
+{% hint style="info" %}
+Note: `source.config.include_tables` is set to `true` by default.
+{% endhint %}
+
+#### table\_filter\_pattern (optional)
+
+Use `source.config.table_filter_pattern` to select tables for metadata ingestion by name.
+
+Use `source.config.table_filter_pattern.excludes` to exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See below for an example. This example is also included in the configuration template provided.
+
+```json
+"table_filter_pattern": {
+ "excludes": ["information_schema.*", "[\\w]*event_vw.*"]
+}
+```
+
+Use `source.config.table_filter_pattern.includes` to include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See below for an example.
+
+```json
+"table_filter_pattern": {
+ "includes": ["corp.*", "dept.*"]
+}
+```
+
+See the documentation for the [Python re module](https://docs.python.org/3/library/re.html) for information on how to construct regular expressions.
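+
+Because these patterns use Python regular expression syntax, you can sanity-check a pattern from the command line before running ingestion. The table name below is hypothetical.
+
+```bash
+python3 -c 'import re; print(bool(re.match("[\\w]*event_vw.*", "sales_event_vw")))'
+# prints: True
+```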
+
+{% hint style="info" %}
+You may use either `excludes` or `includes` but not both in `table_filter_pattern`.
+{% endhint %}
+
+#### schema\_filter\_pattern (optional)
+
+Use the `source.config.schema_filter_pattern.excludes` and `source.config.schema_filter_pattern.includes` fields to select schemas for metadata ingestion by name. The configuration template provides an example.
+
+The syntax and semantics for `schema_filter_pattern` are the same as for [`table_filter_pattern`](connector-configuration.md#table\_filter\_pattern-optional). Please see that section for details on use.
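+
+For example, to ingest metadata only from schemas whose names begin with `public` (an illustrative pattern), you might use:
+
+```json
+"schema_filter_pattern": {
+  "includes": ["public.*"]
+}
+```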
+
+## Sample Data Settings
+
+#### generate\_sample\_data (optional)
+
+Use the `source.config.generate_sample_data` field to control whether or not to generate sample data to include in table views in the OpenMetadata user interface. See the figure below for an example.
+
+![](../.gitbook/assets/sample-data.png)
+
+Explicitly include sample data by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"generate_sample_data": "true"
+```
+
+If set to `true`, the connector will collect the first 50 rows of data from each table included in ingestion and catalog that data as sample data to which users can refer in the OpenMetadata user interface.
+
+You can exclude collection of sample data by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"generate_sample_data": "false"
+```
+
+{% hint style="info" %}
+Note: `generate_sample_data` is set to `true` by default.
+{% endhint %}
+
+## DBT Settings
+
+DBT provides transformation logic that creates tables and views from raw data. OpenMetadata includes an integration for DBT that enables you to see the models used to generate a table from that table's details page in the OpenMetadata user interface. See the figure below for an example.
+
+![](../.gitbook/assets/dbt-tab.png)
+
+To include DBT models and metadata in your ingestion workflows, specify the location of the DBT manifest and catalog files as fields in your configuration file.
+
+#### dbt\_manifest\_file (optional)
+
+Use the field `source.config.dbt_manifest_file` to specify the location of your DBT manifest file. See below for an example.
+
+```json
+"dbt_manifest_file": "./dbt/manifest.json"
+```
+
+#### dbt\_catalog\_file (optional)
+
+Use the field `source.config.dbt_catalog_file` to specify the location of your DBT catalog file. See below for an example.
+
+```json
+"dbt_catalog_file": "./dbt/catalog.json"
+```
diff --git a/docs/openmetadata/connectors/redshift.md b/docs/openmetadata/connectors/redshift.md
index 157ad726329..d4c02c486f7 100644
--- a/docs/openmetadata/connectors/redshift.md
+++ b/docs/openmetadata/connectors/redshift.md
@@ -1,35 +1,127 @@
---
-description: This guide will help install Redshift connector and run manually
+description: >-
+ This guide will help you install and configure the Redshift connector and run
+ metadata ingestion workflows manually.
---
# Redshift
+## Requirements
+
+Using the OpenMetadata Redshift connector requires supporting services and software. Please ensure your host system meets the requirements listed below. Then continue to the procedure for installing and configuring this connector.
+
+### OpenMetadata (version 0.7.0 or greater)
+
+To use this guide you must have a running deployment of OpenMetadata. OpenMetadata includes the following services.
+
+* The OpenMetadata server supporting the metadata APIs and user interface
+* Elasticsearch for metadata search and discovery
+* MySQL as the backing store for all metadata
+* Airflow for metadata ingestion workflows
+
+If you have not already deployed OpenMetadata, please follow the guide, [Run OpenMetadata](../../install/run-openmetadata.md) to get up and running.
+
+### Python (version 3.8.0 or greater)
+
+To check what version of Python you have, please use the following command.
+
+```bash
+python3 --version
+```
+
+### PostgreSQL (version 14.1 or greater)
+
+To check what version of PostgreSQL you have, please use the following command.
+
+```bash
+postgres --version
+```
+
+## Procedure
+
+The following is an overview of the steps in this procedure. Please follow all steps relevant to your use case.
+
+1. [Prepare a Python virtual environment](redshift.md#1.-prepare-a-python-virtual-environment)
+2. [Install the Python module for this connector](redshift.md#2.-install-the-python-module-for-this-connector)
+3. [Create a configuration file using the Redshift JSON template](redshift.md#3.-create-a-configuration-file-using-the-redshift-json-template)
+4. [Configure service settings](redshift.md#4.-configure-service-settings)
+5. [Enable / disable the data profiler](redshift.md#5.-enable-disable-the-data-profiler)
+6. [Install the data profiler Python module (optional)](redshift.md#6.-install-the-data-profiler-python-module-optional)
+7. [Configure data filters (optional)](redshift.md#7.-configure-data-filters-optional)
+8. [Configure sample data (optional)](redshift.md#8.-configure-sample-data-optional)
+9. [Configure DBT (optional)](redshift.md#9.-configure-dbt-optional)
+10. [Confirm sink settings](redshift.md#10.-confirm-sink-settings)
+11. [Confirm metadata\_server settings](redshift.md#11.-confirm-metadata\_server-settings)
+12. [Run Ingestion Workflow](redshift.md#12.-run-ingestion-workflow)
+
+### 1. Prepare a Python virtual environment
+
+In this step we will create a Python virtual environment. Using a virtual environment will enable us to avoid conflicts with other Python installations and packages on your host system.
+
+In a later step you will install the Python module for this connector and its dependencies in this virtual environment.
+
+#### 1a. Create a directory for openmetadata
+
+Throughout the docs we use a consistent directory structure for the OpenMetadata server and connector installation. If you have not already done so by following another guide, please create an `openmetadata` directory now and change into that directory in your command line environment.
+
+```bash
+mkdir openmetadata; cd openmetadata
+```
+
+#### 1b. Create a directory for this connector
+
+Run the following command to create a directory for this connector and change into that directory.
+
+```bash
+mkdir redshift; cd redshift
+```
+
+#### 1c. Create the virtual environment
+
+Run the following command to create a Python virtual environment called `redshift-env`.
+
+```bash
+python3 -m venv redshift-env
+```
+
+#### 1d. Activate the virtual environment
+
+Run the following command to activate the virtual environment.
+
+```bash
+source redshift-env/bin/activate
+```
+
+Once activated, you should see your command prompt change to indicate that your commands will now be executed in the environment named `redshift-env`.
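+
+For example, your prompt might look something like the following (the exact appearance varies by shell and configuration; this is just an illustrative sketch).
+
+```
+(redshift-env) $
+```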
+
+#### 1e. Upgrade pip and setuptools to the latest versions
+
+Ensure you have the latest versions of pip and setuptools by running the following command. If you have followed the steps above, this will upgrade pip and setuptools in your virtual environment.
+
+```bash
+pip3 install --upgrade pip setuptools
+```
+
+### 2. Install the Python module for this connector
+
+With the virtual environment set up and activated as described in Step 1, run the following command to install the Python module for the Redshift connector.
+
+```bash
+pip3 install 'openmetadata-ingestion[redshift]'
+```
+
+### 3. Create a configuration file using the Redshift JSON template
+
+Create a new file called `redshift.json` in the current directory. Note that the current directory should be the `openmetadata/redshift` directory you created in Step 1.
+
+Copy and paste the configuration template below into the `redshift.json` file you created.
+
{% hint style="info" %}
-**Prerequisites**
-
-OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.
-
-1. Python 3.7 or above
-2. OpenMetadata Server up and running
-
+Note: The `source.config` field in the configuration JSON will include the majority of the settings for your connector. In the steps below we describe how to customize the key-value pairs in the `source.config` field to meet your needs.
{% endhint %}
-## Install from PyPI
-
-```bash
-pip install 'openmetadata-ingestion[redshift]'
-```
-
-## Run Manually
-
-```bash
-metadata ingest -c ./examples/workflows/redshift.json
-```
-
-## Configuration
-
{% code title="redshift.json" %}
-```javascript
+```json
{
"source": {
"type": "redshift",
@@ -37,59 +129,16 @@ metadata ingest -c ./examples/workflows/redshift.json
"host_port": "cluster.name.region.redshift.amazonaws.com:5439",
"username": "username",
"password": "strong_password",
- "database": "warehouse",
"service_name": "aws_redshift",
- "data_profiler_enabled": "true",
- "data_profiler_offset": "0",
- "data_profiler_limit": "50000",
+ "data_profiler_enabled": "false",
"table_filter_pattern": {
- "excludes": ["demo.*","orders.*"]
+ "excludes": ["[\\w]*event_vw.*"]
},
"schema_filter_pattern": {
"excludes": ["information_schema.*"]
}
- },
-...
-```
-{% endcode %}
-
-1. **username** - pass the Redshift username.
-2. **password** - the password for the Redshift username.
-3. **service\_name** - Service Name for this Redshift cluster. If you added the Redshift cluster through OpenMetadata UI, make sure the service name matches the same.
-4. **schema\_filter\_pattern** - It contains includes, excludes options to choose which pattern of schemas you want to ingest into OpenMetadata.
-5. **table\_filter\_pattern** - It contains includes, excludes options to choose which pattern of tables you want to ingest into OpenMetadata.
-5. **database -** Database name from where data is to be fetched.
-6. **data\_profiler\_enabled** - Enable data-profiling (Optional). It will provide you the newly ingested data.
-7. **data\_profiler\_offset** - Specify offset.
-8. **data\_profiler\_limit** - Specify limit.
-
-## Publish to OpenMetadata
-
-Below is the configuration to publish Redshift data into the OpenMetadata service.
-
-Add `metadata-rest` sink along with `metadata-server` config
-
-{% code title="redshift.json" %}
-```javascript
-{
- "source": {
- "type": "redshift",
- "config": {
- "host_port": "cluster.name.region.redshift.amazonaws.com:5439",
- "username": "username",
- "password": "strong_password",
- "database": "warehouse",
- "service_name": "aws_redshift",
- "data_profiler_enabled": "true",
- "data_profiler_offset": "0",
- "data_profiler_limit": "50000",
- "table_filter_pattern": {
- "excludes": ["demo.*","orders.*"]
- },
- "schema_filter_pattern": {
- "excludes": ["information_schema.*"]
- }
- },
+ }
+ },
"sink": {
"type": "metadata-rest",
"config": {}
@@ -104,3 +153,311 @@ Add `metadata-rest` sink along with `metadata-server` config
}
```
{% endcode %}
+
+### 4. Configure service settings
+
+In this step we will configure the Redshift service settings required for this connector. Please follow the instructions below to ensure you have configured the connector to read from your Redshift service as desired.
+
+#### host\_port
+
+Edit the value for `source.config.host_port` in `redshift.json` for your Redshift deployment. Use the `host:port` format illustrated in the example below.
+
+```json
+"host_port": "cluster.name.region.redshift.amazonaws.com:5439"
+```
+
+Please ensure your Redshift deployment is reachable from the host you are using to run metadata ingestion.
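+
+One quick way to check reachability from your ingestion host (assuming the `nc` utility is installed) is to test the TCP connection to your cluster endpoint and port.
+
+```bash
+nc -zv cluster.name.region.redshift.amazonaws.com 5439
+```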
+
+#### username
+
+Edit the value for `source.config.username` to identify your Redshift user.
+
+```json
+"username": "username"
+```
+
+{% hint style="danger" %}
+Note: The user specified should be authorized to read all databases you want to include in the metadata ingestion workflow.
+{% endhint %}
+
+#### password
+
+Edit the value for `source.config.password` with the password for your Redshift user.
+
+```json
+"password": "strong_password"
+```
+
+#### service\_name
+
+OpenMetadata uniquely identifies services by their `service_name`. Edit the value for `source.config.service_name` with a name that distinguishes this deployment from other services, including other Redshift services from which you might be ingesting metadata.
+
+```json
+"service_name": "aws_redshift"
+```
+
+#### database (optional)
+
+If you want to limit metadata ingestion to a single database, include the `source.config.database` field in your configuration file. If this field is not included, the Redshift connector will ingest metadata from all databases the specified user is authorized to read.
+
+To specify a single database to ingest metadata from, provide the name of the database as the value for the `source.config.database` key as illustrated in the example below.
+
+```json
+"database": "warehouse"
+```
+
+### 5. Enable / disable the data profiler
+
+The data profiler ingests usage information for tables. This enables you to assess frequency of use, reliability, and other details.
+
+#### data\_profiler\_enabled
+
+When enabled, the data profiler will run as part of metadata ingestion. Running the data profiler increases the amount of time metadata ingestion requires, but provides the benefits described above.
+
+You may disable the data profiler by setting the value for the key `source.config.data_profiler_enabled` to `"false"` as follows. We have done this in the configuration template provided.
+
+```json
+"data_profiler_enabled": "false"
+```
+
+If you want to enable the data profiler, update your configuration file as follows.
+
+```json
+"data_profiler_enabled": "true"
+```
+
+{% hint style="info" %}
+Note: The data profiler is enabled by default if no setting is provided for `data_profiler_enabled`.
+{% endhint %}
+
+### 6. Install the data profiler Python module (optional)
+
+If you enabled the data profiler in Step 5, run the following command to install the Python module for the data profiler. You will need this to run the ingestion workflow.
+
+```bash
+pip3 install 'openmetadata-ingestion[data-profiler]'
+```
+
+The data profiler module takes a few minutes to install. While it installs, continue through the remaining steps in this guide.
+
+### 7. Configure data filters (optional)
+
+#### include\_views (optional)
+
+Use `source.config.include_views` to control whether or not to include views as part of metadata ingestion and data profiling.
+
+Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"include_views": "true"
+```
+
+Exclude views as follows.
+
+```json
+"include_views": "false"
+```
+
+{% hint style="info" %}
+Note: `source.config.include_views` is set to `true` by default.
+{% endhint %}
+
+#### include\_tables (optional)
+
+Use `source.config.include_tables` to control whether or not to include tables as part of metadata ingestion and data profiling.
+
+Explicitly include tables by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"include_tables": "true"
+```
+
+Exclude tables as follows.
+
+```json
+"include_tables": "false"
+```
+
+{% hint style="info" %}
+Note: `source.config.include_tables` is set to `true` by default.
+{% endhint %}
+
+#### table\_filter\_pattern (optional)
+
+Use `source.config.table_filter_pattern` to select tables for metadata ingestion by name.
+
+Use `source.config.table_filter_pattern.excludes` to exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See below for an example. This example is also included in the configuration template provided.
+
+```json
+"table_filter_pattern": {
+ "excludes": ["information_schema.*", "[\\w]*event_vw.*"]
+}
+```
+
+Use `source.config.table_filter_pattern.includes` to include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See below for an example.
+
+```json
+"table_filter_pattern": {
+ "includes": ["corp.*", "dept.*"]
+}
+```
+
+See the documentation for the [Python re module](https://docs.python.org/3/library/re.html) for information on how to construct regular expressions.
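+
+Because these patterns use Python regular expression syntax, you can sanity-check a pattern from the command line before running ingestion. The table name below is hypothetical.
+
+```bash
+python3 -c 'import re; print(bool(re.match("[\\w]*event_vw.*", "sales_event_vw")))'
+# prints: True
+```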
+
+{% hint style="info" %}
+You may use either `excludes` or `includes` but not both in `table_filter_pattern`.
+{% endhint %}
+
+#### schema\_filter\_pattern (optional)
+
+Use the `source.config.schema_filter_pattern.excludes` and `source.config.schema_filter_pattern.includes` fields to select schemas for metadata ingestion by name. The configuration template provides an example.
+
+The syntax and semantics for `schema_filter_pattern` are the same as for [`table_filter_pattern`](redshift.md#table\_filter\_pattern-optional). Please see that section for details on use.
+
+### 8. Configure sample data (optional)
+
+#### generate\_sample\_data (optional)
+
+Use the `source.config.generate_sample_data` field to control whether or not to generate sample data to include in table views in the OpenMetadata user interface. See the figure below for an example.
+
+![](../../.gitbook/assets/sample-data.png)
+
+Explicitly include sample data by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"generate_sample_data": "true"
+```
+
+If set to `true`, the connector will collect the first 50 rows of data from each table included in ingestion and catalog that data as sample data to which users can refer in the OpenMetadata user interface.
+
+You can exclude collection of sample data by adding the following key-value pair in the `source.config` field of your configuration file.
+
+```json
+"generate_sample_data": "false"
+```
+
+{% hint style="info" %}
+Note: `generate_sample_data` is set to `true` by default.
+{% endhint %}
+
+### 9. Configure DBT (optional)
+
+DBT provides transformation logic that creates tables and views from raw data. OpenMetadata includes an integration for DBT that enables you to see the models used to generate a table from that table's details page in the OpenMetadata user interface. See the figure below for an example.
+
+![](../../.gitbook/assets/dbt-tab.png)
+
+To include DBT models and metadata in your ingestion workflows, specify the location of the DBT manifest and catalog files as fields in your configuration file.
+
+#### dbt\_manifest\_file (optional)
+
+Use the field `source.config.dbt_manifest_file` to specify the location of your DBT manifest file. See below for an example.
+
+```json
+"dbt_manifest_file": "./dbt/manifest.json"
+```
+
+#### dbt\_catalog\_file (optional)
+
+Use the field `source.config.dbt_catalog_file` to specify the location of your DBT catalog file. See below for an example.
+
+```json
+"dbt_catalog_file": "./dbt/catalog.json"
+```
+
+### 10. Confirm sink settings
+
+You should not need to make any changes to the fields defined for `sink` in the template code you copied into `redshift.json` in Step 3. This part of your configuration file should be as follows.
+
+```json
+"sink": {
+ "type": "metadata-rest",
+ "config": {}
+},
+```
+
+### 11. Confirm metadata\_server settings
+
+You should not need to make any changes to the fields defined for `metadata_server` in the template code you copied into `redshift.json` in Step 3. This part of your configuration file should be as follows.
+
+```json
+"metadata_server": {
+ "type": "metadata-server",
+ "config": {
+ "api_endpoint": "http://localhost:8585/api",
+ "auth_provider_type": "no-auth"
+ }
+}
+```
+
+### 12. Run Ingestion Workflow
+
+Your `redshift.json` configuration file should now be fully configured and ready to use in an ingestion workflow.
+
+To run an ingestion workflow, execute the following command from the `openmetadata/redshift` directory you created in Step 1.
+
+```bash
+metadata ingest -c ./redshift.json
+```
+
+## Next Steps
+
+As the ingestion workflow runs, you may observe progress both from the command line and from the OpenMetadata user interface. To view the metadata ingested from Redshift, visit [http://localhost:8585/explore/tables](http://localhost:8585/explore/tables). Select the Redshift service to filter for the data you have ingested using the workflow you configured and ran following this guide. See the figure below for an example.
+
+![](../../.gitbook/assets/redshift-data.png)
+
+## Troubleshooting
+
+### Error: pg\_config executable not found
+
+When attempting to install the `openmetadata-ingestion[redshift]` Python package in Step 2, you might encounter the following error.
+
+```
+pg_config is required to build psycopg2 from source. Please add the directory
+containing pg_config to the $PATH or specify the full executable path with the
+option:
+
+ python setup.py build_ext --pg-config /path/to/pg_config build ...
+
+or with the pg_config option in 'setup.cfg'.
+
+If you prefer to avoid building psycopg2 from source, please install the PyPI
+'psycopg2-binary' package instead.
+```
+
+The psycopg2 package is a dependency for the `openmetadata-ingestion[redshift]` Python package. To correct this problem, please install PostgreSQL on your host system.
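+
+How you install PostgreSQL depends on your platform. For example, assuming one of the package managers below is available on your host:
+
+```bash
+# macOS with Homebrew
+brew install postgresql
+
+# Debian or Ubuntu (the libpq-dev package provides pg_config)
+sudo apt-get install libpq-dev
+```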
+
+Then re-run the install command in [Step 2](redshift.md#2.-install-the-python-module-for-this-connector).
+
+### ERROR: Failed building wheel for cryptography
+
+When attempting to install the `openmetadata-ingestion[redshift]` Python package in Step 2, you might encounter the following error. The error might also include mention of a Rust compiler.
+
+```
+Failed to build cryptography
+ERROR: Could not build wheels for cryptography which use PEP 517 and cannot be installed directly
+```
+
+This problem is usually due to running an older version of pip. Try upgrading pip as follows.
+
+```bash
+pip3 install --upgrade pip
+```
+
+Then re-run the install command in [Step 2](redshift.md#2.-install-the-python-module-for-this-connector).
+
+### requests.exceptions.ConnectionError
+
+If you encounter the following error when attempting to run the ingestion workflow in Step 12, this is probably because there is no OpenMetadata server running at http://localhost:8585.
+
+```
+requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8585):
+Max retries exceeded with url: /api/v1/services/databaseServices/name/aws_redshift
+(Caused by NewConnectionError(':
+Failed to establish a new connection: [Errno 61] Connection refused'))
+```
+
+To correct this problem, please follow the steps in the [Run OpenMetadata](../../install/run-openmetadata.md) guide to deploy OpenMetadata in Docker on your local machine.
+
+Then re-run the metadata ingestion workflow in [Step 12](redshift.md#12.-run-ingestion-workflow).