Ayush Shah 5d6f385a75
Added Filter Params for Table and Schema (#1954)
* Added Filter Params for table and Schema

* Bigquery Doc changes

* Doc Changes for databases

* Filter Pattern Changes

* Table Filter Pattern Example Changes

* Filter Pattern Example Changes
2021-12-29 09:13:09 -08:00

description
This guide will help you install the Redshift Usage connector and run it manually.

Redshift Usage

{% hint style="info" %} Prerequisites

OpenMetadata is built using Java, DropWizard, Jetty, and MySQL.

  1. Python 3.7 or above {% endhint %}

Install from PyPI

{% tabs %} {% tab title="Install Using PyPI" %}

pip install 'openmetadata-ingestion[redshift-usage]'

{% endtab %} {% endtabs %}

Run Manually

metadata ingest -c ./examples/workflows/redshift_usage.json

Configuration

{% code title="redshift_usage.json" %}

{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.name.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "strong_password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "duration": 2
    }
  },
 ...

{% endcode %}

  1. username - The Redshift username. We recommend creating a user with read-only permissions on all the databases in your Redshift installation.
  2. password - The password for that username.
  3. service_name - Service name for this Redshift cluster. If you added the Redshift cluster through the OpenMetadata UI, make sure this value matches the service name used there.
  4. table_filter_pattern - Contains `includes` and `excludes` options that select, by name pattern, which tables to ingest into OpenMetadata.
  5. schema_filter_pattern - Contains `includes` and `excludes` options that select, by name pattern, which schemas to ingest into OpenMetadata.
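
To illustrate how `includes` and `excludes` options of this kind typically behave, here is a minimal Python sketch. It is not OpenMetadata's implementation: the helper name `is_allowed`, the use of regular-expression matching, the rule that excludes take precedence over includes, and the sample patterns are all assumptions made for illustration.

```python
import re
from typing import Iterable, Optional


def is_allowed(name: str,
               includes: Optional[Iterable[str]] = None,
               excludes: Optional[Iterable[str]] = None) -> bool:
    """Hypothetical helper: decide whether a table or schema name passes a filter.

    Assumed semantics (not taken from OpenMetadata's source):
    - a name matching any exclude pattern is rejected;
    - if include patterns are given, the name must match at least one;
    - with no patterns at all, every name is accepted.
    """
    if any(re.match(pattern, name) for pattern in (excludes or [])):
        return False
    include_list = list(includes or [])
    if include_list:
        return any(re.match(pattern, name) for pattern in include_list)
    return True


# Made-up pattern block shaped like the options described above.
table_filter_pattern = {"includes": ["sales_.*"], "excludes": [".*_tmp"]}

for table in ["sales_orders", "sales_orders_tmp", "hr_payroll"]:
    print(table, is_allowed(table, **table_filter_pattern))
```

Under these assumed rules, only `sales_orders` passes: `sales_orders_tmp` hits the exclude pattern and `hr_payroll` matches no include pattern.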

Publish to OpenMetadata

Below is the configuration to publish Redshift Usage data into the OpenMetadata service.

Optionally, add the query-parser processor, the table-usage stage, and the metadata-usage bulk_sink, along with the metadata-server config.

{% code title="redshift_usage.json" %}

{
  "source": {
    "type": "redshift-usage",
    "config": {
      "host_port": "cluster.name.region.redshift.amazonaws.com:5439",
      "username": "username",
      "password": "strong_password",
      "database": "warehouse",
      "where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
      "service_name": "aws_redshift",
      "duration": 2
    }
  },
  "processor": {
    "type": "query-parser",
    "config": {
      "filter": ""
    }
  },
  "stage": {
    "type": "table-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "bulk_sink": {
    "type": "metadata-usage",
    "config": {
      "filename": "/tmp/redshift_usage"
    }
  },
  "metadata_server": {
    "type": "metadata-server",
    "config": {
      "api_endpoint": "http://localhost:8585/api",
      "auth_provider_type": "no-auth"
    }
  }
}

{% endcode %}
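
In the example above, the `table-usage` stage and the `metadata-usage` bulk_sink point at the same `filename`. Assuming the bulk sink reads the file that the stage wrote (a reading of the example, not something stated in this guide), a small pre-flight check can catch mismatched paths before a run. The function name `check_usage_workflow` is hypothetical and is not part of the `metadata` CLI.

```python
def check_usage_workflow(config: dict) -> list:
    """Hypothetical check: return a list of problems found in a usage workflow config."""
    problems = []
    stage = config.get("stage")
    bulk_sink = config.get("bulk_sink")
    # Only compare the paths when both sections are present.
    if stage is not None and bulk_sink is not None:
        stage_file = stage.get("config", {}).get("filename")
        sink_file = bulk_sink.get("config", {}).get("filename")
        # Assumption: the bulk sink consumes the file produced by the stage.
        if stage_file != sink_file:
            problems.append(
                f"stage writes {stage_file!r} but bulk_sink reads {sink_file!r}"
            )
    return problems


# Trimmed copy of the stage and bulk_sink sections from the example above.
workflow = {
    "stage": {"type": "table-usage", "config": {"filename": "/tmp/redshift_usage"}},
    "bulk_sink": {"type": "metadata-usage", "config": {"filename": "/tmp/redshift_usage"}},
}

print(check_usage_workflow(workflow) or "stage and bulk_sink filenames match")
```

To check a real file, load it with `json.load` and pass the resulting dictionary to the function.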