GitBook: [#80] Format Connectors

This commit is contained in:
pmbrull 2022-03-10 08:59:13 +00:00 committed by Sriharsha Chintalapani
parent 42dcb383c0
commit 3fcfb32274
10 changed files with 1716 additions and 7 deletions

View File

@ -40,6 +40,7 @@ This section will show you how to configure and run Data Profiling and Quality p
### Workflows
The **Ingestion Framework** currently supports two types of workflows:
* **Ingestion:** Captures metadata from the sources and updates the Entities' instances. This is a lightweight process that can be scheduled to have fast feedback on metadata changes in our sources.
* **Profiling:** Extracts metrics from SQL sources and sets up and runs Data Quality tests. It requires previous executions of the Ingestion Pipeline. This is a more time-consuming workflow that will run metrics and compare their result to the configured tests of both Tables and Columns.

View File

@ -83,7 +83,7 @@ curl --location --request POST '<domain-url>/oauth2/v1/clients' \
* To add scopes, navigate to your **Okta Dashboard**. Click on **Applications -> Applications** as in step 2.
* Click on your service app.
![](<../../../../docs/.gitbook/assets/image (35).png>)
* Now click on **Okta API Scopes** from the top nav bar.
* Grant the scopes by clicking on **Grant**. Ensure that the following scopes are granted:

View File

@ -0,0 +1,476 @@
---
description: >-
This guide will help you install and configure the BigQuery Usage connector
and run metadata ingestion workflows manually.
---
# BigQuery Usage
## **Requirements**
Using the OpenMetadata BigQuery Usage connector requires supporting services and software. Please ensure that your host system meets the requirements listed below. Then continue to follow the procedure for installing and configuring this connector.
### **OpenMetadata (version 0.8.0 or later)**
You must have a running deployment of OpenMetadata to use this guide. OpenMetadata includes the following services:
* OpenMetadata server supporting the metadata APIs and user interface
* Elasticsearch for metadata search and discovery
* MySQL as the backing store for all metadata
* Airflow for metadata ingestion workflows
If you have not already deployed OpenMetadata, please follow the instructions to [Run OpenMetadata](https://docs.open-metadata.org/install/run-openmetadata) to get up and running.
### **Python (version 3.8.0 or later)**
Please use the following command to check the version of Python you have.
```
python3 --version
```
## **Procedure**
Here's an overview of the steps in this procedure. Please follow the steps relevant to your use case.
1. [Prepare a Python virtual environment](bigquery-usage.md#1.-prepare-a-python-virtual-environment)
2. [Install the Python module for this connector](bigquery-usage.md#2.-install-the-python-module-for-this-connector)
3. [Create a configuration file using template JSON](bigquery-usage.md#3.-create-a-configuration-file-using-template-json)
4. [Configure service settings](bigquery-usage.md#4.-configure-service-settings)
5. [Enable/disable the data profiler](bigquery-usage.md#5.-enable-disable-the-data-profiler)
6. [Install the data profiler Python module (optional)](bigquery-usage.md#6.-install-the-data-profiler-python-module-optional)
7. [Configure data filters (optional)](bigquery-usage.md#7.-configure-data-filters-optional)
8. [Configure sample data (optional)](bigquery-usage.md#8.-configure-sample-data-optional)
9. [Configure DBT (optional)](bigquery-usage.md#9.-configure-dbt-optional)
10. [Confirm sink settings](bigquery-usage.md#10.-confirm-sink-settings)
11. [Confirm metadata\_server settings](bigquery-usage.md#11.-confirm-metadata\_server-settings)
12. [Run ingestion workflow](bigquery-usage.md#12.-run-ingestion-workflow)
### **1. Prepare a Python virtual environment**
In this step, we'll create a Python virtual environment. Using a virtual environment enables us to avoid conflicts with other Python installations and packages on your host system.
In a later step, you will install the Python module for this connector and its dependencies in this virtual environment.
#### **1.1 Create a directory for openmetadata**
Throughout the docs, we use a consistent directory structure for OpenMetadata services and connector installation. If you have not already done so by following another guide, please create an openmetadata directory now and change into that directory in your command line environment.
```
mkdir openmetadata; cd openmetadata
```
#### **1.2 Create a virtual environment**
Run the following command to create a Python virtual environment called `env`. You can try multiple connectors in the same virtual environment.
```
python3 -m venv env
```
#### **1.3 Activate the virtual environment**
Run the following command to activate the virtual environment.
```
source env/bin/activate
```
Once activated, you should see your command prompt change to indicate that your commands will now be executed in the environment named `env`.
#### **1.4 Upgrade pip and setuptools to the latest versions**
Ensure that you have the latest version of pip by running the following command. If you have followed the steps above, this will upgrade pip in your virtual environment.
```
pip3 install --upgrade pip setuptools
```
### **2. Install the Python module for this connector**
Once the virtual environment is set up and activated as described in Step 1, run the following command to install the Python module for the BigQuery Usage connector.
```
pip3 install 'openmetadata-ingestion[bigquery-usage]'
```
### **3. Create a configuration file using template JSON**
Create a new file called `bigquery_usage.json` in the current directory. Note that the current directory should be the `openmetadata` directory you created in Step 1.
Copy and paste the configuration template below into the `bigquery_usage.json` file you created.
{% hint style="info" %}
Note: The `source.config` field in the configuration JSON will include the majority of the settings for your connector. In the steps below we describe how to customize the key-value pairs in the `source.config` field to meet your needs.
{% endhint %}
When providing the credentials, you can either pass the credentials file contents, add the `credentials_path`, or pass the credentials path more securely through environment variables using Application Default Credentials (ADC).
#### 3.1 Using Credentials File or Credentials Path
{% code title="bigquery-creds.json (boilerplate)" %}
```javascript
{
"type": "service_account",
"project_id": "project_id",
"private_key_id": "private_key_id",
"private_key": "",
"client_email": "gcpuser@project_id.iam.gserviceaccount.com",
"client_id": "",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": ""
}
```
{% endcode %}
You can optionally add the `query-parser` processor, `table-usage` stage and `metadata-usage` `bulk_sink` along with `metadata-server` config.
{% code title="bigquery_usage.json" %}
```javascript
{
"source": {
"type": "bigquery-usage",
"config": {
"project_id": "project_id",
"host_port": "https://bigquery.googleapis.com",
"username": "gcpuser@project_id.iam.gserviceaccount.com",
"service_name": "gcp_bigquery",
"duration": 2,
"options": {
"credentials_path": "examples/creds/bigquery-cred.json"
}
}
},
"processor": {
"type": "query-parser",
"config": {
"filter": ""
}
},
"stage": {
"type": "table-usage",
"config": {
"filename": "/tmp/bigquery_usage"
}
},
"bulk_sink": {
"type": "metadata-usage",
"config": {
"filename": "/tmp/bigquery_usage"
}
},
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
}
}
```
{% endcode %}
#### 3.2 Using Application Default Credentials (ADC)
{% code title="env variables" %}
```
export GOOGLE_APPLICATION_CREDENTIALS=<path-to-your-credentials-file>
```
{% endcode %}
Alternatively, you can export the path to the credentials file as shown above. With this option, set the environment variable in your terminal and run the BigQuery Usage configuration without providing `credentials_path`.
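As a reference, here is a minimal sketch of how the `source` section might look when relying on ADC. The values are the same placeholders used in the template above; the only difference is that the `options.credentials_path` entry is omitted, since the credentials are picked up from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
```javascript
"source": {
    "type": "bigquery-usage",
    "config": {
        "project_id": "project_id",
        "host_port": "https://bigquery.googleapis.com",
        "username": "gcpuser@project_id.iam.gserviceaccount.com",
        "service_name": "gcp_bigquery",
        "duration": 2
    }
},
```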
### **4. Configure service settings**
In this step we will configure the BigQuery Usage service settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your BigQuery service as desired.
#### **host\_port**
Edit the value for `source.config.host_port` in `bigquery_usage.json` for your BigQuery deployment. Use the format illustrated in the example below.
```javascript
"host_port": "https://bigquery.googleapis.com"
```
Please ensure that your BigQuery Usage deployment is reachable from the host you are using to run metadata ingestion.
#### **username**
Edit the value for `source.config.username` to identify your BigQuery Usage user.
```javascript
"username": "username"
```
{% hint style="danger" %}
**Note:** The user specified should be authorized to read all databases you want to include in the metadata ingestion workflow.
{% endhint %}
#### **password**
Edit the value for `source.config.password` with the password for your BigQuery Usage user.
```javascript
"password": "strong_password"
```
#### **service\_name**
OpenMetadata uniquely identifies services by their `service_name`. Edit the value for `source.config.service_name` with a name that distinguishes this deployment from other services, including other BigQuery Usage services that you might be ingesting metadata from.
```javascript
"service_name": "gcp_bigquery"
```
#### duration
Use duration to specify the window of time in which the profiler should capture usage data. Values should be integers and represent the number of days for which to capture usage information. For example, if you specify 2 as the value for duration, the data profiler will capture usage information for the 48 hours prior to when the ingestion workflow is run.
```javascript
"duration": 2
```
#### **database (optional)**
If you want to limit metadata ingestion to a single database, include the `source.config.database` field in your configuration file. If this field is not included, the connector will ingest metadata from all databases that the specified user is authorized to read.
To specify a single database to ingest metadata from, provide the name of the database as the value for the `source.config.database` key as illustrated in the example below.
```javascript
"database": "bigquery_db"
```
### **5. Enable/disable the data profiler**
The data profiler ingests usage information for tables. This enables you to assess the frequency of use, reliability, and other details.
#### **data\_profiler\_enabled**
When enabled, the data profiler will run as part of metadata ingestion. Running the data profiler increases the amount of time it takes for metadata ingestion, but provides the benefits mentioned above.
You may disable the data profiler by setting the value for the key `source.config.data_profiler_enabled` to `"false"` as follows. We've done this in the configuration template provided.
```javascript
"data_profiler_enabled": "false"
```
If you want to enable the data profiler, update your configuration file as follows.
```javascript
"data_profiler_enabled": "true"
```
{% hint style="info" %}
**Note:** The data profiler is enabled by default if no setting is provided for `data_profiler_enabled`
{% endhint %}
### **6. Install the data profiler Python module (optional)**
If you've enabled the data profiler in Step 5, run the following command to install the Python module for the data profiler. You'll need this to run the ingestion workflow.
```
pip3 install 'openmetadata-ingestion[data-profiler]'
```
The data profiler module takes a few minutes to install. While it installs, continue through the remaining steps in this guide.
### **7. Configure data filters (optional)**
#### **include\_views (optional)**
Use `source.config.include_views` to control whether or not to include views as part of metadata ingestion and data profiling.
Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"include_views": "true"
```
Exclude views as follows.
```javascript
"include_views": "false"
```
{% hint style="info" %}
**Note:** `source.config.include_views` is set to true by default.
{% endhint %}
#### **include\_tables (optional)**
Use `source.config.include_tables` to control whether or not to include tables as part of metadata ingestion and data profiling.
Explicitly include tables by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"include_tables": "true"
```
Exclude tables as follows.
```javascript
"include_tables": "false"
```
{% hint style="info" %}
**Note:** `source.config.include_tables` is set to true by default.
{% endhint %}
#### **table\_filter\_pattern (optional)**
Use `source.config.table_filter_pattern` to select tables for metadata ingestion by name.
Use `source.config.table_filter_pattern.excludes` to exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See below for an example. This example is also included in the configuration template provided.
```javascript
"table_filter_pattern": {
"excludes": ["information_schema.*", "[\\w]*event_vw.*"]
}
```
Use `source.config.table_filter_pattern.includes` to include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See below for an example.
```javascript
"table_filter_pattern": {
"includes": ["corp.*", "dept.*"]
}
```
See the documentation for the [Python re module](https://docs.python.org/3/library/re.html) for information on how to construct regular expressions.
{% hint style="info" %}
You may use either `excludes` or `includes` but not both in `table_filter_pattern`.
{% endhint %}
#### **schema\_filter\_pattern (optional)**
Use the `source.config.schema_filter_pattern.excludes` and `source.config.schema_filter_pattern.includes` fields to select the schemas for metadata ingestion by name. See the sketch below for an example.
The syntax and semantics for `schema_filter_pattern` are the same as for [`table_filter_pattern`](bigquery-usage.md#table\_filter\_pattern-optional). Please check that section for details.
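For instance, a minimal sketch that excludes schemas by name (the pattern shown is only a placeholder):
```javascript
"schema_filter_pattern": {
    "excludes": ["information_schema.*"]
}
```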
### **8. Configure sample data (optional)**
#### **generate\_sample\_data (optional)**
Use the `source.config.generate_sample_data` field to control whether or not to generate sample data to include in table views in the OpenMetadata user interface. The image below provides an example.
![](../../.gitbook/assets/generate\_sample\_data.png)
Explicitly include sample data by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"generate_sample_data": "true"
```
If set to true, the connector will collect the first 50 rows of data from each table included in ingestion, and catalog that data as sample data, which users can refer to in the OpenMetadata user interface.
You can exclude the collection of sample data by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"generate_sample_data": "false"
```
{% hint style="info" %}
**Note:** `generate_sample_data` is set to true by default.
{% endhint %}
### **9. Configure DBT (optional)**
DBT provides transformation logic that creates tables and views from raw data. OpenMetadata's integration for DBT enables you to view the models used to generate a table from that table's details page in the OpenMetadata UI. The image below provides an example.
![](../../.gitbook/assets/configure\_dbt.png)
To include DBT models and metadata in your ingestion workflows, specify the location of the DBT manifest and catalog files as fields in your configuration file.
#### **dbt\_manifest\_file (optional)**
Use the field `source.config.dbt_manifest_file` to specify the location of your DBT manifest file. See below for an example.
```javascript
"dbt_manifest_file": "./dbt/manifest.json"
```
#### **dbt\_catalog\_file (optional)**
Use the field `source.config.dbt_catalog_file` to specify the location of your DBT catalog file. See below for an example.
```javascript
"dbt_catalog_file": "./dbt/catalog.json"
```
### **10. Confirm sink settings**
You need not make any changes to the fields defined for `bulk_sink` in the template code you copied into `bigquery_usage.json` in Step 3. This part of your configuration file should be as follows.
```javascript
"bulk_sink": {
"type": "metadata-usage",
"config": {
"filename": "/tmp/bigquery_usage"
}
},
```
### **11. Confirm metadata\_server settings**
You need not make any changes to the fields defined for `metadata_server` in the template code you copied into `bigquery_usage.json` in Step 3. This part of your configuration file should be as follows.
```javascript
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
}
```
### **12. Run ingestion workflow**
Your `bigquery_usage.json` configuration file should now be fully configured and ready to use in an ingestion workflow.
To run an ingestion workflow, execute the following command from the `openmetadata` directory you created in Step 1.
```
metadata ingest -c ./bigquery_usage.json
```
## **Next Steps**
As the ingestion workflow runs, you may observe progress both from the command line and from the OpenMetadata user interface. To view the metadata ingested from BigQuery Usage, visit [http://localhost:8585/explore/tables](http://localhost:8585/explore/tables). Select the BigQuery Usage service to filter for the data you've ingested using the workflow you configured and ran following this guide. The image below provides an example.
![](<../../.gitbook/assets/next\_steps (1).png>)
## **Troubleshooting**
### **ERROR: Failed building wheel for cryptography**
When attempting to install the `openmetadata-ingestion[bigquery-usage]` Python package in Step 2, you might encounter the following error. The error might include a mention of a Rust compiler.
```
Failed to build cryptography
ERROR: Could not build wheels for cryptography which use PEP 517 and cannot be installed directly
```
If you encounter this error, first upgrade pip and setuptools with the following command.
```
pip3 install --upgrade pip setuptools
```
Then re-run the install command in [Step 2](bigquery-usage.md#2.-install-the-python-module-for-this-connector).
### **requests.exceptions.ConnectionError**
If you encounter the following error when attempting to run the ingestion workflow in Step 12, this is probably because there is no OpenMetadata server running at http://localhost:8585.
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8585):
Max retries exceeded with url: /api/v1/services/databaseServices/name/gcp_bigquery
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1031fa310>:
Failed to establish a new connection: [Errno 61] Connection refused'))
```
To correct this problem, please follow the steps in the [Run OpenMetadata](https://docs.open-metadata.org/install/run-openmetadata) guide to deploy OpenMetadata in Docker on your local machine.
Then re-run the metadata ingestion workflow in [Step 12](bigquery-usage.md#12.-run-ingestion-workflow).

View File

@ -0,0 +1,444 @@
---
description: >-
This guide will help you install and configure the Redshift Usage connector
and run metadata ingestion workflows manually.
---
# Redshift Usage
## **Requirements**
Using the OpenMetadata Redshift Usage connector requires supporting services and software. Please ensure that your host system meets the requirements listed below. Then continue to follow the procedure for installing and configuring this connector.
### **OpenMetadata (version 0.8.0 or later)**
You must have a running deployment of OpenMetadata to use this guide. OpenMetadata includes the following services:
* OpenMetadata server supporting the metadata APIs and user interface
* Elasticsearch for metadata search and discovery
* MySQL as the backing store for all metadata
* Airflow for metadata ingestion workflows
If you have not already deployed OpenMetadata, please follow the instructions to [Run OpenMetadata](https://docs.open-metadata.org/install/run-openmetadata) to get up and running.
### **Python (version 3.8.0 or later)**
Please use the following command to check the version of Python you have.
```
python3 --version
```
## **Procedure**
Here's an overview of the steps in this procedure. Please follow the steps relevant to your use case.
1. [Prepare a Python virtual environment](redshift-usage.md#1.-prepare-a-python-virtual-environment)
2. [Install the Python module for this connector](redshift-usage.md#2.-install-the-python-module-for-this-connector)
3. [Create a configuration file using template JSON](redshift-usage.md#3.-create-a-configuration-file-using-template-json)
4. [Configure service settings](redshift-usage.md#4.-configure-service-settings)
5. [Enable/disable the data profiler](redshift-usage.md#5.-enable-disable-the-data-profiler)
6. [Install the data profiler Python module (optional)](redshift-usage.md#6.-install-the-data-profiler-python-module-optional)
7. [Configure data filters (optional)](redshift-usage.md#7.-configure-data-filters-optional)
8. [Configure sample data (optional)](redshift-usage.md#8.-configure-sample-data-optional)
9. [Configure DBT (optional)](redshift-usage.md#9.-configure-dbt-optional)
10. [Confirm sink settings](redshift-usage.md#10.-confirm-sink-settings)
11. [Confirm metadata\_server settings](redshift-usage.md#11.-confirm-metadata\_server-settings)
12. [Run ingestion workflow](redshift-usage.md#12.-run-ingestion-workflow)
### **1. Prepare a Python virtual environment**
In this step, we'll create a Python virtual environment. Using a virtual environment enables us to avoid conflicts with other Python installations and packages on your host system.
In a later step, you will install the Python module for this connector and its dependencies in this virtual environment.
#### **1.1 Create a directory for openmetadata**
Throughout the docs, we use a consistent directory structure for OpenMetadata services and connector installation. If you have not already done so by following another guide, please create an openmetadata directory now and change into that directory in your command line environment.
```
mkdir openmetadata; cd openmetadata
```
#### **1.2 Create a virtual environment**
Run the following command to create a Python virtual environment called `env`. You can try multiple connectors in the same virtual environment.
```
python3 -m venv env
```
#### **1.3 Activate the virtual environment**
Run the following command to activate the virtual environment.
```
source env/bin/activate
```
Once activated, you should see your command prompt change to indicate that your commands will now be executed in the environment named `env`.
#### **1.4 Upgrade pip and setuptools to the latest versions**
Ensure that you have the latest version of pip by running the following command. If you have followed the steps above, this will upgrade pip in your virtual environment.
```
pip3 install --upgrade pip setuptools
```
### **2. Install the Python module for this connector**
Once the virtual environment is set up and activated as described in Step 1, run the following command to install the Python module for the Redshift Usage connector.
```
pip3 install 'openmetadata-ingestion[redshift-usage]'
```
### **3. Create a configuration file using template JSON**
Create a new file called `redshift_usage.json` in the current directory. Note that the current directory should be the `openmetadata` directory you created in Step 1.
Copy and paste the configuration template below into the `redshift_usage.json` file you created.
{% hint style="info" %}
Note: The `source.config` field in the configuration JSON will include the majority of the settings for your connector. In the steps below we describe how to customize the key-value pairs in the `source.config` field to meet your needs.
{% endhint %}
You can optionally add the `query-parser` processor, `table-usage` stage and `metadata-usage` `bulk_sink` along with `metadata-server` config.
{% code title="redshift_usage.json" %}
```javascript
{
"source": {
"type": "redshift-usage",
"config": {
"host_port": "cluster.name.region.redshift.amazonaws.com:5439",
"username": "username",
"password": "strong_password",
"database": "warehouse",
"where_clause": "and q.label != 'metrics' and q.label != 'health' and q.label != 'cmstats'",
"service_name": "aws_redshift",
"duration": 2
}
},
"processor": {
"type": "query-parser",
"config": {
"filter": ""
}
},
"stage": {
"type": "table-usage",
"config": {
"filename": "/tmp/redshift_usage"
}
},
"bulk_sink": {
"type": "metadata-usage",
"config": {
"filename": "/tmp/redshift_usage"
}
},
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
}
}
```
{% endcode %}
### **4. Configure service settings**
In this step we will configure the Redshift Usage service settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your Redshift service as desired.
#### **host\_port**
Edit the value for `source.config.host_port` in `redshift_usage.json` for your Redshift Usage deployment. Use the `host:port` format illustrated in the example below.
```javascript
"host_port": "cluster.name.region.redshift.amazonaws.com:5439"
```
Please ensure that your Redshift Usage deployment is reachable from the host you are using to run metadata ingestion.
#### **username**
Edit the value for `source.config.username` to identify your Redshift Usage user.
```javascript
"username": "username"
```
{% hint style="danger" %}
**Note:** The user specified should be authorized to read all databases you want to include in the metadata ingestion workflow.
{% endhint %}
#### **password**
Edit the value for `source.config.password` with the password for your Redshift Usage user.
```javascript
"password": "strong_password"
```
#### **service\_name**
OpenMetadata uniquely identifies services by their `service_name`. Edit the value for `source.config.service_name` with a name that distinguishes this deployment from other services, including other Redshift Usage services that you might be ingesting metadata from.
```javascript
"service_name": "aws_redshift"
```
#### duration
Use duration to specify the window of time in which the profiler should capture usage data. Values should be integers and represent the number of days for which to capture usage information. For example, if you specify 2 as the value for duration, the data profiler will capture usage information for the 48 hours prior to when the ingestion workflow is run.
```javascript
"duration": 2
```
#### **database (optional)**
If you want to limit metadata ingestion to a single database, include the `source.config.database` field in your configuration file. If this field is not included, the connector will ingest metadata from all databases that the specified user is authorized to read.
To specify a single database to ingest metadata from, provide the name of the database as the value for the `source.config.database` key as illustrated in the example below.
```javascript
"database": "warehouse"
```
### **5. Enable/disable the data profiler**
The data profiler ingests usage information for tables. This enables you to assess the frequency of use, reliability, and other details.
#### **data\_profiler\_enabled**
When enabled, the data profiler will run as part of metadata ingestion. Running the data profiler increases the amount of time it takes for metadata ingestion, but provides the benefits mentioned above.
You may disable the data profiler by setting the value for the key `source.config.data_profiler_enabled` to `"false"` as follows. We've done this in the configuration template provided.
```javascript
"data_profiler_enabled": "false"
```
If you want to enable the data profiler, update your configuration file as follows.
```javascript
"data_profiler_enabled": "true"
```
{% hint style="info" %}
**Note:** The data profiler is enabled by default if no setting is provided for `data_profiler_enabled`
{% endhint %}
### **6. Install the data profiler Python module (optional)**
If you've enabled the data profiler in Step 5, run the following command to install the Python module for the data profiler. You'll need this to run the ingestion workflow.
```
pip3 install 'openmetadata-ingestion[data-profiler]'
```
The data profiler module takes a few minutes to install. While it installs, continue through the remaining steps in this guide.
### **7. Configure data filters (optional)**
#### **include\_views (optional)**
Use `source.config.include_views` to control whether or not to include views as part of metadata ingestion and data profiling.
Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"include_views": "true"
```
Exclude views as follows.
```javascript
"include_views": "false"
```
{% hint style="info" %}
**Note:** `source.config.include_views` is set to true by default.
{% endhint %}
#### **include\_tables (optional)**
Use `source.config.include_tables` to control whether or not to include tables as part of metadata ingestion and data profiling.
Explicitly include tables by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"include_tables": "true"
```
Exclude tables as follows.
```javascript
"include_tables": "false"
```
{% hint style="info" %}
**Note:** `source.config.include_tables` is set to true by default.
{% endhint %}
#### **table\_filter\_pattern (optional)**
Use `source.config.table_filter_pattern` to select tables for metadata ingestion by name.
Use `source.config.table_filter_pattern.excludes` to exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See below for an example. This example is also included in the configuration template provided.
```javascript
"table_filter_pattern": {
"excludes": ["information_schema.*", "[\\w]*event_vw.*"]
}
```
Use `source.config.table_filter_pattern.includes` to include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See below for an example.
```javascript
"table_filter_pattern": {
"includes": ["corp.*", "dept.*"]
}
```
See the documentation for the [Python re module](https://docs.python.org/3/library/re.html) for information on how to construct regular expressions.
{% hint style="info" %}
You may use either `excludes` or `includes` but not both in `table_filter_pattern`.
{% endhint %}
#### **schema\_filter\_pattern (optional)**
Use the `source.config.schema_filter_pattern.excludes` and `source.config.schema_filter_pattern.includes` fields to select the schemas for metadata ingestion by name. See the sketch below for an example.
The syntax and semantics for `schema_filter_pattern` are the same as for [`table_filter_pattern`](redshift-usage.md#table\_filter\_pattern-optional). Please check that section for details.
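For instance, a minimal sketch that excludes schemas by name (the pattern shown is only a placeholder):
```javascript
"schema_filter_pattern": {
    "excludes": ["information_schema.*"]
}
```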
### **8. Configure sample data (optional)**
#### **generate\_sample\_data (optional)**
Use the `source.config.generate_sample_data` field to control whether or not to generate sample data to include in table views in the OpenMetadata user interface. The image below provides an example.
![](../../.gitbook/assets/generate\_sample\_data.png)
Explicitly include sample data by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"generate_sample_data": "true"
```
If set to true, the connector will collect the first 50 rows of data from each table included in ingestion, and catalog that data as sample data, which users can refer to in the OpenMetadata user interface.
You can exclude the collection of sample data by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"generate_sample_data": "false"
```
{% hint style="info" %}
**Note:** `generate_sample_data` is set to true by default.
{% endhint %}
### **9. Configure DBT (optional)**
DBT provides transformation logic that creates tables and views from raw data. OpenMetadata's integration for DBT enables you to view the models used to generate a table from that table's details page in the OpenMetadata UI. The image below provides an example.
![](../../.gitbook/assets/configure\_dbt.png)
To include DBT models and metadata in your ingestion workflows, specify the location of the DBT manifest and catalog files as fields in your configuration file.
#### **dbt\_manifest\_file (optional)**
Use the field `source.config.dbt_manifest_file` to specify the location of your DBT manifest file. See below for an example.
```javascript
"dbt_manifest_file": "./dbt/manifest.json"
```
#### **dbt\_catalog\_file (optional)**
Use the field `source.config.dbt_catalog_file` to specify the location of your DBT catalog file. See below for an example.
```javascript
"dbt_catalog_file": "./dbt/catalog.json"
```
### **10. Confirm sink settings**
You need not make any changes to the fields defined for `bulk_sink` in the template code you copied into `redshift_usage.json` in Step 3. This part of your configuration file should be as follows.
```javascript
"bulk_sink": {
"type": "metadata-usage",
"config": {
"filename": "/tmp/redshift_usage"
}
},
```
### **11. Confirm metadata\_server settings**
You need not make any changes to the fields defined for `metadata_server` in the template code you copied into `redshift_usage.json` in Step 3. This part of your configuration file should be as follows.
```javascript
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
}
```
### **12. Run ingestion workflow**
Your `redshift_usage.json` configuration file should now be fully configured and ready to use in an ingestion workflow.
To run an ingestion workflow, execute the following command from the `openmetadata` directory you created in Step 1.
```
metadata ingest -c ./redshift_usage.json
```
## **Next Steps**
As the ingestion workflow runs, you may observe progress both from the command line and from the OpenMetadata user interface. To view the metadata ingested from Redshift Usage, visit [http://localhost:8585/explore/tables](http://localhost:8585/explore/tables). Select the Redshift Usage service to filter for the data you've ingested using the workflow you configured and ran following this guide. The image below provides an example.
![](<../../.gitbook/assets/next\_steps (1).png>)
## **Troubleshooting**
### **ERROR: Failed building wheel for cryptography**
When attempting to install the `openmetadata-ingestion[redshift-usage]` Python package in Step 2, you might encounter the following error. The error might include a mention of a Rust compiler.
```
Failed to build cryptography
ERROR: Could not build wheels for cryptography which use PEP 517 and cannot be installed directly
```
If you encounter this error, first upgrade pip and setuptools with the following command.
```
pip3 install --upgrade pip setuptools
```
Then re-run the install command in [Step 2](redshift-usage.md#2.-install-the-python-module-for-this-connector).
### **requests.exceptions.ConnectionError**
If you encounter the following error when attempting to run the ingestion workflow in Step 12, this is probably because there is no OpenMetadata server running at http://localhost:8585.
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8585):
Max retries exceeded with url: /api/v1/services/databaseServices/name/aws_redshift
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1031fa310>:
Failed to establish a new connection: [Errno 61] Connection refused'))
```
To correct this problem, please follow the steps in the [Run OpenMetadata](https://docs.open-metadata.org/install/run-openmetadata) guide to deploy OpenMetadata in Docker on your local machine.
Then re-run the metadata ingestion workflow in [Step 12](redshift-usage.md#12.-run-ingestion-workflow).

View File

@ -0,0 +1,444 @@
---
description: >-
This guide will help you install and configure the Snowflake Usage connector
and run metadata ingestion workflows manually.
---
# Snowflake Usage
## **Requirements**
Using the OpenMetadata Snowflake Usage connector requires supporting services and software. Please ensure that your host system meets the requirements listed below. Then continue to follow the procedure for installing and configuring this connector.
### **OpenMetadata (version 0.8.0 or later)**
You must have a running deployment of OpenMetadata to use this guide. OpenMetadata includes the following services:
* OpenMetadata server supporting the metadata APIs and user interface
* Elasticsearch for metadata search and discovery
* MySQL as the backing store for all metadata
* Airflow for metadata ingestion workflows
If you have not already deployed OpenMetadata, please follow the instructions to [Run OpenMetadata](https://docs.open-metadata.org/install/run-openmetadata) to get up and running.
### **Python (version 3.8.0 or later)**
Please use the following command to check the version of Python you have.
```
python3 --version
```
## **Procedure**
Here's an overview of the steps in this procedure. Please follow the steps relevant to your use case.
1. [Prepare a Python virtual environment](snowflake-usage.md#1.-prepare-a-python-virtual-environment)
2. [Install the Python module for this connector](snowflake-usage.md#2.-install-the-python-module-for-this-connector)
3. [Create a configuration file using template JSON](snowflake-usage.md#3.-create-a-configuration-file-using-template-json)
4. [Configure service settings](snowflake-usage.md#4.-configure-service-settings)
5. [Enable/disable the data profiler](snowflake-usage.md#5.-enable-disable-the-data-profiler)
6. [Install the data profiler Python module (optional)](snowflake-usage.md#6.-install-the-data-profiler-python-module-optional)
7. [Configure data filters (optional)](snowflake-usage.md#7.-configure-data-filters-optional)
8. [Configure sample data (optional)](snowflake-usage.md#8.-configure-sample-data-optional)
9. [Configure DBT (optional)](snowflake-usage.md#9.-configure-dbt-optional)
10. [Confirm sink settings](snowflake-usage.md#10.-confirm-sink-settings)
11. [Confirm metadata\_server settings](snowflake-usage.md#11.-confirm-metadata\_server-settings)
12. [Run ingestion workflow](snowflake-usage.md#12.-run-ingestion-workflow)
### **1. Prepare a Python virtual environment**
In this step, we'll create a Python virtual environment. Using a virtual environment enables us to avoid conflicts with other Python installations and packages on your host system.
In a later step, you will install the Python module for this connector and its dependencies in this virtual environment.
#### **1.1 Create a directory for openmetadata**
Throughout the docs, we use a consistent directory structure for OpenMetadata services and connector installation. If you have not already done so by following another guide, please create an openmetadata directory now and change into that directory in your command line environment.
```
mkdir openmetadata; cd openmetadata
```
#### **1.2 Create a virtual environment**
Run the following command to create a Python virtual environment called `env`. You can try multiple connectors in the same virtual environment.
```
python3 -m venv env
```
#### **1.3 Activate the virtual environment**
Run the following command to activate the virtual environment.
```
source env/bin/activate
```
Once activated, you should see your command prompt change to indicate that your commands will now be executed in the environment named `env`.
#### **1.4 Upgrade pip and setuptools to the latest versions**
Ensure that you have the latest version of pip by running the following command. If you have followed the steps above, this will upgrade pip in your virtual environment.
```
pip3 install --upgrade pip setuptools
```
### **2. Install the Python module for this connector**
Once the virtual environment is set up and activated as described in Step 1, run the following command to install the Python module for the Snowflake Usage connector.
```
pip3 install 'openmetadata-ingestion[snowflake-usage]'
```
### **3. Create a configuration file using template JSON**
Create a new file called `snowflake_usage.json` in the current directory. Note that the current directory should be the `openmetadata` directory you created in Step 1.
Copy and paste the configuration template below into the `snowflake_usage.json` file you created.
{% hint style="info" %}
Note: The `source.config` field in the configuration JSON will include the majority of the settings for your connector. In the steps below we describe how to customize the key-value pairs in the `source.config` field to meet your needs.
{% endhint %}
You can optionally add the `query-parser` processor, the `table-usage` stage, and the `metadata-usage` `bulk_sink`, along with the `metadata-server` config.
{% code title="snowflake_usage.json" %}
```javascript
{
"source": {
"type": "snowflake-usage",
"config": {
"host_port": "account.region.service.snowflakecomputing.com",
"username": "username",
"password": "strong_password",
"database": "SNOWFLAKE_SAMPLE_DATA",
"account": "account_name",
"service_name": "snowflake",
"duration": 2
}
},
"processor": {
"type": "query-parser",
"config": {
"filter": ""
}
},
"stage": {
"type": "table-usage",
"config": {
"filename": "/tmp/snowflake_usage"
}
},
"bulk_sink": {
"type": "metadata-usage",
"config": {
"filename": "/tmp/snowflake_usage"
}
},
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
}
}
```
{% endcode %}
### **4. Configure service settings**
In this step we will configure the Snowflake Usage service settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your Snowflake service as desired.
#### **host\_port**
Edit the value for `source.config.host_port` in `snowflake_usage.json` for your Snowflake deployment. Use the format illustrated in the example below.
```javascript
"host_port": "account.region.service.snowflakecomputing.com"
```
Please ensure that your Snowflake Usage deployment is reachable from the host you are using to run metadata ingestion.
#### **username**
Edit the value for `source.config.username` to identify your Snowflake Usage user.
```javascript
"username": "username"
```
{% hint style="danger" %}
**Note:** The user specified should be authorized to read all databases you want to include in the metadata ingestion workflow.
{% endhint %}
#### **password**
Edit the value for `source.config.password` with the password for your Snowflake Usage user.
```javascript
"password": "strong_password"
```
#### **service\_name**
OpenMetadata uniquely identifies services by their `service_name`. Edit the value for `source.config.service_name` with a name that distinguishes this deployment from other services, including other Snowflake Usage services that you might be ingesting metadata from.
```javascript
"service_name": "snowflake"
```
#### duration
Use duration to specify the window of time in which the profiler should capture usage data. Values should be integers and represent the number of days for which to capture usage information. For example, if you specify 2 as the value for duration, the data profiler will capture usage information for the 48 hours prior to when the ingestion workflow is run.
```javascript
"duration": 2
```
#### **database (optional)**
If you want to limit metadata ingestion to a single database, include the `source.config.database` field in your configuration file. If this field is not included, the connector will ingest metadata from all databases that the specified user is authorized to read.
To specify a single database to ingest metadata from, provide the name of the database as the value for the `source.config.database` key as illustrated in the example below.
```javascript
"database": "SNOWFLAKE_SAMPLE_DATA"
```
### **5. Enable/disable the data profiler**
The data profiler ingests usage information for tables. This enables you to assess the frequency of use, reliability, and other details.
#### **data\_profiler\_enabled**
When enabled, the data profiler will run as part of metadata ingestion. Running the data profiler increases the amount of time it takes for metadata ingestion, but provides the benefits mentioned above.
You may disable the data profiler by setting the value for the key `source.config.data_profiler_enabled` to `"false"` as follows. We've done this in the configuration template provided.
```javascript
"data_profiler_enabled": "false"
```
If you want to enable the data profiler, update your configuration file as follows.
```javascript
"data_profiler_enabled": "true"
```
{% hint style="info" %}
**Note:** The data profiler is enabled by default if no setting is provided for `data_profiler_enabled`
{% endhint %}
### **6. Install the data profiler Python module (optional)**
If you've enabled the data profiler in Step 5, run the following command to install the Python module for the data profiler. You'll need this to run the ingestion workflow.
```
pip3 install 'openmetadata-ingestion[data-profiler]'
```
The data profiler module takes a few minutes to install. While it installs, continue through the remaining steps in this guide.
### **7. Configure data filters (optional)**
#### **include\_views (optional)**
Use `source.config.include_views` to control whether or not to include views as part of metadata ingestion and data profiling.
Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"include_views": "true"
```
Exclude views as follows.
```javascript
"include_views": "false"
```
{% hint style="info" %}
**Note:** `source.config.include_views` is set to true by default.
{% endhint %}
#### **include\_tables (optional)**
Use `source.config.include_tables` to control whether or not to include tables as part of metadata ingestion and data profiling.
Explicitly include tables by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"include_tables": "true"
```
Exclude tables as follows.
```javascript
"include_tables": "false"
```
{% hint style="info" %}
**Note:** `source.config.include_tables` is set to true by default.
{% endhint %}
#### **table\_filter\_pattern (optional)**
Use `source.config.table_filter_pattern` to select tables for metadata ingestion by name.
Use `source.config.table_filter_pattern.excludes` to exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See below for an example. This example is also included in the configuration template provided.
```javascript
"table_filter_pattern": {
"excludes": ["information_schema.*", "[\\w]*event_vw.*"]
}
```
Use `source.config.table_filter_pattern.includes` to include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See below for an example.
```javascript
"table_filter_pattern": {
"includes": ["corp.*", "dept.*"]
}
```
See the documentation for the [Python re module](https://docs.python.org/3/library/re.html) for information on how to construct regular expressions.
{% hint style="info" %}
You may use either `excludes` or `includes` but not both in `table_filter_pattern`.
{% endhint %}
#### **schema\_filter\_pattern (optional)**
Use the `source.config.schema_filter_pattern.excludes` and `source.config.schema_filter_pattern.includes` fields to select the schemas for metadata ingestion by name. See the sketch below for an example.
The syntax and semantics for `schema_filter_pattern` are the same as for [`table_filter_pattern`](snowflake-usage.md#table\_filter\_pattern-optional). Please check that section for details.
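For instance, a minimal sketch that excludes schemas by name (the pattern shown is only a placeholder):
```javascript
"schema_filter_pattern": {
    "excludes": ["information_schema.*"]
}
```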
### **8. Configure sample data (optional)**
#### **generate\_sample\_data (optional)**
Use the `source.config.generate_sample_data` field to control whether or not to generate sample data to include in table views in the OpenMetadata user interface. The image below provides an example.
![](../../.gitbook/assets/generate\_sample\_data.png)
Explicitly include sample data by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"generate_sample_data": "true"
```
If set to true, the connector will collect the first 50 rows of data from each table included in ingestion, and catalog that data as sample data, which users can refer to in the OpenMetadata user interface.
You can exclude the collection of sample data by adding the following key-value pair in the `source.config` field of your configuration file.
```javascript
"generate_sample_data": "false"
```
{% hint style="info" %}
**Note:** `generate_sample_data` is set to true by default.
{% endhint %}
### **9. Configure DBT (optional)**
DBT provides transformation logic that creates tables and views from raw data. OpenMetadata's integration for DBT enables you to view the models used to generate a table from that table's details page in the OpenMetadata UI. The image below provides an example.
![](../../.gitbook/assets/configure\_dbt.png)
To include DBT models and metadata in your ingestion workflows, specify the location of the DBT manifest and catalog files as fields in your configuration file.
#### **dbt\_manifest\_file (optional)**
Use the field `source.config.dbt_manifest_file` to specify the location of your DBT manifest file. See below for an example.
```javascript
"dbt_manifest_file": "./dbt/manifest.json"
```
#### **dbt\_catalog\_file (optional)**
Use the field `source.config.dbt_catalog_file` to specify the location of your DBT catalog file. See below for an example.
```javascript
"dbt_catalog_file": "./dbt/catalog.json"
```
### **10. Confirm sink settings**
You need not make any changes to the fields defined for `bulk_sink` in the template code you copied into `snowflake_usage.json` in Step 3. This part of your configuration file should be as follows.
```javascript
"bulk_sink": {
"type": "metadata-usage",
"config": {
"filename": "/tmp/snowflake_usage"
}
},
```
### **11. Confirm metadata\_server settings**
You need not make any changes to the fields defined for `metadata_server` in the template code you copied into `snowflake_usage.json` in Step 3. This part of your configuration file should be as follows.
```javascript
"metadata_server": {
"type": "metadata-server",
"config": {
"api_endpoint": "http://localhost:8585/api",
"auth_provider_type": "no-auth"
}
}
```
### **12. Run ingestion workflow**
Your `snowflake_usage.json` configuration file should now be fully configured and ready to use in an ingestion workflow.
To run an ingestion workflow, execute the following command from the `openmetadata` directory you created in Step 1.
```
metadata ingest -c ./snowflake_usage.json
```
## **Next Steps**
As the ingestion workflow runs, you may observe progress both from the command line and from the OpenMetadata user interface. To view the metadata ingested from Snowflake Usage, visit [http://localhost:8585/explore/tables](http://localhost:8585/explore/tables). Select the Snowflake Usage service to filter for the data you've ingested using the workflow you configured and ran following this guide. The image below provides an example.
![](<../../.gitbook/assets/next\_steps (1).png>)
## **Troubleshooting**
### **ERROR: Failed building wheel for cryptography**
When attempting to install the `openmetadata-ingestion[snowflake-usage]` Python package in Step 2, you might encounter the following error. The error might include a mention of a Rust compiler.
```
Failed to build cryptography
ERROR: Could not build wheels for cryptography which use PEP 517 and cannot be installed directly
```
If you encounter this error, first upgrade pip and setuptools with the following command.
```
pip3 install --upgrade pip setuptools
```
Then re-run the install command in [Step 2](snowflake-usage.md#2.-install-the-python-module-for-this-connector).
### **requests.exceptions.ConnectionError**
If you encounter the following error when attempting to run the ingestion workflow in Step 12, this is probably because there is no OpenMetadata server running at http://localhost:8585.
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8585):
Max retries exceeded with url: /api/v1/services/databaseServices/name/snowflake
(Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1031fa310>:
Failed to establish a new connection: [Errno 61] Connection refused'))
```
To correct this problem, please follow the steps in the [Run OpenMetadata](https://docs.open-metadata.org/install/run-openmetadata) guide to deploy OpenMetadata in Docker on your local machine.
Then re-run the metadata ingestion workflow in [Step 12](snowflake-usage.md#12.-run-ingestion-workflow).

View File

@ -12,7 +12,6 @@ This schema does not accept additional properties.
### Properties
* **id** `required`
* Unique identifier that identifies this topic instance.
@ -187,17 +186,13 @@ This schema does not accept additional properties.
### Type definitions in this schema
#### topicName
* Name that identifies a topic.
* Type: `string`
* Length: between 1 and 128
@ -215,7 +210,6 @@ This schema does not accept additional properties.
#### cleanupPolicy
* Topic clean up policy. For Kafka - `cleanup.policy` configuration.
* The value is restricted to the following:

View File

@ -4,6 +4,7 @@ OpenMetadata Ingestion is a simple framework to build connectors and ingest meta
## Guides
* [Ingest Sample Data](ingest-sample-data.md)
* [Explore Connectors & Install](../docs/integrations/connectors/)
* [Configure Airflow](../docs/integrations/connectors/airflow/airflow.md)

View File

@ -0,0 +1,9 @@
# Try OpenMetadata
{% content-ref url="take-it-for-a-spin.md" %}
[take-it-for-a-spin.md](take-it-for-a-spin.md)
{% endcontent-ref %}
{% content-ref url="run-openmetadata.md" %}
[run-openmetadata.md](run-openmetadata.md)
{% endcontent-ref %}

View File

@ -0,0 +1,321 @@
---
description: >-
This installation doc will help you start an OpenMetadata standalone instance
on your local machine.
---
# Try OpenMetadata in Docker
## Requirements (OSX and Linux)
Please ensure your host system meets the requirements listed below. Then continue to the Procedure for installing OpenMetadata.
### Python (version 3.8.0 or greater)
To check what version of Python you have, please use the following command.
```
python3 --version
```
### Docker (version 20.10.0 or greater)
[Docker](https://docs.docker.com/get-started/overview/) is an open platform for developing, shipping, and running applications. It enables you to separate your applications from your infrastructure so you can deliver software quickly, using OS-level virtualization to package software into units called containers.
To check what version of Docker you have, please use the following command.
```
docker --version
```
If you need to install Docker, please visit [Get Docker](https://docs.docker.com/get-docker/). You also need the latest `docker-compose` installed; please visit [Install Docker Compose](https://docs.docker.com/compose/install/).
{% hint style="warning" %}
Note: You must **allocate at least 4GB of memory to Docker** in order to run OpenMetadata. To change the memory allocation for Docker, navigate to:
Preferences -> Resources -> Advanced
{% endhint %}
### docker-compose (version v1.29.2 or greater)
The docker-compose tool enables you to define and run multi-container Docker applications. The packages you will install in this guide use docker-compose to deploy OpenMetadata.
To install `docker-compose`, please follow the instructions at [Install Docker Compose](https://docs.docker.com/compose/install/).
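To check what version of `docker-compose` you have, please use the following command.
```
docker-compose --version
```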
## Procedure
### 1. Create a directory for OpenMetadata
Create a new directory for OpenMetadata and navigate into that directory.
```
mkdir openmetadata-docker && cd openmetadata-docker
```
### 2. Create a Python virtual environment
Create a virtual environment to avoid conflicts with other Python environments on your host system. A virtual environment is a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.
In a later step you will install the openmetadata-ingestion Python module and its dependencies in this virtual environment.
```
python3 -m venv env
```
### 3. Activate the virtual environment
```
source env/bin/activate
```
### 4. Upgrade pip and setuptools
```
pip3 install --upgrade pip setuptools
```
### 5. Install the OpenMetadata Python module using pip
```
pip3 install --upgrade 'openmetadata-ingestion[docker]'
```
### 6. Ensure the module is installed and ready for use
```
metadata docker --help
```
After running the command above, you should see output similar to the following.
```
Usage: metadata docker [OPTIONS]
Checks Docker Memory Allocation Run Latest Release Docker - metadata
docker --run Run Local Docker - metadata docker --run -t local -p
path/to/docker-compose.yml
Options:
--start Start release Docker containers
--stop Stop Docker containers (local and release)
--clean Prune unused containers, images, volumes and networks
-t, --type TEXT 'local' - local type will start local build of OpenMetadata
docker
-p, --path FILE Path to Local docker-compose.yml
--help Show this message and exit.
```
### 7. Start the OpenMetadata Docker containers
```
metadata docker --start
```
This will create a Docker network and four containers for the following services:
* MySQL to store the metadata catalog
* Elasticsearch to maintain the metadata index which enables you to search the catalog
* Apache Airflow which OpenMetadata uses for metadata ingestion
* The OpenMetadata UI and API server
After starting the Docker containers, you should see output similar to the following.
```
[2021-11-18 15:53:52,532] INFO {metadata.cmd:202} - Running Latest Release Docker
[+] Running 5/5
⠿ Network tmp_app_net Created 0.3s
⠿ Container tmp_mysql_1 Started 1.0s
⠿ Container tmp_elasticsearch_1 Started 1.0s
⠿ Container tmp_ingestion_1 Started 2.1s
⠿ Container tmp_openmetadata-server_1 Started 2.2s
[2021-11-18 15:53:55,876] INFO {metadata.cmd:212} - Time took to get containers running: 0:00:03.124889
.......
```
After starting the containers, `metadata` will launch Airflow tasks to ingest sample metadata and usage data for you to experiment with. This might take several minutes, depending on your system.
{% hint style="info" %}
**Note:**
* `metadata docker --stop` will stop the Docker containers.
* `metadata docker --clean` will clean/prune the containers, volumes, and networks.
{% endhint %}
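If you would like to confirm that all of the containers are up before moving on, you can list them with `docker ps` (container names may differ slightly depending on the directory you ran the command from):

```bash
# Shows each container name alongside its status
docker ps --format 'table {{.Names}}\t{{.Status}}'
```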
### 8. Wait for metadata ingestion to finish
Once metadata ingestion has finished and the OpenMetadata UI is ready for use, you will see output similar to the following.
```
[2021-11-18 15:54:51,165] INFO {metadata.cmd:232} - Time took to get OpenMetadata running: 0:00:58.414548
✔ OpenMetadata is up and running
Head to http://localhost:8585 to play around with OpenMetadata UI.
To checkout Ingestion via Airflow, go to http://localhost:8080
(username: admin, password: admin)
Need support? Get in touch on Slack: https://slack.open-metadata.org/
```
### 9. Log in to Airflow
Once metadata ingestion has finished and you see the message that OpenMetadata is up and running, visit the following url in your web browser.
```
http://localhost:8080
```
You will see a login prompt similar to the one in the figure below. Use the following credentials to log in to Airflow.
Username: `admin`
Password: `admin`
![](../docs/.gitbook/assets/airflow-login.png)
### 10. Begin using OpenMetadata
Finally, visit the following url to begin exploring OpenMetadata.
```
http://localhost:8585
```
You should see a page similar to the following as the landing page for the OpenMetadata server.
![](../docs/.gitbook/assets/om-local-landing-page.png)
### Next Steps
1. Visit the [Features](../docs/features.md) overview page and explore the OpenMetadata UI.
2. Visit the [Connectors](../docs/integrations/connectors/) documentation to see what services you can integrate with OpenMetadata.
3. Visit the [API](../docs/openmetadata-apis/apis/overview.md) documentation and explore the OpenMetadata APIs.
### Troubleshooting
#### Could not find a version that satisfied the requirement
```
pip3 install 'openmetadata-ingestion[docker]'
ERROR: Could not find a version that satisfies the requirement openmetadata-ingestion[docker] (from versions: none)
ERROR: No matching distribution found for openmetadata-ingestion[docker]
```
If you see the above error when attempting to install OpenMetadata, this can be due to using an older version of Python and pip. Please check the [Requirements](run-openmetadata.md#requirements) section above and confirm that you have supported versions installed.
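As a quick check, you can confirm the Python and pip versions on your system and upgrade pip and setuptools if needed (assuming your virtual environment is active):

```bash
python3 --version
pip3 --version
pip3 install --upgrade pip setuptools
```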
If you need support, please get in touch on Slack: [https://slack.open-metadata.org/](https://slack.open-metadata.org).
## Requirements (Windows)
### WSL2, Ubuntu 20.04, and Docker for Windows
1. Install [WSL2](https://ubuntu.com/wsl)
2. Install [Ubuntu 20.04](https://www.microsoft.com/en-us/p/ubuntu-2004-lts/9n6svws3rx71)
3. Install [Docker for Windows](https://www.docker.com/products/docker-desktop)
### In the Ubuntu terminal
```
cd ~
sudo apt update
sudo apt upgrade
sudo apt install python3-pip python3-venv
```
Then follow the [OSX instructions](run-openmetadata.md#1.-create-a-directory-for-openmetadata), starting with creating a directory for OpenMetadata.
## Upgrade OpenMetadata
If you installed OpenMetadata following the procedure above and would like to upgrade your deployment, this section will guide you through the upgrade process.
### 1. Ensure your Python virtual environment is activated
The procedure for [installing OpenMetadata](run-openmetadata.md) asks you to create a new directory and Python virtual environment. The procedure then asks you to install the `openmetadata-ingestion[docker]` Python module in this virtual environment.
In your command-line environment, please navigate to the directory where you installed `openmetadata-ingestion[docker]` and activate the virtual environment by running the following command.
```
source env/bin/activate
```
### 2. Check the current version you have installed
To check the version of `openmetadata-ingestion[docker]` that you have installed, run the following command.
```bash
metadata --version
```
Upon running this command you should see output similar to the following.
```bash
metadata, version metadata 0.5.0 from /Users/om/openmetadata-docker/env/lib/python3.8 (python 3.8)
```
### 3. Check available versions
To confirm that there is a later version of `openmetadata-ingestion[docker]` available and identify the version you want to install, please run the following command.
```
pip3 install 'openmetadata-ingestion[docker]'==
```
Upon running this command, you should see output similar to the following.
```
ERROR: Could not find a version that satisfies the requirement
openmetadata-ingestion[docker]== (from versions: 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4,
0.3.0, 0.3.2, 0.4.0.dev0, 0.4.0.dev6, 0.4.0, 0.4.1.dev6, 0.4.1, 0.4.2.dev1, 0.4.2,
0.4.2.1, 0.4.3.dev1, 0.4.3.dev2, 0.4.3.dev3, 0.4.3.dev4, 0.4.3, 0.4.4, 0.4.5, 0.4.7,
0.4.8.dev0, 0.4.8.dev2, 0.4.8, 0.4.9, 0.4.10, 0.4.11, 0.5.0rc0, 0.5.0rc1, 0.5.0,
0.5.1.dev0, 0.6.0.dev0, 0.7.0.dev1, 0.7.0.dev2, 0.7.0.dev3, 0.7.0.dev4)
ERROR: No matching distribution found for openmetadata-ingestion[docker]==
```
The error messages are expected. This is the accepted means of checking available versions for a Python module using `pip`.
The output provides a complete list of available versions and enables you to determine whether there are release versions later than the version you currently have installed. Release versions have the form `x.x.x`. Examples of release versions in the above output include `0.2.0`, `0.4.2`, and `0.5.0`.
From this output you can also find patch releases (e.g., `0.4.2.1`), release candidates (`0.5.0rc1`), and development releases (e.g., `0.7.0.dev4`).
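Alternatively, on newer versions of pip (21.2 and later), the experimental `pip index versions` command lists the available versions without raising an error:

```bash
pip3 index versions openmetadata-ingestion
```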
### 4. Stop your currently running deployment
Before upgrading, if you are currently running an OpenMetadata deployment, please stop the deployment by running the following command.
```bash
metadata docker --stop
```
### 5. Install the version of your choice
#### Option 1. Install the latest release version
You may install the latest release version by running the following command.
```bash
pip3 install --upgrade 'openmetadata-ingestion[docker]'
```
#### Option 2. Install a specific release, patch, or development version
You may install a specific version of `openmetadata-ingestion[docker]` by running the following command, specifying the version you want to install in place of `<version>`.
```bash
pip3 install --upgrade 'openmetadata-ingestion[docker]'==<version>
```
For example, if you want to install the `0.7.0.dev4` release, you would run the following command.
```bash
pip3 install --upgrade 'openmetadata-ingestion[docker]'==0.7.0.dev4
```
### 6. Restart your deployment
Once you have successfully installed your preferred version of `openmetadata-ingestion[docker]`, restart your deployment using the new version by running the following command.
```bash
metadata docker --start
```
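Once the containers are up again, you can confirm that the new version is in use; the version string will reflect whichever release you installed.

```bash
metadata --version
```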

View File

@ -0,0 +1,19 @@
# Try OpenMetadata in our Public Sandbox
We want our users to get the experience of OpenMetadata with the least effort 😁. That is why we have set up a [sandbox](https://sandbox.open-metadata.org) that mimics a real production setup. Please take it for a spin and let us know your feedback in the [general](https://openmetadata.slack.com/archives/C02AZGN0WKY) channel on [Slack](https://slack.open-metadata.org).
## To set up your sandbox account:
### 1. Login using your Google credentials
![](../.gitbook/assets/welcome.png)
### 2. Add yourself as a user. Pick a few teams to be part of because data is a team game.
![](../.gitbook/assets/create-user.png)
### 3. Try out a few things
Don't limit yourself to just the callouts. Try other things too. We would love to get your feedback.
![](../.gitbook/assets/openmetadata-sandbox.png)