Docs - Python requirements & metadata docker (#6790)

Pere Miquel Brull 2022-08-18 11:43:45 +02:00 committed by GitHub
parent 89ec1f9c6d
commit 15e1bb531a
78 changed files with 436 additions and 227 deletions

View File

@ -15,9 +15,26 @@ for data persistence. Learn how to do so [here](/deployment/docker/volumes).
To test out your security integration, check out how to
[Enable Security](/deployment/docker/security).
## Changing ports
This docker deployment is powered by `docker compose` and uses the `docker-compose.yml` files shipped with
each release ([example](https://github.com/open-metadata/OpenMetadata/releases/tag/0.11.4-release)).
As with the [Named Volumes](/deployment/docker/volumes), you might want to tune the compose file a bit to modify
the default ports.
We ship the OpenMetadata server and UI at `8585`, and the ingestion container (Airflow) at `8080`. You can
take a look at the official Docker [docs](https://docs.docker.com/compose/compose-file/#ports). As an example, you could
update the ports to serve Airflow at `1234` with:
```yaml
ports:
- "1234:8080"
```
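In the same way, if port `8585` is already taken on your host, you could expose the OpenMetadata server on a different
host port (the `9585` below is just an illustrative value) while keeping the container port untouched:
```yaml
ports:
  - "9585:8585"
```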
# Production Deployment
If you are planning on going to PROD, we also recommend taking a look at the following
deployment strategies:
<InlineCalloutContainer>

View File

@ -9,7 +9,6 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
## Global Chart Values
<Table>
| Key | Type | Default |
| :---------- | :---------- | :---------- |
@ -75,7 +74,6 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
| global.elasticsearch.trustStore.path | string | `Empty String` |
| global.elasticsearch.trustStore.password.secretRef | string | `elasticsearch-truststore-secrets` |
| global.elasticsearch.trustStore.password.secretKey | string | `openmetadata-elasticsearch-truststore-password` |
| global.fernetKey | string | `jJ/9sz0g0OHxsfxOoSfdFdmk3ysNmPRnH3TUAbz3IHA=` |
| global.jwtTokenConfiguration.enabled | bool | `false` |
| global.jwtTokenConfiguration.rsapublicKeyFilePath | string | `Empty String` |
| global.jwtTokenConfiguration.rsaprivateKeyFilePath | string | `Empty String` |
@ -86,11 +84,9 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
| global.openmetadata.host | string | `openmetadata` |
| global.openmetadata.port | int | 8585 |
</Table>
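As a quick illustration, overriding a couple of these defaults in a custom values file could look like the sketch below
(keys are taken from the table above; the file name and values are placeholders to adapt to your environment):
```yaml
# my-values.yaml -- example override of a few global chart values
global:
  openmetadata:
    host: openmetadata
    port: 8585
  jwtTokenConfiguration:
    enabled: false
```
You would then pass this file to Helm with `--values my-values.yaml` when installing or upgrading the chart.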
## Chart Values
<Table>
| Key | Type | Default |
| :---------- | :---------- | :---------- |
@ -129,5 +125,3 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
| serviceAccount.name | string | `nil` |
| sidecars | list | `[]` |
| tolerations | list | `[]` |
</Table>

View File

@ -50,6 +50,8 @@ site_menu:
url: /deployment/kubernetes/onprem
- category: Deployment / Kubernetes Deployment / Enable Security
url: /deployment/kubernetes/security
- category: Deployment / Kubernetes Deployment / Helm Values
url: /deployment/kubernetes/helm-values
- category: Deployment / Enable Security
url: /deployment/security

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/looker/airflow
<Requirements />
<PythonMod connector="Looker" module="looker" />
<MetadataIngestionServiceDev service="dashboard" connector="Looker" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/looker/cli
<Requirements />
<PythonMod connector="Looker" module="looker" />
<MetadataIngestionServiceDev service="dashboard" connector="Looker" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/metabase/airflow
<Requirements />
<PythonMod connector="Metabase" module="metabase" />
<MetadataIngestionServiceDev service="dashboard" connector="Metabase" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/metabase/cli
<Requirements />
<PythonMod connector="Metabase" module="metabase" />
<MetadataIngestionServiceDev service="dashboard" connector="Metabase" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/powerbi/airflow
<Requirements />
<PythonMod connector="PowerBI" module="powerbi" />
<MetadataIngestionServiceDev service="dashboard" connector="PowerBI" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/powerbi/cli
<Requirements />
<PythonMod connector="PowerBI" module="powerbi" />
<MetadataIngestionServiceDev service="dashboard" connector="PowerBI" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/redash/airflow
<Requirements />
<PythonMod connector="Redash" module="redash" />
<MetadataIngestionServiceDev service="dashboard" connector="Redash" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/redash/cli
<Requirements />
<PythonMod connector="Redash" module="redash" />
<MetadataIngestionServiceDev service="dashboard" connector="Redash" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/superset/airflow
<Requirements />
<PythonMod connector="Superset" module="superset" />
<MetadataIngestionServiceDev service="dashboard" connector="Superset" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/superset/cli
<Requirements />
<PythonMod connector="Superset" module="superset" />
<MetadataIngestionServiceDev service="dashboard" connector="Superset" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/tableau/airflow
<Requirements />
<PythonMod connector="Tableau" module="tableau" />
<MetadataIngestionServiceDev service="dashboard" connector="Tableau" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/tableau/cli
<Requirements />
<PythonMod connector="Tableau" module="tableau" />
<MetadataIngestionServiceDev service="dashboard" connector="Tableau" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/athena/airflow
<Requirements />
<PythonMod connector="Athena" module="athena" />
<MetadataIngestionServiceDev service="database" connector="Athena" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/athena/cli
<Requirements />
<PythonMod connector="Athena" module="athena" />
<MetadataIngestionServiceDev service="database" connector="Athena" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/azuresql/airflow
<Requirements />
<PythonMod connector="AzureSQL" module="azuresql" />
<MetadataIngestionServiceDev service="database" connector="AzureSQL" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/azuresql/cli
<Requirements />
<PythonMod connector="AzureSQL" module="azuresql" />
<MetadataIngestionServiceDev service="database" connector="AzureSQL" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/bigquery/airflow
<Requirements />
<PythonMod connector="BigQuery" module="bigquery" />
If you want to run the Usage Connector, you'll also need to install:
```bash
pip3 install "openmetadata-ingestion[bigquery-usage]"
```
<h4>GCP Permissions</h4>
<p> To execute the metadata extraction and usage workflows successfully, the user or the service account should have enough access to fetch the required data. The following table describes the minimum required permissions. </p>

View File

@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/bigquery/cli
<Requirements />
<PythonMod connector="BigQuery" module="bigquery" />
If you want to run the Usage Connector, you'll also need to install:
```bash
pip3 install "openmetadata-ingestion[bigquery-usage]"
```
<h4>GCP Permissions</h4>
<p> To execute the metadata extraction and usage workflows successfully, the user or the service account should have enough access to fetch the required data. The following table describes the minimum required permissions. </p>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/clickhouse/airflow
<Requirements />
<PythonMod connector="ClickHouse" module="clickhouse" />
<MetadataIngestionServiceDev service="database" connector="Clickhouse" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/clickhouse/cli
<Requirements />
<PythonMod connector="ClickHouse" module="clickhouse" />
<MetadataIngestionServiceDev service="database" connector="Clickhouse" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/databricks/airflow
<Requirements />
<PythonMod connector="Databricks" module="databricks" />
<MetadataIngestionServiceDev service="database" connector="Databricks" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/databricks/cli
<Requirements />
<PythonMod connector="Databricks" module="databricks" />
<MetadataIngestionServiceDev service="database" connector="Databricks" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/datalake/airflow
<Requirements />
<PythonMod connector="DataLake" module="datalake" />
## Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Datalake.
@ -59,8 +61,6 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
This is a sample config for Datalake using GCS:
```yaml
@ -103,13 +103,13 @@ workflowConfig:
```
#### Source Configuration - Service Connection using GCS
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).
* **type**: Credentials type, e.g. `service_account`.
* **projectId**
* **privateKey**
* **privateKeyId**
* **clientEmail**
* **clientId**
@ -117,8 +117,8 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
* **tokenUri**: [https://oauth2.googleapis.com/token](https://oauth2.googleapis.com/token) by default
* **authProviderX509CertUrl**: [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
* **clientX509CertUrl**
* **bucketName**: name of the bucket in GCS
* **prefix**: prefix in the GCS bucket
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
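For illustration, such patterns are usually expressed in the workflow YAML roughly as follows (a sketch assuming the
standard `includes`/`excludes` keys of the filter pattern schema; the schema and table regexes are placeholders):
```yaml
sourceConfig:
  config:
    schemaFilterPattern:
      includes:
        - sales.*
    tableFilterPattern:
      excludes:
        - .*_staging
```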
<MetadataIngestionConfig service="database" connector="Datalake" goal="Airflow" />

View File

@ -7,6 +7,7 @@ slug: /openmetadata/connectors/database/datalake/cli
<Requirements />
<PythonMod connector="DataLake" module="datalake" />
## Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Datalake.
@ -45,11 +46,9 @@ workflowConfig:
openMetadataServerConfig:
hostPort: http://localhost:8585/api
authProvider: no-auth
```
#### Source Configuration - Source Config using AWS S3
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).
@ -59,9 +58,6 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
* **awsRegion**: Specify the region in which your DynamoDB is located. This setting is required even if you have configured a local AWS profile.
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
This is a sample config for Datalake using GCS:
```yaml
@ -104,13 +100,13 @@ workflowConfig:
```
#### Source Configuration - Service Connection using GCS
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).
* **type**: Credentials type, e.g. `service_account`.
* **projectId**
* **privateKey**
* **privateKeyId**
* **clientEmail**
* **clientId**
@ -118,8 +114,8 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
* **tokenUri**: [https://oauth2.googleapis.com/token](https://oauth2.googleapis.com/token) by default
* **authProviderX509CertUrl**: [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
* **clientX509CertUrl**
* **bucketName**: name of the bucket in GCS
* **prefix**: prefix in the GCS bucket
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
<MetadataIngestionConfig service="database" connector="Datalake" goal="CLI" />

View File

@ -3,59 +3,18 @@ title: Datalake
slug: /openmetadata/connectors/database/datalake
---
<ConnectorIntro connector="Datalake" goal="Airflow" />
<ConnectorIntro connector="Datalake" />
<Requirements />
<MetadataIngestionService connector="Datalake"/>
## Metadata Ingestion
<h4>Connection Options</h4>
### 1. Visit the Services Page
The first step is ingesting the metadata from your sources. Under _Settings_, you will find a **Services** page through which you can link an external source system to OpenMetadata. Once a service is created, it can be used to configure metadata, usage, and profiler workflows.
To visit the _Services_ page, select _Services_ from the _Settings_ menu.
![Navigate to Settings >> Services](<https://raw.githubusercontent.com/open-metadata/OpenMetadata/0.10.1-docs/.gitbook/assets/image%20(14).png>)
### 2. Create a New Service
Click on the _Add New Service_ button to start the Service creation.
<Image src="/images/openmetadata/connectors/datalake/create-new-service.png" alt="create-new-service"/>
### 3. Select the Service Type
Select Datalake as the service type and click _Next_.
<Image src="/images/openmetadata/connectors/datalake/select-service.png" alt="select-service"/>
### 4. Name and Describe your Service
Provide a name and description for your service as illustrated below.
<Image src="/images/openmetadata/connectors/datalake/describe-service.png" alt="describe-service"/>
#### Service Name
OpenMetadata uniquely identifies services by their _Service Name_. Provide a name that distinguishes your deployment from other services, including the other Datalake services that you might be ingesting metadata from.
### 5. Configure the Service Connection
In this step, we will configure the connection settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your Datalake service as desired.
<Collapse title="Datalake using AWS S3">
<Image src="/images/openmetadata/connectors/datalake/service-connection-using-aws-s3.png" alt="create-account"/>
<details>
<summary>Connection Options for AWS S3</summary>
**AWS Access Key ID**
@ -91,25 +50,12 @@ Enter the details for any additional connection options that can be sent to Dyna
Enter the details for any additional connection arguments such as security or protocol configs that can be sent to DynamoDB during the connection. These details must be added as Key-Value pairs.
In case you are using Single-Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows.
</Collapse>
`"authenticator" : "sso_login_url"`
In case you authenticate with SSO using an external browser popup, then add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows.
`"authenticator" : "externalbrowser"`
</details>
<Collapse title="Datalake using GCS">
<Image src="/images/openmetadata/connectors/datalake/service-connection-using-gcs.png" alt="service-connection-using-gcs"/>
<details>
<summary>Connection Options for GCS</summary>
**BUCKET NAME**
This is the Bucket Name in GCS.
@ -120,7 +66,7 @@ This is the Bucket Name in GCS.
**GCS Credentials**
We support two ways of authenticating to GCS:
1. Passing the raw credential values provided by GCP. This requires us to provide the following information, all provided by GCP:
1. Credentials type, e.g. `service_account`.
@ -134,126 +80,7 @@ We support two ways of authenticating to BigQuery:
9. Authentication Provider X509 Certificate URL, [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
10. Client X509 Certificate URL
</details>
After hitting Save you will see that your Datalake connector has been added successfully, and you can add an ingestion.
<Image src="/images/openmetadata/connectors/datalake/created-service.png" alt="created-service"/>
### 6. Configure the Metadata Ingestion
Once the service is created, we can add a **Metadata Ingestion Workflow**, either directly from the _Add Ingestion_ button in the figure above, or from the Service page:
<Image src="/images/openmetadata/connectors/datalake/service-page.png" alt="service-page"/>
<details>
<summary>Metadata Ingestion Options</summary>
**Include (Table Filter Pattern)**
Use table filter patterns to control whether or not to include tables as part of metadata ingestion and data profiling.
Explicitly include tables by adding a list of comma-separated regular expressions to the _Include_ field. OpenMetadata will include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See the figure above for an example.
**Exclude (Table Filter Pattern)**
Explicitly exclude tables by adding a list of comma-separated regular expressions to the _Exclude_ field. OpenMetadata will exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See the figure above for an example.
**Include (Schema Filter Pattern)**
Use schema filter patterns to control whether or not to include schemas as part of metadata ingestion and data profiling.
Explicitly include schemas by adding a list of comma-separated regular expressions to the _Include_ field. OpenMetadata will include all schemas with names matching one or more of the supplied regular expressions. All other schemas will be excluded.
**Exclude (Schema Filter Pattern)**
Explicitly exclude schemas by adding a list of comma-separated regular expressions to the _Exclude_ field. OpenMetadata will exclude all schemas with names matching one or more of the supplied regular expressions. All other schemas will be included.
**Include views (toggle)**
Set the _Include views_ toggle to the on position to control whether or not to include views as part of metadata ingestion and data profiling.
Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
**Enable data profiler (toggle)**
Glue does not provide querying capabilities, so the data profiler is not supported.
**Ingest sample data (toggle)**
Glue does not provide querying capabilities, so sample data is not supported.
</details>
<Image src="/images/openmetadata/connectors/datalake/deployed-service.png" alt="deploy-service"/>
### 7. Schedule the Ingestion and Deploy
Scheduling can be set up at an hourly, daily, or weekly cadence. The timezone is in UTC. Select a Start Date to schedule for ingestion. It is optional to add an End Date.
Review your configuration settings. If they match what you intended, click _Deploy_ to create the service and schedule metadata ingestion.
If something doesn't look right, click the _Back_ button to return to the appropriate step and change the settings as needed.
<details>
<summary><strong>Scheduling Options</strong></summary>
**Every**
Use the _Every_ drop down menu to select the interval at which you want to ingest metadata. Your options are as follows:
* _Hour_: Ingest metadata once per hour
* _Day_: Ingest metadata once per day
* _Week_: Ingest metadata once per week
**Day**
The _Day_ selector is only active when ingesting metadata once per week. Use the _Day_ selector to set the day of the week on which to ingest metadata.
**Minute**
The _Minute_ dropdown is only active when ingesting metadata once per hour. Use the _Minute_ drop down menu to select the minute of the hour at which to begin ingesting metadata.
**Time**
The _Time_ drop down menus are active when ingesting metadata either once per day or once per week. Use the time drop downs to select the time of day at which to begin ingesting metadata.
**Start date (UTC)**
Use the _Start date_ selector to choose the date at which to begin ingesting metadata according to the defined schedule.
**End date (UTC)**
Use the _End date_ selector to choose the date at which to stop ingesting metadata according to the defined schedule. If no end date is set, metadata ingestion will continue according to the defined schedule indefinitely.
</details>
After configuring the workflow, you can click on _Deploy_ to create the pipeline.
<Image src="/images/openmetadata/connectors/datalake/schedule-options.png" alt="schedule-options"/>
### 8. View the Ingestion Pipeline
Once the workflow has been successfully deployed, you can view the Ingestion Pipeline running from the Service Page.
<Image src="/images/openmetadata/connectors/datalake/ingestion-pipeline.png" alt="ingestion-pipeline"/>
### 9. Workflow Deployment Error
If there were any errors during the workflow deployment process, the Ingestion Pipeline Entity will still be created, but no workflow will be present in the Ingestion container.
You can then edit the Ingestion Pipeline and _Deploy_ it again.
<Image src="/images/openmetadata/connectors/datalake/workflow-deployment-error.png" alt="create-account"/>
From the _Connection_ tab, you can also _Edit_ the Service if needed.
</Collapse>
<IngestionScheduleAndDeploy />

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/db2/airflow
<Requirements />
<PythonMod connector="DB2" module="db2" />
<MetadataIngestionServiceDev service="database" connector="DB2" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/db2/cli
<Requirements />
<PythonMod connector="DB2" module="db2" />
<MetadataIngestionServiceDev service="database" connector="DB2" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/deltalake/airflow
<Requirements />
<PythonMod connector="DeltaLake" module="deltalake" />
<MetadataIngestionServiceDev service="database" connector="DeltaLake" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/deltalake/cli
<Requirements />
<PythonMod connector="DeltaLake" module="deltalake" />
<MetadataIngestionServiceDev service="database" connector="DeltaLake" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/druid/airflow
<Requirements />
<PythonMod connector="Druid" module="druid" />
<MetadataIngestionServiceDev service="database" connector="Druid" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/druid/cli
<Requirements />
<PythonMod connector="Druid" module="druid" />
<MetadataIngestionServiceDev service="database" connector="Druid" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/dynamodb/airflow
<Requirements />
<PythonMod connector="DynamoDB" module="dynamodb" />
<MetadataIngestionServiceDev service="database" connector="DynamoDB" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/dynamodb/cli
<Requirements />
<PythonMod connector="Druid" module="druid" />
<MetadataIngestionServiceDev service="database" connector="BigQuery" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/glue/airflow
<Requirements />
<PythonMod connector="Glue" module="glue" />
<MetadataIngestionServiceDev service="database" connector="Glue" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/glue/cli
<Requirements />
<PythonMod connector="Druid" module="druid" />
<MetadataIngestionServiceDev service="database" connector="Glue" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/hive/airflow
<Requirements />
<PythonMod connector="Hive" module="hive" />
<MetadataIngestionServiceDev service="database" connector="Hive" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/hive/cli
<Requirements />
<PythonMod connector="Hive" module="hive" />
<MetadataIngestionServiceDev service="database" connector="Hive" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mariadb/airflow
<Requirements />
<PythonMod connector="MariaDB" module="mariadb" />
<MetadataIngestionServiceDev service="database" connector="MariaDB" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mariadb/cli
<Requirements />
<PythonMod connector="MariaDB" module="mariadb" />
<MetadataIngestionServiceDev service="database" connector="MariaDB" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mssql/airflow
<Requirements />
<PythonMod connector="MSSQL" module="mssql" />
<MetadataIngestionServiceDev service="database" connector="MSSQL" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mssql/cli
<Requirements />
<PythonMod connector="MSSQL" module="mssql" />
<MetadataIngestionServiceDev service="database" connector="MSSQL" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,10 @@ slug: /openmetadata/connectors/database/mysql/airflow
<Requirements />
<PythonMod connector="MySQL" module="mysql" />
Note that the user should have access to the `INFORMATION_SCHEMA` table.
<MetadataIngestionServiceDev service="database" connector="MySQL" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,10 @@ slug: /openmetadata/connectors/database/mysql/cli
<Requirements />
<PythonMod connector="MySQL" module="mysql" />
Note that the user should have access to the `INFORMATION_SCHEMA` table.
<MetadataIngestionServiceDev service="database" connector="MySQL" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mysql
<Requirements />
Note that the user should have access to the `INFORMATION_SCHEMA` table.
<MetadataIngestionService connector="MySQL"/>
<h4>Connection Options</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/oracle/airflow
<Requirements />
<PythonMod connector="Oracle" module="oracle" />
<MetadataIngestionServiceDev service="database" connector="Oracle" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/oracle/cli
<Requirements />
<PythonMod connector="Oracle" module="oracle" />
<MetadataIngestionServiceDev service="database" connector="Oracle" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/postgres/airflow
<Requirements />
<PythonMod connector="Postgres" module="postgres" />
<MetadataIngestionServiceDev service="database" connector="Postgres" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/postgres/cli
<Requirements />
<PythonMod connector="Postgres" module="postgres" />
<MetadataIngestionServiceDev service="database" connector="Postgres" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/presto/airflow
<Requirements />
<PythonMod connector="Presto" module="presto" />
<MetadataIngestionServiceDev service="database" connector="Presto" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/presto/cli
<Requirements />
<PythonMod connector="Presto" module="presto" />
<MetadataIngestionServiceDev service="database" connector="Presto" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/redshift/airflow
<Requirements />
<PythonMod connector="Redshift" module="redshift" />
If you want to run the Usage Connector, you'll also need to install:
```bash
pip3 install "openmetadata-ingestion[redshift-usage]"
```
<MetadataIngestionServiceDev service="database" connector="Redshift" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/redshift/cli
<Requirements />
<PythonMod connector="Redshift" module="redshift" />
If you want to run the Usage Connector, you'll also need to install:
```bash
pip3 install "openmetadata-ingestion[redshift-usage]"
```
<MetadataIngestionServiceDev service="database" connector="Redshift" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/salesforce/airflow
<Requirements />
<PythonMod connector="Salesforce" module="salesforce" />
<MetadataIngestionServiceDev service="database" connector="Salesforce" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/salesforce/cli
<Requirements />
<PythonMod connector="Salesforce" module="salesforce" />
<MetadataIngestionServiceDev service="database" connector="Salesforce" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/singlestore/airflow
<Requirements />
<PythonMod connector="SingleStore" module="singlestore" />
<MetadataIngestionServiceDev service="database" connector="SingleStore" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/singlestore/cli
<Requirements />
<PythonMod connector="SingleStore" module="singlestore" />
<MetadataIngestionServiceDev service="database" connector="SingleStore" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/snowflake/airflow
<Requirements />
<PythonMod connector="Snowflake" module="snowflake" />
If you want to run the Usage Connector, you'll also need to install:
```bash
pip3 install "openmetadata-ingestion[snowflake-usage]"
```
<MetadataIngestionServiceDev service="database" connector="Snowflake" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/snowflake/cli
<Requirements />
<PythonMod connector="Snowflake" module="snowflake" />
If you want to run the Usage Connector, you'll also need to install:
```bash
pip3 install "openmetadata-ingestion[snowflake-usage]"
```
<MetadataIngestionServiceDev service="database" connector="Snowflake" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/trino/airflow
<Requirements />
<PythonMod connector="Trino" module="trino" />
<MetadataIngestionServiceDev service="database" connector="Trino" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/trino/cli
<Requirements />
<PythonMod connector="Trino" module="trino" />
<MetadataIngestionServiceDev service="database" connector="Trino" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/vertica/airflow
<Requirements />
<PythonMod connector="Vertica" module="vertica" />
<MetadataIngestionServiceDev service="database" connector="Vertica" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/vertica/cli
<Requirements />
<PythonMod connector="Vertica" module="vertica" />
<MetadataIngestionServiceDev service="database" connector="Vertica" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/messaging/kafka/airflow
<Requirements />
<PythonMod connector="Kafka" module="kafka" />
<MetadataIngestionServiceDev service="messaging" connector="Kafka" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/messaging/kafka/cli
<Requirements />
<PythonMod connector="Kafka" module="kafka" />
<MetadataIngestionServiceDev service="messaging" connector="Kafka" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -9,13 +9,7 @@ In this page, you will learn how to use the `metadata` CLI to run a one-ingestio
<Requirements />
## Python requirements
<PythonMod connector="Amundsen" module="amundsen" />
Make sure you are running openmetadata-ingestion version 0.10.2 or above.

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/airbyte/airflow
<Requirements />
<PythonMod connector="Airbyte" module="airbyte" />
<MetadataIngestionServiceDev service="pipeline" connector="Airbyte" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/airbyte/cli
<Requirements />
<PythonMod connector="Airbyte" module="airbyte" />
<MetadataIngestionServiceDev service="pipeline" connector="Airbyte" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,13 @@ slug: /openmetadata/connectors/pipeline/airflow/cli
<Requirements />
<PythonMod connector="Airflow" module="airflow" />
Note that this installs the same Airflow version that we ship in the Ingestion Container, which is
Airflow `2.3.3` from Release `0.12`.
The ingestion framework, which uses Airflow version 2.3.3 as a source package, has been tested against Airflow 2.3.3 and Airflow 2.2.5.
<MetadataIngestionServiceDev service="pipeline" connector="Airflow" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -3,7 +3,36 @@ title: Airflow
slug: /openmetadata/connectors/pipeline/airflow
---
<ConnectorIntro service="pipeline" connector="Airflow"/>
# Airflow
In this section, we provide guides and references to use the Airflow connector.
Configure and schedule Airflow metadata workflow from the OpenMetadata UI:
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check the following docs to
extract metadata directly from your Airflow instance or via the CLI:
<TileContainer>
<Tile
icon="air"
title="Ingest directly from your Airflow"
text="Configure the ingestion with a DAG on your own Airflow instance"
link={
"/openmetadata/connectors/pipeline/airflow/gcs"
}
size="half"
/>
<Tile
icon="account_tree"
title="Ingest with the CLI"
text="Run a one-time ingestion using the metadata CLI"
link={
"/openmetadata/connectors/pipeline/airflow/cli"
}
size="half"
/>
</TileContainer>
<Requirements />

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/fivetran/airflow
<Requirements />
<PythonMod connector="Fivetran" module="fivetran" />
<MetadataIngestionServiceDev service="pipeline" connector="Fivetran" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/fivetran/cli
<Requirements />
<PythonMod connector="Fivetran" module="fivetran" />
<MetadataIngestionServiceDev service="pipeline" connector="Fivetran" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/glue/airflow
<Requirements />
<PythonMod connector="Glue" module="glue" />
<MetadataIngestionServiceDev service="pipeline" connector="GluePipeline" goal="Airflow"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/glue/cli
<Requirements />
<PythonMod connector="Glue" module="glue" />
<MetadataIngestionServiceDev service="pipeline" connector="GluePipeline" goal="CLI"/>
<h4>Source Configuration - Service Connection</h4>

View File

@ -101,30 +101,142 @@ Follow the instructions [here](https://docs.docker.com/compose/cli-command/#inst
</Collapse>
## Procedure
### 1. Create a directory for OpenMetadata
Create a new directory for OpenMetadata and navigate into that directory.
```bash
mkdir openmetadata-docker && cd openmetadata-docker
```
### 2. Create a Python virtual environment
Create a virtual environment to avoid conflicts with other Python environments on your host system.
A virtual environment is a self-contained directory tree that contains a Python installation for a particular version
of Python, plus a number of additional packages.
In a later step you will install the `openmetadata-ingestion` Python module and its dependencies in this virtual environment.
```bash
python3 -m venv env
```
### 3. Activate the virtual environment
```bash
source env/bin/activate
```
### 4. Upgrade pip and setuptools
```bash
pip3 install --upgrade pip setuptools
```
### 5. Install the OpenMetadata Python module using pip
```bash
pip3 install --upgrade "openmetadata-ingestion[docker]"
```
### 6. Ensure the module is installed and ready for use
```bash
metadata docker --help
```
After running the command above, you should see output similar to the following.
```
metadata docker --help
Usage: metadata docker [OPTIONS]
Checks Docker Memory Allocation Run Latest Release Docker - metadata docker
--start Run Local Docker - metadata docker --start -f path/to/docker-
compose.yml
Options:
--start Start release docker containers
--stop Stops openmetadata docker containers
--pause Pause openmetadata docker containers
--resume Resume/Unpause openmetadata docker
containers
--clean Stops and remove openmetadata docker
containers along with images, volumes,
networks associated
-f, --file-path FILE Path to Local docker-compose.yml
-env-file, --env-file-path FILE
Path to env file containing the environment
variables
--reset-db Reset OpenMetadata Data
--ingest-sample-data Enable the sample metadata ingestion
--help Show this message and exit.
```
### 7. Start the OpenMetadata Docker containers
```bash
metadata docker --start
```
This will create a docker network and four containers for the following services:
- MySQL to store the metadata catalog
- Elasticsearch to maintain the metadata index which enables you to search the catalog
- Apache Airflow which OpenMetadata uses for metadata ingestion
- The OpenMetadata UI and API server
After starting the Docker containers, you should see an output similar to the following.
```
[2021-11-18 15:53:52,532] INFO {metadata.cmd:202} - Running Latest Release Docker
[+] Running 5/5
⠿ Network tmp_app_net Created 0.3s
⠿ Container tmp_mysql_1 Started 1.0s
⠿ Container tmp_elasticsearch_1 Started 1.0s
⠿ Container tmp_ingestion_1 Started 2.1s
⠿ Container tmp_openmetadata-server_1 Started 2.2s
[2021-11-18 15:53:55,876] INFO {metadata.cmd:212} - Time took to get containers running: 0:00:03.124889
.......
```
After starting the containers, `metadata` will launch Airflow tasks to ingest sample metadata and usage data for you to
experiment with. This might take several minutes, depending on your system.
<Note>
- `metadata docker --stop` will stop the Docker containers.
- `metadata docker --clean` will clean/prune the containers, volumes, and networks.
</Note>
### 8. Wait for metadata ingestion to finish
Once metadata ingestion has finished and the OpenMetadata UI is ready for use, you will see output similar to the following.
```
✅ OpenMetadata is up and running
Open http://localhost:8585 in your browser to access OpenMetadata..
To checkout Ingestion via Airflow, go to http://localhost:8080
(username: admin, password: admin)
We are available on Slack , https://slack.open-metadata.org/ . Reach out to us if you have any questions.
If you like what we are doing, please consider giving us a star on github at https://github.com/open-metadata/OpenMetadata.
It helps OpenMetadata reach wider audience and helps our community.
```
<Tip>
The `metadata` CLI is very useful for quick testing when getting started or when trying out a new release.
If you had already set up a previous release and are trying to test a new one, you might need to run `metadata docker --clean`
to clean up the whole environment and start the new release from scratch.
</Tip>
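For instance, a typical clean upgrade-test cycle simply chains the commands described above (a sketch; adjust the
package extras and versions to your setup):
```bash
metadata docker --clean
pip3 install --upgrade "openmetadata-ingestion[docker]"
metadata docker --start
```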
<Image src="/images/quickstart/docker/openmetadata.png" alt="UI"/>
@ -158,3 +270,60 @@ If you want to persist your data, prepare [Named Volumes](/deployment/docker/vol
2. Visit the [Connectors](/openmetadata/connectors) documentation to see what services you can integrate with
OpenMetadata.
3. Visit the [API](/swagger.html) documentation and explore the rich set of OpenMetadata APIs.
## Troubleshooting
### Compose is not a docker command
If you are getting an error such as `"compose" is not a docker command`, you might need to revisit the
installation steps above to make sure that Docker Compose is properly added to your system.
### metadata CLI issues
Are you having trouble starting the containers with the `metadata` CLI? While that process is recommended,
you can always run `docker compose` manually after picking up the latest `docker-compose.yml` file from the release:
```commandline
mkdir openmetadata && cd "$_"
wget https://github.com/open-metadata/OpenMetadata/releases/download/0.11.3-release/docker-compose.yml
docker compose up -d
```
This snippet will create a directory named `openmetadata` and download the `docker-compose.yml` file automatically.
Afterwards, it will start the containers. If instead you want to download the file manually to another location,
you can do so from the Releases [page](https://github.com/open-metadata/OpenMetadata/releases).
This will start all the necessary components locally. You can validate that all containers are up
and running with `docker ps`.
```commandline
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
470cc8149826 openmetadata/server:0.11.0 "./openmetadata-star…" 45 seconds ago Up 43 seconds 3306/tcp, 9200/tcp, 9300/tcp, 0.0.0.0:8585-8586->8585-8586/tcp openmetadata_server
63578aacbff5 openmetadata/ingestion:0.11.0 "./ingestion_depende…" 45 seconds ago Up 43 seconds 0.0.0.0:8080->8080/tcp openmetadata_ingestion
9f5ee8334f4b docker.elastic.co/elasticsearch/elasticsearch:7.10.2 "/tini -- /usr/local…" 45 seconds ago Up 44 seconds 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp openmetadata_elasticsearch
08947ab3424b openmetadata/db:0.11.0 "/entrypoint.sh mysq…" 45 seconds ago Up 44 seconds (healthy) 3306/tcp, 33060-33061/tcp openmetadata_mysql
```
In a few seconds, you should be able to access the OpenMetadata UI at [http://localhost:8585](http://localhost:8585):
### Network openmetadata_app_net Error
You might see something like:
```
The docker command executed was `/usr/local/bin/docker compose --file /var/folders/bl/rm5dhdf127ngm4rr40hvhbq40000gn/T/docker-compose.yml --project-name openmetadata up --detach`.
It returned with code 1
The content of stdout can be found above the stacktrace (it wasn't captured).
The content of stderr is 'Network openmetadata_app_net Creating
Network openmetadata_app_net Error
failed to create network openmetadata_app_net: Error response from daemon: Pool overlaps with other one on this address space
```
A common solution is to run `docker network prune`, which will prompt you with the following warning before proceeding:
```
WARNING! This will remove all custom networks not used by at least one container.
```
So be careful if you want to keep some of the (currently unused) networks on your machine.
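If you would rather not prune every unused network, you can first list the existing networks and remove only the
conflicting one (a sketch using standard Docker commands; the network name is taken from the error message above):
```commandline
docker network ls
docker network rm openmetadata_app_net
```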

Binary file not shown.
