Docs - Python requirements & metadata docker (#6790)
@ -15,9 +15,26 @@ for data persistence. Learn how to do so [here](/deployment/docker/volumes).
To test out your security integration, check out how to
[Enable Security](/deployment/docker/security).

## Changing ports

This docker deployment is powered by `docker compose`, and uses the `docker-compose.yml` files shipped with
each release ([example](https://github.com/open-metadata/OpenMetadata/releases/tag/0.11.4-release)).

As with the [Named Volumes](/deployment/docker/volumes), you might want to tune the compose file a bit to modify
the default ports.

We are shipping the OpenMetadata server and UI at `8585`, and the ingestion container (Airflow) at `8080`. You can
take a look at the official Docker [docs](https://docs.docker.com/compose/compose-file/#ports). As an example, you could
update the ports to serve Airflow at `1234` with:

```yaml
ports:
  - "1234:8080"
```
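Once the file is updated, the change can be applied by recreating the containers, for example with:

```bash
# re-run compose against the edited file so the new port mapping takes effect
docker compose up -d
```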

# Production Deployment

If you are planning on going to PROD, we also recommend taking a look at the following
deployment strategies:

<InlineCalloutContainer>
|
@ -9,7 +9,6 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
|
||||
|
||||
## Global Chart Values

<Table>

| Key | Type | Default |
| :---------- | :---------- | :---------- |
@ -75,7 +74,6 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
| global.elasticsearch.trustStore.path | string | `Empty String` |
| global.elasticsearch.trustStore.password.secretRef | string | `elasticsearch-truststore-secrets` |
| global.elasticsearch.trustStore.password.secretKey | string | `openmetadata-elasticsearch-truststore-password` |
| global.fernetKey | string | `jJ/9sz0g0OHxsfxOoSfdFdmk3ysNmPRnH3TUAbz3IHA=` |
| global.jwtTokenConfiguration.enabled | bool | `false` |
| global.jwtTokenConfiguration.rsapublicKeyFilePath | string | `Empty String` |
| global.jwtTokenConfiguration.rsaprivateKeyFilePath | string | `Empty String` |
@ -86,11 +84,9 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
| global.openmetadata.host | string | `openmetadata` |
| global.openmetadata.port | int | 8585 |

</Table>
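As an illustration, a few of the keys listed above can be overridden with a custom values file. This is only a sketch — the dotted keys map to nested YAML when passed to Helm:

```yaml
# values.override.yaml — illustrative overrides for a handful of the global chart values
global:
  openmetadata:
    host: openmetadata
    port: 8585
  jwtTokenConfiguration:
    enabled: true
```

It can then be applied with something like `helm upgrade --install openmetadata open-metadata/openmetadata --values values.override.yaml` (the release and chart names here are assumptions — use the ones from your own installation).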
|
||||
|
||||
## Chart Values

<Table>

| Key | Type | Default |
| :---------- | :---------- | :---------- |
@ -129,5 +125,3 @@ This page list all the supported helm values for OpenMetadata Helm Charts.
| serviceAccount.name | string | `nil` |
| sidecars | list | `[]` |
| tolerations | list | `[]` |

</Table>
|
@ -50,6 +50,8 @@ site_menu:
|
||||
url: /deployment/kubernetes/onprem
|
||||
- category: Deployment / Kubernetes Deployment / Enable Security
|
||||
url: /deployment/kubernetes/security
|
||||
- category: Deployment / Kubernetes Deployment / Helm Values
|
||||
url: /deployment/kubernetes/helm-values
|
||||
|
||||
- category: Deployment / Enable Security
|
||||
url: /deployment/security
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/looker/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Looker" module="looker" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Looker" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/looker/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Looker" module="looker" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Looker" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/metabase/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Metabase" module="metabase" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Metabase" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/metabase/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Metabase" module="metabase" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Metabase" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/powerbi/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="PowerBI" module="powerbi" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="PowerBI" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/powerbi/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="PowerBI" module="powerbi" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="PowerBI" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/redash/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Redash" module="redash" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Redash" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/redash/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Redash" module="redash" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Redash" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/superset/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Superset" module="superset" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Superset" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/superset/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Superset" module="superset" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Superset" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/tableau/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Tableau" module="tableau" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Tableau" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/dashboard/tableau/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Tableau" module="tableau" />
|
||||
|
||||
<MetadataIngestionServiceDev service="dashboard" connector="Tableau" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/athena/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Athena" module="athena" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Athena" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/athena/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Athena" module="athena" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="BigQuery" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/azuresql/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="AzureSQL" module="azuresql" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="AzureSQL" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/azuresql/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="AzureSQL" module="azuresql" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="AzureSQL" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/bigquery/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="BigQuery" module="bigquery" />
|
||||
|
||||
If you want to run the Usage Connector, you'll also need to install:
|
||||
|
||||
```bash
|
||||
pip3 install "openmetadata-ingestion[bigquery-usage]"
|
||||
```
|
||||
|
||||
<h4>GCP Permissions</h4>
|
||||
|
||||
<p>To execute the metadata extraction and usage workflows successfully, the user or the service account should have enough access to fetch the required data. The following table describes the minimum required permissions.</p>
|
||||
|
@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/bigquery/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="BigQuery" module="bigquery" />
|
||||
|
||||
If you want to run the Usage Connector, you'll also need to install:
|
||||
|
||||
```bash
|
||||
pip3 install "openmetadata-ingestion[bigquery-usage]"
|
||||
```
|
||||
|
||||
<h4>GCP Permissions</h4>
|
||||
|
||||
<p>To execute the metadata extraction and usage workflows successfully, the user or the service account should have enough access to fetch the required data. The following table describes the minimum required permissions.</p>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/clickhouse/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="ClickHouse" module="clickhouse" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Clickhouse" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/clickhouse/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="ClickHouse" module="clickhouse" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Clickhouse" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/databricks/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Databricks" module="databricks" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Databricks" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/databricks/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Databricks" module="databricks" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Databricks" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/datalake/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DataLake" module="datalake" />
|
||||
|
||||
## Metadata Ingestion
|
||||
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Datalake.
|
||||
|
||||
@ -59,8 +61,6 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
|
||||
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
|
||||
|
||||
|
||||
|
||||
|
||||
This is a sample config for Datalake using GCS:
|
||||
|
||||
```yaml
|
||||
@ -103,13 +103,13 @@ workflowConfig:
|
||||
```
|
||||
|
||||
|
||||
#### Source Configuration - Service Connection using GCS

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).

* **type**: Credentials type, e.g. `service_account`.
* **projectId**
* **privateKey**
* **privateKeyId**
* **clientEmail**
* **clientId**
@ -117,8 +117,8 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
* **tokenUri**: [https://oauth2.googleapis.com/token](https://oauth2.googleapis.com/token) by default
* **authProviderX509CertUrl**: [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
* **clientX509CertUrl**
* **bucketName**: name of the bucket in GCS
* **Prefix**: prefix in the GCS bucket
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
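For instance, a minimal sketch of how such include/exclude patterns could look in the workflow YAML (the pattern values are purely illustrative):

```yaml
sourceConfig:
  config:
    schemaFilterPattern:
      includes:
        - sales_.*
    tableFilterPattern:
      excludes:
        - .*_temp
```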
|
||||
|
||||
<MetadataIngestionConfig service="database" connector="Datalake" goal="Airflow" />
|
||||
|
@ -7,6 +7,7 @@ slug: /openmetadata/connectors/database/datalake/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DataLake" module="datalake" />
|
||||
|
||||
## Metadata Ingestion
|
||||
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Datalake.
|
||||
@ -45,11 +46,9 @@ workflowConfig:
|
||||
openMetadataServerConfig:
|
||||
hostPort: http://localhost:8585/api
|
||||
authProvider: no-auth
|
||||
|
||||
|
||||
```
|
||||
|
||||
|
||||
#### Source Configuration - Source Config using AWS S3
|
||||
|
||||
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).
|
||||
@ -59,9 +58,6 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
|
||||
* **awsRegion**: Specify the region in which your DynamoDB is located. This setting is required even if you have configured a local AWS profile.
|
||||
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
|
||||
|
||||
|
||||
|
||||
|
||||
This is a sample config for Datalake using GCS:
|
||||
|
||||
```yaml
|
||||
@ -104,13 +100,13 @@ workflowConfig:
|
||||
```
|
||||
|
||||
|
||||
#### Source Configuration - Service Connection using GCS

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json).

* **type**: Credentials type, e.g. `service_account`.
* **projectId**
* **privateKey**
* **privateKeyId**
* **clientEmail**
* **clientId**
@ -118,8 +114,8 @@ The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetada
* **tokenUri**: [https://oauth2.googleapis.com/token](https://oauth2.googleapis.com/token) by default
* **authProviderX509CertUrl**: [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
* **clientX509CertUrl**
* **bucketName**: name of the bucket in GCS
* **Prefix**: prefix in the GCS bucket
* **schemaFilterPattern** and **tableFilterPattern**: Note that the `schemaFilterPattern` and `tableFilterPattern` both support regex as `include` or `exclude`. E.g.,
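For example, a sketch of such patterns in the workflow YAML (the values are illustrative only):

```yaml
sourceConfig:
  config:
    schemaFilterPattern:
      excludes:
        - staging_.*
    tableFilterPattern:
      includes:
        - orders.*
```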
|
||||
|
||||
<MetadataIngestionConfig service="database" connector="Datalake" goal="CLI" />
|
||||
|
@ -3,59 +3,18 @@ title: Datalake
|
||||
slug: /openmetadata/connectors/database/datalake
|
||||
---
|
||||
|
||||
<ConnectorIntro connector="Datalake" goal="Airflow" />
|
||||
<ConnectorIntro connector="Datalake" />
|
||||
|
||||
<Requirements />
|
||||
|
||||
<MetadataIngestionService connector="Datalake"/>
|
||||
|
||||
## Metadata Ingestion
|
||||
<h4>Connection Options</h4>
|
||||
|
||||
### 1. Visit the Services Page
|
||||
|
||||
|
||||
The first step is ingesting the metadata from your sources. Under Settings, you will find a **Services** page to link an external source system to OpenMetadata. Once a service is created, it can be used to configure metadata, usage, and profiler workflows.
|
||||
|
||||
To visit the _Services_ page, select _Services_ from the _Settings_ menu.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
### 2. Create a New Service
|
||||
|
||||
Click on the _Add New Service_ button to start the Service creation.
|
||||
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/create-new-service.png" alt="create-new-service"/>
|
||||
|
||||
### 3. Select the Service Type
|
||||
|
||||
Select Datalake as the service type and click _Next_.
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/select-service.png" alt="select-service"/>
|
||||
|
||||
### 4. Name and Describe your Service
|
||||
|
||||
Provide a name and description for your service as illustrated below.
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/describe-service.png" alt="describe-service"/>
|
||||
|
||||
#### Service Name
|
||||
|
||||
OpenMetadata uniquely identifies services by their _Service Name_. Provide a name that distinguishes your deployment from other services, including the other Datalake services that you might be ingesting metadata from.
|
||||
|
||||
### 5. Configure the Service Connection
|
||||
|
||||
In this step, we will configure the connection settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your Datalake service as desired.
|
||||
|
||||
**Datalake using AWS S3**
|
||||
<Collapse title="Datalake using AWS S3">
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/service-connection-using-aws-s3.png" alt="create-account"/>
|
||||
|
||||
<details>
|
||||
|
||||
<summary>Connection Options for AWS S3</summary>
|
||||
|
||||
**AWS Access Key ID**
|
||||
|
||||
@ -91,25 +50,12 @@ Enter the details for any additional connection options that can be sent to Dyna
|
||||
|
||||
Enter the details for any additional connection arguments such as security or protocol configs that can be sent to DynamoDB during the connection. These details must be added as Key-Value pairs.
|
||||
|
||||
In case you are using Single-Sign-On (SSO) for authentication, add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows.
|
||||
</Collapse>
|
||||
|
||||
`"authenticator" : "sso_login_url"`
|
||||
|
||||
In case you authenticate with SSO using an external browser popup, then add the `authenticator` details in the Connection Arguments as a Key-Value pair as follows.
|
||||
|
||||
`"authenticator" : "externalbrowser"`
|
||||
|
||||
</details>
|
||||
|
||||
**Datalake using GCS**
|
||||
<Collapse title="Datalake using GCS">
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/service-connection-using-gcs.png" alt="service-connection-using-gcs"/>
|
||||
|
||||
|
||||
<details>
|
||||
|
||||
<summary>Connection Options for GCS</summary>
|
||||
|
||||
**BUCKET NAME**
|
||||
|
||||
This is the Bucket Name in GCS.
|
||||
@ -120,7 +66,7 @@ This is the Bucket Name in GCS.
|
||||
|
||||
**GCS Credentials**
|
||||
|
||||
We support two ways of authenticating to GCS:
|
||||
|
||||
1. Passing the raw credential values provided by BigQuery. This requires us to provide the following information, all provided by BigQuery:
|
||||
1. Credentials type, e.g. `service_account`.
|
||||
@ -134,126 +80,7 @@ We support two ways of authenticating to BigQuery:
|
||||
9. Authentication Provider X509 Certificate URL, [https://www.googleapis.com/oauth2/v1/certs](https://www.googleapis.com/oauth2/v1/certs) by default
|
||||
10. Client X509 Certificate URL
|
||||
|
||||
</details>
|
||||
|
||||
After hitting Save you will see that your Datalake connector has been added successfully, and you can add an ingestion.
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/created-service.png" alt="created-service"/>
|
||||
|
||||
### 6. Configure the Metadata Ingestion
|
||||
|
||||
Once the service is created, we can add a **Metadata Ingestion Workflow**, either directly from the _Add Ingestion_ button in the figure above, or from the Service page:
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/service-page.png" alt="service-page"/>
|
||||
|
||||
<details>
|
||||
|
||||
<summary>Metadata Ingestion Options</summary>
|
||||
|
||||
**Include (Table Filter Pattern)**
|
||||
|
||||
Use table filter patterns to control whether or not to include tables as part of metadata ingestion and data profiling.
|
||||
|
||||
Explicitly include tables by adding a list of comma-separated regular expressions to the _Include_ field. OpenMetadata will include all tables with names matching one or more of the supplied regular expressions. All other tables will be excluded. See the figure above for an example.
|
||||
|
||||
**Exclude (Table Filter Pattern)**
|
||||
|
||||
Explicitly exclude tables by adding a list of comma-separated regular expressions to the _Exclude_ field. OpenMetadata will exclude all tables with names matching one or more of the supplied regular expressions. All other tables will be included. See the figure above for an example.
|
||||
|
||||
**Include (Schema Filter Pattern)**
|
||||
|
||||
Use schema filter patterns to control whether or not to include schemas as part of metadata ingestion and data profiling.
|
||||
|
||||
Explicitly include schemas by adding a list of comma-separated regular expressions to the _Include_ field. OpenMetadata will include all schemas with names matching one or more of the supplied regular expressions. All other schemas will be excluded.
|
||||
|
||||
**Exclude (Schema Filter Pattern)**
|
||||
|
||||
Explicitly exclude schemas by adding a list of comma-separated regular expressions to the _Exclude_ field. OpenMetadata will exclude all schemas with names matching one or more of the supplied regular expressions. All other schemas will be included.
|
||||
|
||||
**Include views (toggle)**
|
||||
|
||||
Set the _Include views_ toggle to the on position to control whether or not to include views as part of metadata ingestion and data profiling.
|
||||
|
||||
Explicitly include views by adding the following key-value pair in the `source.config` field of your configuration file.
|
||||
|
||||
**Enable data profiler (toggle)**
|
||||
|
||||
Glue does not provide querying capabilities, so the data profiler is not supported.
|
||||
|
||||
**Ingest sample data (toggle)**
|
||||
|
||||
Glue does not provide querying capabilities, so sample data is not supported.
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/deployed-service.png" alt="deploy-service"/>
|
||||
|
||||
### 7. Schedule the Ingestion and Deploy
|
||||
|
||||
Scheduling can be set up at an hourly, daily, or weekly cadence. The timezone is in UTC. Select a Start Date to schedule for ingestion. It is optional to add an End Date.
|
||||
|
||||
Review your configuration settings. If they match what you intended, click _Deploy_ to create the service and schedule metadata ingestion.
|
||||
|
||||
If something doesn't look right, click the _Back_ button to return to the appropriate step and change the settings as needed.
|
||||
|
||||
<details>
|
||||
|
||||
<summary><strong>Scheduling Options</strong></summary>
|
||||
|
||||
**Every**
|
||||
|
||||
Use the _Every_ drop down menu to select the interval at which you want to ingest metadata. Your options are as follows:
|
||||
|
||||
* _Hour_: Ingest metadata once per hour
|
||||
* _Day_: Ingest metadata once per day
|
||||
* _Week_: Ingest metadata once per week
|
||||
|
||||
**Day**
|
||||
|
||||
The _Day_ selector is only active when ingesting metadata once per week. Use the _Day_ selector to set the day of the week on which to ingest metadata.
|
||||
|
||||
**Minute**
|
||||
|
||||
The _Minute_ dropdown is only active when ingesting metadata once per hour. Use the _Minute_ drop down menu to select the minute of the hour at which to begin ingesting metadata.
|
||||
|
||||
**Time**
|
||||
|
||||
The _Time_ drop down menus are active when ingesting metadata either once per day or once per week. Use the time drop downs to select the time of day at which to begin ingesting metadata.
|
||||
|
||||
**Start date (UTC)**
|
||||
|
||||
Use the _Start date_ selector to choose the date at which to begin ingesting metadata according to the defined schedule.
|
||||
|
||||
**End date (UTC)**
|
||||
|
||||
Use the _End date_ selector to choose the date at which to stop ingesting metadata according to the defined schedule. If no end date is set, metadata ingestion will continue according to the defined schedule indefinitely.
|
||||
|
||||
</details>
|
||||
|
||||
After configuring the workflow, you can click on _Deploy_ to create the pipeline.
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/schedule-options.png" alt="schedule-options"/>
|
||||
|
||||
|
||||
|
||||
### 8. View the Ingestion Pipeline
|
||||
|
||||
Once the workflow has been successfully deployed, you can view the Ingestion Pipeline running from the Service Page.
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/ingestion-pipeline.png" alt="ingestion-pipeline"/>
|
||||
|
||||
### 9. Workflow Deployment Error
|
||||
|
||||
If there were any errors during the workflow deployment process, the Ingestion Pipeline Entity will still be created, but no workflow will be present in the Ingestion container.
|
||||
|
||||
You can then edit the Ingestion Pipeline and _Deploy_ it again.
|
||||
|
||||
<Image src="/images/openmetadata/connectors/datalake/workflow-deployment-error.png" alt="create-account"/>
|
||||
|
||||
From the _Connection_ tab, you can also _Edit_ the Service if needed.
|
||||
|
||||
|
||||
</Collapse>
|
||||
|
||||
<IngestionScheduleAndDeploy />
|
||||
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/db2/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DB2" module="db2" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="DB2" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/db2/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DB2" module="db2" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="DB2" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/deltalake/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DeltaLake" module="deltalake" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="DeltaLake" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/deltalake/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DeltaLake" module="deltalake" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="DeltaLake" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/druid/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Druid" module="druid" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Druid" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/druid/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Druid" module="druid" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Druid" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/dynamodb/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="DynamoDB" module="dynamodb" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="DynamoDB" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/dynamodb/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Druid" module="druid" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="BigQuery" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/glue/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Glue" module="glue" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Glue" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/glue/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Druid" module="druid" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Glue" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/hive/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Hive" module="hive" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Hive" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/hive/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Hive" module="hive" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Hive" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mariadb/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="MariaDB" module="mariadb" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="MariaDB" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mariadb/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="MariaDB" module="mariadb" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="MariaDB" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mssql/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="MSSQL" module="mssql" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="MSSQL" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mssql/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="MSSQL" module="mssql" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="MSSQL" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,10 @@ slug: /openmetadata/connectors/database/mysql/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="MySQL" module="mysql" />
|
||||
|
||||
Note that the user should have access to the `INFORMATION_SCHEMA` table.
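A quick way to verify that the configured user can actually read from `INFORMATION_SCHEMA` — the user name and host below are placeholders for your own setup:

```bash
# should return a few rows if the user has the required access
mysql -u openmetadata_user -p -h <mysql-host> -e "SELECT table_schema, table_name FROM information_schema.tables LIMIT 5;"
```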
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="MySQL" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,10 @@ slug: /openmetadata/connectors/database/mysql/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="MySQL" module="mysql" />
|
||||
|
||||
Note that the user should have access to the `INFORMATION_SCHEMA` table.
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="MySQL" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/mysql
|
||||
|
||||
<Requirements />
|
||||
|
||||
Note that the user should have access to the `INFORMATION_SCHEMA` table.
|
||||
|
||||
<MetadataIngestionService connector="MySQL"/>
|
||||
|
||||
<h4>Connection Options</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/oracle/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Oracle" module="oracle" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Oracle" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/oracle/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Oracle" module="oracle" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Oracle" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/postgres/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Postgres" module="postgres" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Postgres" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/postgres/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Postgres" module="postgres" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Postgres" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/presto/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Presto" module="presto" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Presto" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/presto/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Presto" module="presto" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Presto" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/redshift/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Redshift" module="redshift" />
|
||||
|
||||
If you want to run the Usage Connector, you'll also need to install:
|
||||
|
||||
```bash
|
||||
pip3 install "openmetadata-ingestion[redshift-usage]"
|
||||
```
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Redshift" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/redshift/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Redshift" module="redshift" />
|
||||
|
||||
If you want to run the Usage Connector, you'll also need to install:
|
||||
|
||||
```bash
|
||||
pip3 install "openmetadata-ingestion[redshift-usage]"
|
||||
```
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Redshift" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/salesforce/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Salesforce" module="salesforce" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Salesforce" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/salesforce/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Salesforce" module="salesforce" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Salesforce" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/singlestore/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="SingleStore" module="singlestore" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="SingleStore" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/singlestore/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="SingleStore" module="singlestore" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="SingleStore" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/snowflake/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Snowflake" module="snowflake" />
|
||||
|
||||
If you want to run the Usage Connector, you'll also need to install:
|
||||
|
||||
```bash
|
||||
pip3 install "openmetadata-ingestion[snowflake-usage]"
|
||||
```
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Snowflake" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,14 @@ slug: /openmetadata/connectors/database/snowflake/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Snowflake" module="snowflake" />
|
||||
|
||||
If you want to run the Usage Connector, you'll also need to install:
|
||||
|
||||
```bash
|
||||
pip3 install "openmetadata-ingestion[snowflake-usage]"
|
||||
```
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Snowflake" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/trino/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Trino" module="trino" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Trino" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/trino/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Trino" module="trino" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Trino" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/vertica/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Vertica" module="vertica" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Vertica" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/database/vertica/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Vertica" module="vertica" />
|
||||
|
||||
<MetadataIngestionServiceDev service="database" connector="Vertica" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/messaging/kafka/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Kafka" module="kafka" />
|
||||
|
||||
<MetadataIngestionServiceDev service="messaging" connector="Kafka" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/messaging/kafka/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Kafka" module="kafka" />
|
||||
|
||||
<MetadataIngestionServiceDev service="messaging" connector="Kafka" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -9,13 +9,7 @@ In this page, you will learn how to use the `metadata` CLI to run a one-ingestio
|
||||
|
||||
<Requirements />
|
||||
|
||||
## Python requirements
|
||||
|
||||
To run the Amundsen ingestion, you will need to install:
|
||||
|
||||
```commandline
|
||||
pip3 install "openmetadata-ingestion[amundsen]"
|
||||
```
|
||||
<PythonMod connector="Amundsen" module="amundsen" />
|
||||
|
||||
Make sure you are running openmetadata-ingestion version 0.10.2 or above.
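If the module was installed with pip, a quick way to confirm the version you are running:

```bash
pip3 show openmetadata-ingestion | grep Version
```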
|
||||
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/airbyte/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Airbyte" module="airbyte" />
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="Airbyte" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/airbyte/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Airbyte" module="airbyte" />
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="Airbyte" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,13 @@ slug: /openmetadata/connectors/pipeline/airflow/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Airflow" module="airflow" />
|
||||
|
||||
Note that this installs the same Airflow version that we ship in the Ingestion Container, which is
|
||||
Airflow `2.3.3` from Release `0.12`.
|
||||
|
||||
The ingestion using Airflow version 2.3.3 as a source package has been tested against Airflow 2.3.3 and Airflow 2.2.5.
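If you want to double-check which Airflow version actually ended up in your environment after installing the module, you can query the package directly:

```bash
python3 -c "import airflow; print(airflow.__version__)"
```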
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="Airflow" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -3,7 +3,36 @@ title: Airflow
|
||||
slug: /openmetadata/connectors/pipeline/airflow
|
||||
---
|
||||
|
||||
<ConnectorIntro service="pipeline" connector="Airflow"/>
|
||||
# Airflow
|
||||
|
||||
In this section, we provide guides and references to use the Airflow connector.
|
||||
|
||||
Configure and schedule Airflow metadata workflow from the OpenMetadata UI:
|
||||
|
||||
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check the following docs to
|
||||
extract metadata directly from your Airflow instance or via the CLI:
|
||||
|
||||
<TileContainer>
|
||||
<Tile
|
||||
icon="air"
|
||||
title="Ingest directly from your Airflow"
|
||||
text="Configure the ingestion with a DAG on your own Airflow instance"
|
||||
link={
|
||||
"/openmetadata/connectors/pipeline/airflow/gcs"
|
||||
}
|
||||
size="half"
|
||||
/>
|
||||
<Tile
|
||||
icon="account_tree"
|
||||
title="Ingest with the CLI"
|
||||
text="Run a one-time ingestion using the metadata CLI"
|
||||
link={
|
||||
"/openmetadata/connectors/pipeline/airflow/cli"
|
||||
}
|
||||
size="half"
|
||||
/>
|
||||
</TileContainer>
|
||||
|
||||
|
||||
<Requirements />
|
||||
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/fivetran/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Fivetran" module="fivetran" />
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="Fivetran" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/fivetran/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Fivetran" module="fivetran" />
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="Fivetran" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/glue/airflow
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Glue" module="glue" />
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="GluePipeline" goal="Airflow"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -7,6 +7,8 @@ slug: /openmetadata/connectors/pipeline/glue/cli
|
||||
|
||||
<Requirements />
|
||||
|
||||
<PythonMod connector="Glue" module="glue" />
|
||||
|
||||
<MetadataIngestionServiceDev service="pipeline" connector="GluePipeline" goal="CLI"/>
|
||||
|
||||
<h4>Source Configuration - Service Connection</h4>
|
||||
|
@ -101,30 +101,142 @@ Follow the instructions [here](https://docs.docker.com/compose/cli-command/#inst
|
||||
</Collapse>
|
||||
|
||||
|
||||
## Procedure

### 1. Create a directory for OpenMetadata

Create a new directory for OpenMetadata and navigate into that directory.

```bash
mkdir openmetadata-docker && cd openmetadata-docker
```

### 2. Create a Python virtual environment

Create a virtual environment to avoid conflicts with other Python environments on your host system.
A virtual environment is a self-contained directory tree that contains a Python installation for a particular version
of Python, plus a number of additional packages.

In a later step you will install the `openmetadata-ingestion` Python module and its dependencies in this virtual environment.

```bash
python3 -m venv env
```
### 3. Activate the virtual environment
|
||||
|
||||
```bash
|
||||
source env/bin/activate
|
||||
```
|
||||
|
||||
### 4. Upgrade pip and setuptools
|
||||
|
||||
```bash
|
||||
pip3 install --upgrade pip setuptools
|
||||
```
|
||||
|
||||
### 5. Install the OpenMetadata Python module using pip
|
||||
|
||||
```bash
|
||||
pip3 install --upgrade "openmetadata-ingestion[docker]"
|
||||
```
|
||||
|
||||
### 6. Ensure the module is installed and ready for use
|
||||
|
||||
```bash
|
||||
metadata docker --help
|
||||
```
|
||||
|
||||
After running the command above, you should see output similar to the following.
|
||||
|
||||
```
|
||||
❯ metadata docker --help
|
||||
Usage: metadata docker [OPTIONS]
|
||||
|
||||
Checks Docker Memory Allocation Run Latest Release Docker - metadata docker
|
||||
--start Run Local Docker - metadata docker --start -f path/to/docker-
|
||||
compose.yml
|
||||
|
||||
Options:
|
||||
--start Start release docker containers
|
||||
--stop Stops openmetadata docker containers
|
||||
--pause Pause openmetadata docker containers
|
||||
--resume Resume/Unpause openmetadata docker
|
||||
containers
|
||||
--clean Stops and remove openmetadata docker
|
||||
containers along with images, volumes,
|
||||
networks associated
|
||||
-f, --file-path FILE Path to Local docker-compose.yml
|
||||
-env-file, --env-file-path FILE
|
||||
Path to env file containing the environment
|
||||
variables
|
||||
--reset-db Reset OpenMetadata Data
|
||||
--ingest-sample-data Enable the sample metadata ingestion
|
||||
--help Show this message and exit.
|
||||
```
|
||||
|
||||
### 7. Start the OpenMetadata Docker containers
|
||||
|
||||
```bash
|
||||
metadata docker --start
|
||||
```
|
||||
|
||||
This will create a docker network and four containers for the following services:
|
||||
- MySQL to store the metadata catalog
|
||||
- Elasticsearch to maintain the metadata index which enables you to search the catalog
|
||||
- Apache Airflow which OpenMetadata uses for metadata ingestion
|
||||
- The OpenMetadata UI and API server
|
||||
|
||||
After starting the Docker containers, you should see an output similar to the following.
|
||||
|
||||
```
|
||||
[2021-11-18 15:53:52,532] INFO {metadata.cmd:202} - Running Latest Release Docker
|
||||
[+] Running 5/5
|
||||
⠿ Network tmp_app_net Created 0.3s
|
||||
⠿ Container tmp_mysql_1 Started 1.0s
|
||||
⠿ Container tmp_elasticsearch_1 Started 1.0s
|
||||
⠿ Container tmp_ingestion_1 Started 2.1s
|
||||
⠿ Container tmp_openmetadata-server_1 Started 2.2s
|
||||
[2021-11-18 15:53:55,876] INFO {metadata.cmd:212} - Time took to get containers running: 0:00:03.124889
|
||||
.......
|
||||
```
|
||||
|
||||
After starting the containers, `metadata` will launch Airflow tasks to ingest sample metadata and usage data for you to
|
||||
experiment with. This might take several minutes, depending on your system.
|
||||
|
||||
<Note>
|
||||
|
||||
- `metadata docker --stop` will stop the Docker containers.
|
||||
- `metadata docker --clean` will clean/prune the containers, volumes, and networks.
|
||||
|
||||
</Note>
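If one of the containers does not come up as expected, its logs are usually the fastest way to see what went wrong. The container names below are the ones shown in the `docker ps` output later on this page; yours may differ slightly depending on the release:

```bash
# tail the server and ingestion container logs
docker logs --tail 100 openmetadata_server
docker logs --tail 100 openmetadata_ingestion
```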
|
||||
|
||||
### 8. Wait for metadata ingestion to finish
|
||||
|
||||
Once metadata ingestion has finished and the OpenMetadata UI is ready for use, you will see output similar to the following.
|
||||
|
||||
```
|
||||
✅ OpenMetadata is up and running
|
||||
|
||||
Open http://localhost:8585 in your browser to access OpenMetadata..
|
||||
|
||||
To checkout Ingestion via Airflow, go to http://localhost:8080
|
||||
(username: admin, password: admin)
|
||||
|
||||
We are available on Slack , https://slack.open-metadata.org/ . Reach out to us if you have any questions.
|
||||
|
||||
If you like what we are doing, please consider giving us a star on github at https://github.com/open-metadata/OpenMetadata.
|
||||
It helps OpenMetadata reach wider audience and helps our community.
|
||||
```
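Before opening the browser, you can also confirm from the terminal that the server is answering on port 8585:

```bash
# prints a confirmation only if the UI endpoint responds successfully
curl -sSf -o /dev/null http://localhost:8585 && echo "OpenMetadata UI is responding"
```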
|
||||
|
||||
<Tip>
|
||||
|
||||
The `metadata` CLI is very useful for quickly testing when getting started or wanting to try out a new release.
|
||||
|
||||
If you had already set up a release and are trying to test a new one, you might need to run `metadata docker --clean`
|
||||
to clean up the whole environment and pick up the new ingredients from a fresh start.
|
||||
|
||||
</Tip>
|
||||
|
||||
<Image src="/images/quickstart/docker/openmetadata.png" alt="UI"/>
|
||||
|
||||
@ -158,3 +270,60 @@ If you want to persist your data, prepare [Named Volumes](/deployment/docker/vol
|
||||
2. Visit the [Connectors](/openmetadata/connectors) documentation to see what services you can integrate with
|
||||
OpenMetadata.
|
||||
3. Visit the [API](/swagger.html) documentation and explore the rich set of OpenMetadata APIs.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Compose is not a docker command
|
||||
|
||||
If you are getting an error such as `"compose" is not a docker command`, you might need to revisit the
|
||||
installation steps above to make sure that Docker Compose is properly added to your system.
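You can check whether the Compose plugin is available at all with:

```bash
docker compose version
```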
|
||||
|
||||
### metadata CLI issues
|
||||
|
||||
Are you having trouble starting the containers with the `metadata` CLI? While that process is recommended,
|
||||
you can always run `docker compose` manually after picking up the latest `docker-compose.yml` file from the release:
|
||||
|
||||
```commandline
|
||||
mkdir openmetadata && cd "$_"
|
||||
wget https://github.com/open-metadata/OpenMetadata/releases/download/0.11.3-release/docker-compose.yml
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
This snippet will create a directory named `openmetadata` and download the `docker-compose.yml` file automatically.
|
||||
Afterwards, it will start the containers. If instead you want to download the file manually to another location,
|
||||
you can do so from the Releases [page](https://github.com/open-metadata/OpenMetadata/releases).
|
||||
|
||||
This will start all the necessary components locally. You can validate that all containers are up
|
||||
and running with `docker ps`.
|
||||
|
||||
```commandline
|
||||
❯ docker ps
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
470cc8149826 openmetadata/server:0.11.0 "./openmetadata-star…" 45 seconds ago Up 43 seconds 3306/tcp, 9200/tcp, 9300/tcp, 0.0.0.0:8585-8586->8585-8586/tcp openmetadata_server
|
||||
63578aacbff5 openmetadata/ingestion:0.11.0 "./ingestion_depende…" 45 seconds ago Up 43 seconds 0.0.0.0:8080->8080/tcp openmetadata_ingestion
|
||||
9f5ee8334f4b docker.elastic.co/elasticsearch/elasticsearch:7.10.2 "/tini -- /usr/local…" 45 seconds ago Up 44 seconds 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp openmetadata_elasticsearch
|
||||
08947ab3424b openmetadata/db:0.11.0 "/entrypoint.sh mysq…" 45 seconds ago Up 44 seconds (healthy) 3306/tcp, 33060-33061/tcp openmetadata_mysql
|
||||
```
|
||||
|
||||
In a few seconds, you should be able to access the OpenMetadata UI at [http://localhost:8585](http://localhost:8585):
|
||||
|
||||
### Network openmetadata_app_net Error
|
||||
|
||||
You might see something like:
|
||||
|
||||
```
|
||||
The docker command executed was `/usr/local/bin/docker compose --file /var/folders/bl/rm5dhdf127ngm4rr40hvhbq40000gn/T/docker-compose.yml --project-name openmetadata up --detach`.
|
||||
It returned with code 1
|
||||
The content of stdout can be found above the stacktrace (it wasn't captured).
|
||||
The content of stderr is 'Network openmetadata_app_net Creating
|
||||
Network openmetadata_app_net Error
|
||||
failed to create network openmetadata_app_net: Error response from daemon: Pool overlaps with other one on this address space
|
||||
```
|
||||
|
||||
A common solution is to run `docker network prune`:
|
||||
|
||||
```
|
||||
WARNING! This will remove all custom networks not used by at least one container.
|
||||
```
|
||||
|
||||
So be careful if you want to keep some (unused) networks on your laptop.
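If you prefer to review what exists before pruning, you can list the current networks and inspect any suspicious one to see which address pool it occupies:

```bash
docker network ls
# replace <network-name> with one of the names listed above
docker network inspect <network-name>
```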
|
||||
|