Mirror of https://github.com/open-metadata/OpenMetadata.git (synced 2025-08-18 14:06:59 +00:00)
parent 3bcef4f58c
commit 8708520c28
@@ -393,6 +393,12 @@ site_menu:
         url: /openmetadata/connectors/pipeline/fivetran/airflow
       - category: OpenMetadata / Connectors / Pipeline / Fivetran / CLI
         url: /openmetadata/connectors/pipeline/fivetran/cli
+      - category: OpenMetadata / Connectors / Pipeline / Dagster
+        url: /openmetadata/connectors/pipeline/dagster
+      - category: OpenMetadata / Connectors / Pipeline / Dagster / Airflow
+        url: /openmetadata/connectors/pipeline/dagster/airflow
+      - category: OpenMetadata / Connectors / Pipeline / Dagster / CLI
+        url: /openmetadata/connectors/pipeline/dagster/cli
   - category: OpenMetadata / Connectors / ML Model
     url: /openmetadata/connectors/ml-model
@@ -54,6 +54,7 @@ OpenMetadata can extract metadata from the following list of connectors:
 - [Airflow](/openmetadata/connectors/pipeline/airflow)
 - [Glue](/openmetadata/connectors/pipeline/glue)
 - [Fivetran](/openmetadata/connectors/pipeline/fivetran)
+- [Dagster](/openmetadata/connectors/pipeline/dagster)

 ## ML Model Services
@@ -0,0 +1,304 @@
---
title: Run Dagster Connector using Airflow SDK
slug: /openmetadata/connectors/pipeline/dagster/airflow
---

# Run Dagster using the Airflow SDK

In this section, we provide guides and references to use the Dagster connector.

Configure and schedule Dagster metadata workflows using the Airflow SDK:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

## Requirements

<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>

To run the ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.

### Python Requirements

To run the Dagster ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[dagster]"
```

## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/pipeline/dagsterConnection.json)
you can find the structure to create a connection to Dagster.

In order to create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/workflow.json)

### 1. Define the YAML Config

This is a sample config for Dagster:

```yaml
source:
  type: dagster
  serviceName: dagster_source
  serviceConnection:
    config:
      type: Dagster
      hostPort: http://localhost:8080
      numberOfStatus: 10
      dbConnection:
        type: name of database service
        username: db username
        password: db password
        databaseSchema: database name
        hostPort: host and port for database
  sourceConfig:
    config:
      type: PipelineMetadata
      # includeLineage: true
      # pipelineFilterPattern:
      #   includes:
      #     - pipeline1
      #     - pipeline2
      #   excludes:
      #     - pipeline3
      #     - pipeline4
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

#### Source Configuration - Service Connection

- **hostPort**: Host and port of the Dagster instance.
- **numberOfStatus**: Number of pipeline run statuses to fetch (e.g., 10).
- **dbConnection**
  - **type**: Name of the Database Service
  - **username**: Database username
  - **password**: Database password
  - **databaseSchema**: Database name
  - **hostPort**: Host and port for the database connection
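For illustration only, a filled-in `dbConnection` block for a hypothetical MySQL-backed Dagster instance might look like this (every value here is a placeholder of my choosing, not a default; the `type` must match one of OpenMetadata's database service types):

```yaml
dbConnection:
  type: Mysql
  username: dagster_reader
  password: <your_password>
  databaseSchema: dagster
  hostPort: localhost:3306
```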
#### Source Configuration - Source Config

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):

- `dbServiceName`: Database Service Name for the creation of lineage, if the source supports it.
- `pipelineFilterPattern` and `chartFilterPattern`: Both support regex as include or exclude. E.g.,

```yaml
pipelineFilterPattern:
  includes:
    - users
    - type_test
```
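As an illustration of how such include patterns behave, here is a small Python sketch. The semantics are assumed (a pipeline is kept when its name matches at least one include regex); OpenMetadata's exact matching rules may differ, so treat this as an approximation, not the connector's implementation:

```python
import re

# Include patterns from the YAML example above.
includes = ["users", "type_test"]

def is_included(pipeline_name: str, include_patterns: list) -> bool:
    """Keep a pipeline if its name matches any include regex (assumed semantics)."""
    return any(re.match(pattern, pipeline_name) for pattern in include_patterns)

print(is_included("users_load", includes))   # True: "users" matches the prefix
print(is_included("orders_load", includes))  # False: no pattern matches
```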
#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our docker containers, this looks like:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client).
You can find the different implementations of the ingestion below.

<Collapse title="Configure SSO in the Ingestion Workflows">

### Auth0 SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Azure SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: azure
    securityConfig:
      clientSecret: '{your_client_secret}'
      authority: '{your_authority_url}'
      clientId: '{your_client_id}'
      scopes:
        - your_scopes
```

### Custom OIDC SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Google SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: google
    securityConfig:
      secretKey: '{path-to-json-creds}'
```

### Okta SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: okta
    securityConfig:
      clientId: "{CLIENT_ID - SPA APP}"
      orgURL: "{ISSUER_URL}/v1/token"
      privateKey: "{public/private keypair}"
      email: "{email}"
      scopes:
        - token
```

### Amazon Cognito SSO

The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens)

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### OneLogin SSO

OneLogin SSO uses the Custom OIDC configuration for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### KeyCloak SSO

KeyCloak SSO uses the Custom OIDC configuration for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

</Collapse>

### 2. Prepare the Ingestion DAG

Create a Python file in your Airflow DAGs directory with the following contents:

```python
import pathlib
import yaml
from datetime import timedelta
from airflow import DAG

try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator

from metadata.config.common import load_config_file
from metadata.ingestion.api.workflow import Workflow
from airflow.utils.dates import days_ago

default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60),
}

config = """
<your YAML configuration>
"""


def metadata_ingestion_workflow():
    workflow_config = yaml.safe_load(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()


with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs an OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval="*/5 * * * *",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )
```

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.
@@ -0,0 +1,256 @@
---
title: Run Dagster Connector using the CLI
slug: /openmetadata/connectors/pipeline/dagster/cli
---

# Run Dagster using the metadata CLI

In this section, we provide guides and references to use the Dagster connector.

Configure and schedule Dagster metadata workflows using the CLI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

## Requirements

<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>

To run the ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.

### Python Requirements

To run the Dagster ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[dagster]"
```

## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/pipeline/dagsterConnection.json)
you can find the structure to create a connection to Dagster.

In order to create and run a Metadata Ingestion workflow, we will follow the steps to create a YAML configuration able to connect to the source, process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/workflow.json)

### 1. Define the YAML Config

This is a sample config for Dagster:

```yaml
source:
  type: dagster
  serviceName: dagster_source
  serviceConnection:
    config:
      type: Dagster
      hostPort: http://localhost:8080
      numberOfStatus: 10
      dbConnection:
        type: name of database service
        username: db username
        password: db password
        databaseSchema: database name
        hostPort: host and port for database
  sourceConfig:
    config:
      type: PipelineMetadata
      # includeLineage: true
      # pipelineFilterPattern:
      #   includes:
      #     - pipeline1
      #     - pipeline2
      #   excludes:
      #     - pipeline3
      #     - pipeline4
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

#### Source Configuration - Service Connection

- **hostPort**: Host and port of the Dagster instance.
- **numberOfStatus**: Number of pipeline run statuses to fetch (e.g., 10).
- **dbConnection**
  - **type**: Name of the Database Service
  - **username**: Database username
  - **password**: Database password
  - **databaseSchema**: Database name
  - **hostPort**: Host and port for the database connection

#### Source Configuration - Source Config

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):

- `dbServiceName`: Database Service Name for the creation of lineage, if the source supports it.
- `pipelineFilterPattern` and `chartFilterPattern`: Both support regex as include or exclude. E.g.,

```yaml
pipelineFilterPattern:
  includes:
    - users
    - type_test
```

#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our docker containers, this looks like:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client).
You can find the different implementations of the ingestion below.

<Collapse title="Configure SSO in the Ingestion Workflows">

### Auth0 SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Azure SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: azure
    securityConfig:
      clientSecret: '{your_client_secret}'
      authority: '{your_authority_url}'
      clientId: '{your_client_id}'
      scopes:
        - your_scopes
```

### Custom OIDC SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### Google SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: google
    securityConfig:
      secretKey: '{path-to-json-creds}'
```

### Okta SSO

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: okta
    securityConfig:
      clientId: "{CLIENT_ID - SPA APP}"
      orgURL: "{ISSUER_URL}/v1/token"
      privateKey: "{public/private keypair}"
      email: "{email}"
      scopes:
        - token
```

### Amazon Cognito SSO

The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens)

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### OneLogin SSO

OneLogin SSO uses the Custom OIDC configuration for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### KeyCloak SSO

KeyCloak SSO uses the Custom OIDC configuration for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

</Collapse>

### 2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:

```bash
metadata ingest -c <path-to-yaml>
```

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.
@@ -0,0 +1,201 @@
---
title: Dagster
slug: /openmetadata/connectors/pipeline/dagster
---

# Dagster

In this section, we provide guides and references to use the Dagster connector.

Configure and schedule Dagster metadata workflows from the OpenMetadata UI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check the following docs to connect using the Airflow SDK or the CLI.

<TileContainer>
  <Tile
    icon="air"
    title="Ingest with Airflow"
    text="Configure the ingestion using Airflow SDK"
    link="/openmetadata/connectors/pipeline/dagster/airflow"
    size="half"
  />
  <Tile
    icon="account_tree"
    title="Ingest with the CLI"
    text="Run a one-time ingestion using the metadata CLI"
    link="/openmetadata/connectors/pipeline/dagster/cli"
    size="half"
  />
</TileContainer>

## Requirements

<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>

To run the ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with custom Airflow plugins to handle the workflow deployment.

## Metadata Ingestion

### 1. Visit the Services Page

The first step is ingesting the metadata from your sources. Under Settings, you will find a Services link. A Service connects an external source system to OpenMetadata. Once a service is created, it can be used to configure metadata, usage, and profiler workflows.

To visit the Services page, select Services from the Settings menu.

<Image
  src="/images/openmetadata/connectors/visit-services.png"
  alt="Visit Services Page"
  caption="Find Services under the Settings menu"
/>

### 2. Create a New Service

Click on the Add New Service button to start the Service creation.

<Image
  src="/images/openmetadata/connectors/create-service.png"
  alt="Create a new service"
  caption="Add a new Service from the Services page"
/>

### 3. Select the Service Type

Select Dagster as the service type and click Next.

<div className="w-100 flex justify-center">
  <Image
    src="/images/openmetadata/connectors/dagster/select-service.png"
    alt="Select Service"
    caption="Select your service from the list"
  />
</div>

### 4. Name and Describe your Service

Provide a name and description for your service as illustrated below.

#### Service Name

OpenMetadata uniquely identifies services by their Service Name. Provide a name that distinguishes your deployment from other services, including the other Dagster services that you might be ingesting metadata from.

<div className="w-100 flex justify-center">
  <Image
    src="/images/openmetadata/connectors/dagster/add-new-service.png"
    alt="Add New Service"
    caption="Provide a Name and description for your Service"
  />
</div>

### 5. Configure the Service Connection

In this step, we will configure the connection settings required for this connector. Please follow the instructions below to ensure that you've configured the connector to read from your Dagster service as desired.

<div className="w-100 flex justify-center">
  <Image
    src="/images/openmetadata/connectors/dagster/service-connection.png"
    alt="Configure service connection"
    caption="Configure the service connection by filling the form"
  />
</div>

Once the credentials have been added, click on `Test Connection` and Save the changes.

<div className="w-100 flex justify-center">
  <Image
    src="/images/openmetadata/connectors/test-connection.png"
    alt="Test Connection"
    caption="Test the connection and save the Service"
  />
</div>

#### Connection Options

- **Dagster API Key**: The API key used to authenticate against your Dagster instance.
- **Dagster API Secret**: The API secret paired with the key above.

### 6. Configure Metadata Ingestion

In this step we will configure the metadata ingestion pipeline. Please follow the instructions below.

<Image
  src="/images/openmetadata/connectors/configure-metadata-ingestion-pipeline.png"
  alt="Configure Metadata Ingestion"
  caption="Configure Metadata Ingestion Page"
/>

#### Metadata Ingestion Options

- **Name**: This field refers to the name of the ingestion pipeline; you can customize the name or use the generated one.
- **Pipeline Filter Pattern (Optional)**: Use pipeline filter patterns to control whether or not to include pipelines as part of metadata ingestion.
  - **Include**: Explicitly include pipelines by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all pipelines with names matching one or more of the supplied regular expressions. All other pipelines will be excluded.
  - **Exclude**: Explicitly exclude pipelines by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all pipelines with names matching one or more of the supplied regular expressions. All other pipelines will be included.
- **Include lineage (toggle)**: Set the Include lineage toggle to control whether or not to include lineage between pipelines and data sources as part of metadata ingestion.
- **Enable Debug Log (toggle)**: Set the Enable Debug Log toggle to set the default log level to debug; these logs can be viewed later in Airflow.
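To make the Exclude behavior above concrete, the following Python sketch mimics a comma-separated Exclude field, reusing the pipeline names from the sample configs. The matching semantics here are an assumption for illustration; OpenMetadata's actual regex handling may differ in detail:

```python
import re

# A hypothetical comma-separated Exclude field, split into individual regexes.
exclude_patterns = "pipeline3,pipeline4".split(",")

def is_excluded(name: str) -> bool:
    """Drop a pipeline if its name matches any exclude regex (assumed semantics)."""
    return any(re.match(pattern, name) for pattern in exclude_patterns)

pipelines = ["pipeline1", "pipeline2", "pipeline3", "pipeline4"]
kept = [p for p in pipelines if not is_excluded(p)]
print(kept)  # ['pipeline1', 'pipeline2']
```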
### 7. Schedule the Ingestion and Deploy

Scheduling can be set up at an hourly, daily, or weekly cadence. The timezone is in UTC. Select a Start Date to schedule for ingestion. It is optional to add an End Date.

Review your configuration settings. If they match what you intended, click Deploy to create the service and schedule metadata ingestion.

If something doesn't look right, click the Back button to return to the appropriate step and change the settings as needed.

<Image
  src="/images/openmetadata/connectors/schedule.png"
  alt="Schedule the Workflow"
  caption="Schedule the Ingestion Pipeline and Deploy"
/>

After configuring the workflow, you can click on Deploy to create the pipeline.

### 8. View the Ingestion Pipeline

Once the workflow has been successfully deployed, you can view the Ingestion Pipeline running from the Service Page.

<Image
  src="/images/openmetadata/connectors/view-ingestion-pipeline.png"
  alt="View Ingestion Pipeline"
  caption="View the Ingestion Pipeline from the Service Page"
/>

### 9. Workflow Deployment Error

If there were any errors during the workflow deployment process, the Ingestion Pipeline Entity will still be created, but no workflow will be present in the Ingestion container.

You can then edit the Ingestion Pipeline and Deploy it again.

<Image
  src="/images/openmetadata/connectors/workflow-deployment-error.png"
  alt="Workflow Deployment Error"
  caption="Edit and Deploy the Ingestion Pipeline"
/>

From the Connection tab, you can also Edit the Service if needed.
@@ -9,3 +9,4 @@ slug: /openmetadata/connectors/pipeline
 - [Airflow](/openmetadata/connectors/pipeline/airflow)
 - [Glue](/openmetadata/connectors/pipeline/glue)
 - [Fivetran](/openmetadata/connectors/pipeline/fivetran)
+- [Dagster](/openmetadata/connectors/pipeline/dagster)
Binary file not shown (new image, 83 KiB).
Binary file not shown (new image, 381 KiB).
Binary file not shown (new image, 238 KiB).