dagster-doc-added (#7055)

* dagster-doc-added

* dagster-config-updated
Abhishek Pandey 2022-08-30 20:17:17 +05:30 committed by GitHub
parent 3bcef4f58c
commit 8708520c28
9 changed files with 769 additions and 0 deletions

@@ -393,6 +393,12 @@ site_menu:
  url: /openmetadata/connectors/pipeline/fivetran/airflow
- category: OpenMetadata / Connectors / Pipeline / Fivetran / CLI
  url: /openmetadata/connectors/pipeline/fivetran/cli
- category: OpenMetadata / Connectors / Pipeline / Dagster
  url: /openmetadata/connectors/pipeline/dagster
- category: OpenMetadata / Connectors / Pipeline / Dagster / Airflow
  url: /openmetadata/connectors/pipeline/dagster/airflow
- category: OpenMetadata / Connectors / Pipeline / Dagster / CLI
  url: /openmetadata/connectors/pipeline/dagster/cli
- category: OpenMetadata / Connectors / ML Model
  url: /openmetadata/connectors/ml-model

@@ -54,6 +54,7 @@ OpenMetadata can extract metadata from the following list of connectors:
- [Airflow](/openmetadata/connectors/pipeline/airflow)
- [Glue](/openmetadata/connectors/pipeline/glue)
- [Fivetran](/openmetadata/connectors/pipeline/fivetran)
- [Dagster](/openmetadata/connectors/pipeline/dagster)
## ML Model Services

@@ -0,0 +1,304 @@
---
title: Run Dagster Connector using Airflow SDK
slug: /openmetadata/connectors/pipeline/dagster/airflow
---
# Run Dagster using the Airflow SDK
In this section, we provide guides and references to use the Dagster connector.
Configure and schedule Dagster metadata workflows using the Airflow SDK:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
## Requirements
<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>
To run the Ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
custom Airflow plugins to handle the workflow deployment.
### Python Requirements
To run the Dagster ingestion, you will need to install:
```bash
pip3 install "openmetadata-ingestion[dagster]"
```
## Metadata Ingestion
All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/pipeline/dagsterConnection.json)
you can find the structure to create a connection to Dagster.
To create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.
The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/workflow.json).
### 1. Define the YAML Config
This is a sample config for Dagster:
```yaml
source:
  type: dagster
  serviceName: dagster_source
  serviceConnection:
    config:
      type: Dagster
      hostPort: http://localhost:8080
      numberOfStatus: 10
      dbConnection:
        type: name of database service
        username: db username
        password: db password
        databaseSchema: database name
        hostPort: host and port for database
  sourceConfig:
    config:
      type: PipelineMetadata
      # includeLineage: true
      # pipelineFilterPattern:
      #   includes:
      #     - pipeline1
      #     - pipeline2
      #   excludes:
      #     - pipeline3
      #     - pipeline4
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```
#### Source Configuration - Service Connection
- **hostPort**: Host and port of the Dagster instance, e.g., `http://localhost:8080`.
- **numberOfStatus**: Number of pipeline run statuses to fetch, e.g., `10`.
- **dbConnection**: Connection details for the database used by Dagster (see the illustrative sketch after this list):
  - **type**: Name of the Database Service.
  - **username**: Database username.
  - **password**: Database password.
  - **databaseSchema**: Database name.
  - **hostPort**: Host and port for the database connection.
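For illustration, a `dbConnection` pointing at a hypothetical local MySQL instance could look like the sketch below; the service type, credentials, schema, and port are all placeholders to adapt to your own setup:
```yaml
dbConnection:
  type: Mysql
  username: openmetadata_user
  password: openmetadata_password
  databaseSchema: dagster
  hostPort: localhost:3306
```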
#### Source Configuration - Source Config
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):
- `dbServiceName`: Database Service Name for the creation of lineage, if the source supports it.
- `pipelineFilterPattern`: Note that the `pipelineFilterPattern` supports regex as include or exclude expressions. E.g.,
```yaml
pipelineFilterPattern:
  includes:
    - users
    - type_test
```
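Regex patterns work for exclusion as well. For instance, to skip every pipeline whose name ends in `_test` (an illustrative pattern, not one from the default config):
```yaml
pipelineFilterPattern:
  excludes:
    - .*_test
```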
#### Sink Configuration
To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
#### Workflow Configuration
The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.
For a simple, local installation using our Docker containers, this looks like:
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```
We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client).
You can find the different implementations of the ingestion below.
<Collapse title="Configure SSO in the Ingestion Workflows">
### Auth0 SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
### Azure SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: azure
    securityConfig:
      clientSecret: '{your_client_secret}'
      authority: '{your_authority_url}'
      clientId: '{your_client_id}'
      scopes:
        - your_scopes
```
### Custom OIDC SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
### Google SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: google
    securityConfig:
      secretKey: '{path-to-json-creds}'
```
### Okta SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: okta
    securityConfig:
      clientId: "{CLIENT_ID - SPA APP}"
      orgURL: "{ISSUER_URL}/v1/token"
      privateKey: "{public/private keypair}"
      email: "{email}"
      scopes:
        - token
```
### Amazon Cognito SSO
The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens):
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```
### OneLogin SSO
OneLogin SSO uses Custom OIDC for the ingestion:
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
### KeyCloak SSO
KeyCloak SSO also uses Custom OIDC for the ingestion:
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
</Collapse>
### 2. Prepare the Ingestion DAG
Create a Python file in your Airflow DAGs directory with the following contents:
```python
import yaml
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

try:
    # Airflow 2.x
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    # Airflow 1.10.x fallback
    from airflow.operators.python_operator import PythonOperator

from metadata.ingestion.api.workflow import Workflow

default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60),
}

config = """
<your YAML configuration>
"""


def metadata_ingestion_workflow():
    # Build the ingestion workflow from the YAML above and run it
    workflow_config = yaml.safe_load(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()


with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs an OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval="*/5 * * * *",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )
```
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will
be able to extract metadata from different sources.

@@ -0,0 +1,256 @@
---
title: Run Dagster Connector using the CLI
slug: /openmetadata/connectors/pipeline/dagster/cli
---
# Run Dagster using the metadata CLI
In this section, we provide guides and references to use the Dagster connector.
Configure and schedule Dagster metadata workflows using the metadata CLI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
## Requirements
<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>
To run the Ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
custom Airflow plugins to handle the workflow deployment.
### Python Requirements
To run the Dagster ingestion, you will need to install:
```bash
pip3 install "openmetadata-ingestion[dagster]"
```
## Metadata Ingestion
All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/pipeline/dagsterConnection.json)
you can find the structure to create a connection to Dagster.
To create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.
The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/workflow.json).
### 1. Define the YAML Config
This is a sample config for Dagster:
```yaml
source:
  type: dagster
  serviceName: dagster_source
  serviceConnection:
    config:
      type: Dagster
      hostPort: http://localhost:8080
      numberOfStatus: 10
      dbConnection:
        type: name of database service
        username: db username
        password: db password
        databaseSchema: database name
        hostPort: host and port for database
  sourceConfig:
    config:
      type: PipelineMetadata
      # includeLineage: true
      # pipelineFilterPattern:
      #   includes:
      #     - pipeline1
      #     - pipeline2
      #   excludes:
      #     - pipeline3
      #     - pipeline4
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```
#### Source Configuration - Service Connection
- **hostPort**: Host and port of the Dagster instance, e.g., `http://localhost:8080`.
- **numberOfStatus**: Number of pipeline run statuses to fetch, e.g., `10`.
- **dbConnection**: Connection details for the database used by Dagster (see the illustrative sketch after this list):
  - **type**: Name of the Database Service.
  - **username**: Database username.
  - **password**: Database password.
  - **databaseSchema**: Database name.
  - **hostPort**: Host and port for the database connection.
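For illustration, a `dbConnection` pointing at a hypothetical local MySQL instance could look like the sketch below; the service type, credentials, schema, and port are all placeholders to adapt to your own setup:
```yaml
dbConnection:
  type: Mysql
  username: openmetadata_user
  password: openmetadata_password
  databaseSchema: dagster
  hostPort: localhost:3306
```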
#### Source Configuration - Source Config
The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):
- `dbServiceName`: Database Service Name for the creation of lineage, if the source supports it.
- `pipelineFilterPattern`: Note that the `pipelineFilterPattern` supports regex as include or exclude expressions. E.g.,
```yaml
pipelineFilterPattern:
  includes:
    - users
    - type_test
```
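Regex patterns work for exclusion as well. For instance, to skip every pipeline whose name ends in `_test` (an illustrative pattern, not one from the default config):
```yaml
pipelineFilterPattern:
  excludes:
    - .*_test
```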
#### Sink Configuration
To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
#### Workflow Configuration
The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.
For a simple, local installation using our Docker containers, this looks like:
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```
We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client).
You can find the different implementations of the ingestion below.
<Collapse title="Configure SSO in the Ingestion Workflows">
### Auth0 SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
### Azure SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: azure
    securityConfig:
      clientSecret: '{your_client_secret}'
      authority: '{your_authority_url}'
      clientId: '{your_client_id}'
      scopes:
        - your_scopes
```
### Custom OIDC SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
### Google SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: google
    securityConfig:
      secretKey: '{path-to-json-creds}'
```
### Okta SSO
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: okta
    securityConfig:
      clientId: "{CLIENT_ID - SPA APP}"
      orgURL: "{ISSUER_URL}/v1/token"
      privateKey: "{public/private keypair}"
      email: "{email}"
      scopes:
        - token
```
### Amazon Cognito SSO
The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens):
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: '{bot_jwt_token}'
```
### OneLogin SSO
OneLogin SSO uses Custom OIDC for the ingestion:
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
### KeyCloak SSO
KeyCloak SSO also uses Custom OIDC for the ingestion:
```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```
</Collapse>
### 2. Run with the CLI
First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
```bash
metadata ingest -c <path-to-yaml>
```
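For example, assuming the configuration above was saved as `dagster.yaml` (an illustrative file name):
```bash
metadata ingest -c dagster.yaml
```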
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration,
you will be able to extract metadata from different sources.

@@ -0,0 +1,201 @@
---
title: Dagster
slug: /openmetadata/connectors/pipeline/dagster
---
# Dagster
In this section, we provide guides and references to use the Dagster connector.
Configure and schedule Dagster metadata workflows from the OpenMetadata UI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
the following docs to connect using the Airflow SDK or the CLI.
<TileContainer>
<Tile
icon="air"
title="Ingest with Airflow"
text="Configure the ingestion using Airflow SDK"
link="/openmetadata/connectors/pipeline/dagster/airflow"
size="half"
/>
<Tile
icon="account_tree"
title="Ingest with the CLI"
text="Run a one-time ingestion using the metadata CLI"
link="/openmetadata/connectors/pipeline/dagster/cli"
size="half"
/>
</TileContainer>
## Requirements
<InlineCallout color="violet-70" icon="description" bold="OpenMetadata 0.12 or later" href="/deployment">
To deploy OpenMetadata, check the <a href="/deployment">Deployment</a> guides.
</InlineCallout>
To run the Ingestion via the UI, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
custom Airflow plugins to handle the workflow deployment.
## Metadata Ingestion
### 1. Visit the Services Page
The first step is to ingest the metadata from your sources. Under
Settings, you will find a Services page that lets you connect an external source system to
OpenMetadata. Once a service is created, it can be used to configure
metadata, usage, and profiler workflows.
To visit the Services page, select Services from the Settings menu.
<Image
src="/images/openmetadata/connectors/visit-services.png"
alt="Visit Services Page"
caption="Find Services under the Settings menu"
/>
### 2. Create a New Service
Click on the Add New Service button to start the Service creation.
<Image
src="/images/openmetadata/connectors/create-service.png"
alt="Create a new service"
caption="Add a new Service from the Services page"
/>
### 3. Select the Service Type
Select Dagster as the service type and click Next.
<div className="w-100 flex justify-center">
<Image
src="/images/openmetadata/connectors/dagster/select-service.png"
alt="Select Service"
caption="Select your service from the list"
/>
</div>
### 4. Name and Describe your Service
Provide a name and description for your service as illustrated below.
#### Service Name
OpenMetadata uniquely identifies services by their Service Name. Provide
a name that distinguishes your deployment from other services, including
any other Dagster services that you might be ingesting metadata
from.
<div className="w-100 flex justify-center">
<Image
src="/images/openmetadata/connectors/dagster/add-new-service.png"
alt="Add New Service"
caption="Provide a Name and description for your Service"
/>
</div>
### 5. Configure the Service Connection
In this step, we will configure the connection settings required for
this connector. Please follow the instructions below to ensure that
you've configured the connector to read from your Dagster service as
desired.
<div className="w-100 flex justify-center">
<Image
src="/images/openmetadata/connectors/dagster/service-connection.png"
alt="Configure service connection"
caption="Configure the service connection by filling the form"
/>
</div>
Once the credentials have been added, click on `Test Connection` and Save
the changes.
<div className="w-100 flex justify-center">
<Image
src="/images/openmetadata/connectors/test-connection.png"
alt="Test Connection"
caption="Test the connection and save the Service"
/>
</div>
#### Connection Options
- **Dagster API Key**: API key used to authenticate requests against your Dagster instance.
- **Dagster API Secret**: API secret paired with the key for authentication.
### 6. Configure Metadata Ingestion
In this step, we will configure the metadata ingestion pipeline.
Please follow the instructions below.
<Image
src="/images/openmetadata/connectors/configure-metadata-ingestion-pipeline.png"
alt="Configure Metadata Ingestion"
caption="Configure Metadata Ingestion Page"
/>
#### Metadata Ingestion Options
- **Name**: This field refers to the name of the ingestion pipeline. You can customize the name or use the generated one.
- **Pipeline Filter Pattern (Optional)**: Use pipeline filter patterns to control whether or not to include a pipeline as part of metadata ingestion (see the equivalent YAML sketch after this list).
  - **Include**: Explicitly include pipelines by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all pipelines with names matching one or more of the supplied regular expressions. All other pipelines will be excluded.
  - **Exclude**: Explicitly exclude pipelines by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all pipelines with names matching one or more of the supplied regular expressions. All other pipelines will be included.
- **Include lineage (toggle)**: Set the Include lineage toggle to control whether or not to include lineage between pipelines and data sources as part of metadata ingestion.
- **Enable Debug Log (toggle)**: Set the Enable Debug Log toggle to set the default log level to debug; these logs can be viewed later in Airflow.
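These options correspond to the `sourceConfig` block used by the YAML-based workflows. A minimal sketch, with illustrative `etl_.*` and `.*_test` patterns:
```yaml
sourceConfig:
  config:
    type: PipelineMetadata
    includeLineage: true
    pipelineFilterPattern:
      includes:
        - etl_.*
      excludes:
        - .*_test
```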
### 7. Schedule the Ingestion and Deploy
Scheduling can be set up at an hourly, daily, or weekly cadence. The
timezone is in UTC. Select a Start Date to schedule the ingestion; adding
an End Date is optional.
Review your configuration settings. If they match what you intended,
click Deploy to create the service and schedule metadata ingestion.
If something doesn't look right, click the Back button to return to the
appropriate step and change the settings as needed.
<Image
src="/images/openmetadata/connectors/schedule.png"
alt="Schedule the Workflow"
caption="Schedule the Ingestion Pipeline and Deploy"
/>
After configuring the workflow, you can click on Deploy to create the
pipeline.
### 8. View the Ingestion Pipeline
Once the workflow has been successfully deployed, you can view the
Ingestion Pipeline running from the Service Page.
<Image
src="/images/openmetadata/connectors/view-ingestion-pipeline.png"
alt="View Ingestion Pipeline"
caption="View the Ingestion Pipeline from the Service Page"
/>
### 9. Workflow Deployment Error
If there were any errors during the workflow deployment process, the
Ingestion Pipeline Entity will still be created, but no workflow will be
present in the Ingestion container.
You can then edit the Ingestion Pipeline and Deploy it again.
<Image
src="/images/openmetadata/connectors/workflow-deployment-error.png"
alt="Workflow Deployment Error"
caption="Edit and Deploy the Ingestion Pipeline"
/>
From the Connection tab, you can also Edit the Service if needed.

@@ -9,3 +9,4 @@ slug: /openmetadata/connectors/pipeline
- [Airflow](/openmetadata/connectors/pipeline/airflow)
- [Glue](/openmetadata/connectors/pipeline/glue)
- [Fivetran](/openmetadata/connectors/pipeline/fivetran)
- [Dagster](/openmetadata/connectors/pipeline/dagster)

Three binary image files added (Dagster connector screenshots, not shown): 83 KiB, 381 KiB, and 238 KiB.