diff --git a/openmetadata-docs/content/menu.md b/openmetadata-docs/content/menu.md index 3a896572195..2330d06de5e 100644 --- a/openmetadata-docs/content/menu.md +++ b/openmetadata-docs/content/menu.md @@ -393,6 +393,12 @@ site_menu: url: /openmetadata/connectors/pipeline/fivetran/airflow - category: OpenMetadata / Connectors / Pipeline / Fivetran / CLI url: /openmetadata/connectors/pipeline/fivetran/cli + - category: OpenMetadata / Connectors / Pipeline / Dagster + url: /openmetadata/connectors/pipeline/dagster + - category: OpenMetadata / Connectors / Pipeline / Dagster / Airflow + url: /openmetadata/connectors/pipeline/dagster/airflow + - category: OpenMetadata / Connectors / Pipeline / Dagster / CLI + url: /openmetadata/connectors/pipeline/dagster/cli - category: OpenMetadata / Connectors / ML Model url: /openmetadata/connectors/ml-model diff --git a/openmetadata-docs/content/openmetadata/connectors/index.md b/openmetadata-docs/content/openmetadata/connectors/index.md index b0991f01e08..63eb799cbf6 100644 --- a/openmetadata-docs/content/openmetadata/connectors/index.md +++ b/openmetadata-docs/content/openmetadata/connectors/index.md @@ -54,6 +54,7 @@ OpenMetadata can extract metadata from the following list of connectors: - [Airflow](/openmetadata/connectors/pipeline/airflow) - [Glue](/openmetadata/connectors/pipeline/glue) - [Fivetran](/openmetadata/connectors/pipeline/fivetran) +- [Dagster](/openmetadata/connectors/pipeline/dagster) ## ML Model Services diff --git a/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/airflow.md b/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/airflow.md new file mode 100644 index 00000000000..050cb66aa55 --- /dev/null +++ b/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/airflow.md @@ -0,0 +1,304 @@ +--- +title: Run Dagster Connector using Airflow SDK +slug: /openmetadata/connectors/pipeline/dagster/airflow +--- + +# Run Dagster using the Airflow SDK + +In this section, we 
provide guides and references to use the Dagster connector.

Configure and schedule Dagster metadata workflows using the Airflow SDK:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

## Requirements


To deploy OpenMetadata, check the Deployment guides.


To run the ingestion from Airflow, you will need an Airflow environment with the `openmetadata-ingestion` package installed (see Python Requirements below).

### Python Requirements

To run the Dagster ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[dagster]"
```

## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/pipeline/dagsterConnection.json)
you can find the structure to create a connection to Dagster.

In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/workflow.json).

### 1. Define the YAML Config

This is a sample config for Dagster:

```yaml
source:
  type: dagster
  serviceName: dagster_source
  serviceConnection:
    config:
      type: Dagster
      hostPort: http://localhost:8080
      numberOfStatus: 10
      dbConnection:
        type: name of database service
        username: db username
        password: db password
        databaseSchema: database name
        hostPort: host and port for database
  sourceConfig:
    config:
      type: PipelineMetadata
      # includeLineage: true
      # pipelineFilterPattern:
      #   includes:
      #     - pipeline1
      #     - pipeline2
      #   excludes:
      #     - pipeline3
      #     - pipeline4
sink:
  type: metadata-rest
  config: { }
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

#### Source Configuration - Service Connection


- **hostPort**: The host and port of your Dagster instance, e.g., `http://localhost:8080`.
- **numberOfStatus**: The number of recent run statuses to ingest for each pipeline, e.g., `10`.
- **dbConnection**
  - **type**: Name of the Database Service
  - **username**: db username
  - **password**: db password
  - **databaseSchema**: database name
  - **hostPort**: host and port for the database connection

#### Source Configuration - Source Config

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):

- `dbServiceName`: Database Service Name for the creation of lineage, if the source supports it.
- `pipelineFilterPattern` and `chartFilterPattern`: Both support regular expressions as include or exclude patterns. E.g.:

```yaml
pipelineFilterPattern:
  includes:
    - users
    - type_test
```

#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.
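The `includes`/`excludes` entries shown above are plain regular expressions. As a rough sketch of the matching semantics (illustrative only, not the connector's actual implementation):

```python
import re

def filter_names(names, includes=None, excludes=None):
    """Keep a name if it matches any include pattern (when includes are
    given) and matches no exclude pattern, mirroring the documented
    include/exclude behaviour."""
    kept = []
    for name in names:
        if includes and not any(re.match(p, name) for p in includes):
            continue
        if excludes and any(re.match(p, name) for p in excludes):
            continue
        kept.append(name)
    return kept

pipelines = ["users", "type_test", "audit_daily"]
print(filter_names(pipelines, includes=["users", "type_test"]))  # ['users', 'type_test']
print(filter_names(pipelines, excludes=["audit.*"]))             # ['users', 'type_test']
```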
+ +#### Workflow Configuration + +The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation. + +For a simple, local installation using our docker containers, this looks like: + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: http://localhost:8585/api + authProvider: no-auth +``` + +We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client). +You can find the different implementation of the ingestion below. + + + +### Auth0 SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: auth0 + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' +``` + +### Azure SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: azure + securityConfig: + clientSecret: '{your_client_secret}' + authority: '{your_authority_url}' + clientId: '{your_client_id}' + scopes: + - your_scopes +``` + +### Custom OIDC SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: custom-oidc + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' +``` + +### Google SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: google + securityConfig: + secretKey: '{path-to-json-creds}' +``` + +### Okta SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: http://localhost:8585/api + authProvider: okta + securityConfig: + clientId: "{CLIENT_ID - SPA APP}" + orgURL: "{ISSUER_URL}/v1/token" + privateKey: "{public/private keypair}" + email: "{email}" + scopes: + - token +``` + +### Amazon Cognito SSO + +The 
ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens).

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: auth0
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### OneLogin SSO

OneLogin uses Custom OIDC for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```

### KeyCloak SSO

KeyCloak uses Custom OIDC for the ingestion:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: custom-oidc
    securityConfig:
      clientId: '{your_client_id}'
      secretKey: '{your_client_secret}'
      domain: '{your_domain}'
```




## 2. Prepare the Ingestion DAG

Create a Python file in your Airflow DAGs directory with the following contents:

```python
import yaml
from datetime import timedelta

from airflow import DAG

try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator

from airflow.utils.dates import days_ago

from metadata.ingestion.api.workflow import Workflow

default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60)
}

# Paste the YAML configuration defined above between the triple quotes
config = """

"""

def metadata_ingestion_workflow():
    workflow_config = yaml.safe_load(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()

with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs an OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval='*/5 * * * *',
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )
```

Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will
be able to extract metadata from different sources.
\ No newline at end of file
diff --git a/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/cli.md b/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/cli.md
new file mode 100644
index 00000000000..fa3a7b4ef26
--- /dev/null
+++ b/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/cli.md
@@ -0,0 +1,256 @@
---
title: Run Dagster Connector using the CLI
slug: /openmetadata/connectors/pipeline/dagster/cli
---

# Run Dagster using the metadata CLI

In this section, we provide guides and references to use the Dagster connector.

Configure and schedule Dagster metadata workflows using the `metadata` CLI:
- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

## Requirements


To deploy OpenMetadata, check the Deployment guides.


To run the ingestion with the CLI, you will need the `openmetadata-ingestion` package installed in your environment (see Python Requirements below).

### Python Requirements

To run the Dagster ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[dagster]"
```

## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/entity/services/connections/pipeline/dagsterConnection.json)
you can find the structure to create a connection to Dagster.
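Once you have a config in hand, a quick structural check can catch missing required fields before a workflow run. The helper below is an editorial sketch, not part of `openmetadata-ingestion`; the field names follow the sample config in this guide:

```python
# Required fields of the serviceConnection.config block, taken from the
# sample config in this guide (illustrative sanity check only).
REQUIRED_FIELDS = {"type", "hostPort"}

def missing_connection_fields(connection_config: dict) -> set:
    """Return the required fields absent from a connection config dict."""
    return REQUIRED_FIELDS - connection_config.keys()

sample = {
    "type": "Dagster",
    "hostPort": "http://localhost:8080",
    "numberOfStatus": 10,
}
print(missing_connection_fields(sample))               # set()
print(missing_connection_fields({"type": "Dagster"}))  # {'hostPort'}
```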

In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/workflow.json).

### 1. Define the YAML Config

This is a sample config for Dagster:

```yaml
source:
  type: dagster
  serviceName: dagster_source
  serviceConnection:
    config:
      type: Dagster
      hostPort: http://localhost:8080
      numberOfStatus: 10
      dbConnection:
        type: name of database service
        username: db username
        password: db password
        databaseSchema: database name
        hostPort: host and port for database
  sourceConfig:
    config:
      type: PipelineMetadata
      # includeLineage: true
      # pipelineFilterPattern:
      #   includes:
      #     - pipeline1
      #     - pipeline2
      #   excludes:
      #     - pipeline3
      #     - pipeline4
sink:
  type: metadata-rest
  config: { }
workflowConfig:
  loggerLevel: INFO
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

#### Source Configuration - Service Connection


- **hostPort**: The host and port of your Dagster instance, e.g., `http://localhost:8080`.
- **numberOfStatus**: The number of recent run statuses to ingest for each pipeline, e.g., `10`.
- **dbConnection**
  - **type**: Name of the Database Service
  - **username**: db username
  - **password**: db password
  - **databaseSchema**: database name
  - **hostPort**: host and port for the database connection

#### Source Configuration - Source Config

The `sourceConfig` is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/catalog-rest-service/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json):

- `dbServiceName`: Database Service Name for the creation of lineage, if the source supports it.
- `pipelineFilterPattern` and `chartFilterPattern`: Both support regular expressions as include or exclude patterns. E.g.:

```yaml
pipelineFilterPattern:
  includes:
    - users
    - type_test
```

#### Sink Configuration

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.

#### Workflow Configuration

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our docker containers, this looks like:

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
```

We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/catalog-rest-service/src/main/resources/json/schema/security/client).
You can find the different implementations of the ingestion below.
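All of the security-provider examples that follow share one shape: an `openMetadataServerConfig` with a `hostPort`, an `authProvider`, and (except for `no-auth`) a `securityConfig`. The helper below is a hypothetical sketch of that shared structure, not part of the `openmetadata-ingestion` package:

```python
def build_workflow_config(host_port, auth_provider="no-auth", security_config=None):
    """Assemble a workflowConfig dict with the shape used by the
    provider examples in this guide (illustrative only)."""
    server = {"hostPort": host_port, "authProvider": auth_provider}
    if security_config is not None:
        server["securityConfig"] = security_config
    return {"workflowConfig": {"openMetadataServerConfig": server}}

cfg = build_workflow_config(
    "http://localhost:8585/api",
    auth_provider="custom-oidc",
    security_config={
        "clientId": "{your_client_id}",
        "secretKey": "{your_client_secret}",
        "domain": "{your_domain}",
    },
)
print(cfg["workflowConfig"]["openMetadataServerConfig"]["authProvider"])  # custom-oidc
```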
+ + + +### Auth0 SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: auth0 + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' +``` + +### Azure SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: azure + securityConfig: + clientSecret: '{your_client_secret}' + authority: '{your_authority_url}' + clientId: '{your_client_id}' + scopes: + - your_scopes +``` + +### Custom OIDC SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: custom-oidc + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' +``` + +### Google SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: google + securityConfig: + secretKey: '{path-to-json-creds}' +``` + +### Okta SSO + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: http://localhost:8585/api + authProvider: okta + securityConfig: + clientId: "{CLIENT_ID - SPA APP}" + orgURL: "{ISSUER_URL}/v1/token" + privateKey: "{public/private keypair}" + email: "{email}" + scopes: + - token +``` + +### Amazon Cognito SSO + +The ingestion can be configured by [Enabling JWT Tokens](https://docs.open-metadata.org/deployment/security/enable-jwt-tokens) + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: auth0 + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' +``` + +### OneLogin SSO + +Which uses Custom OIDC for the ingestion + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: custom-oidc + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' 
+``` + +### KeyCloak SSO + +Which uses Custom OIDC for the ingestion + +```yaml +workflowConfig: + openMetadataServerConfig: + hostPort: 'http://localhost:8585/api' + authProvider: custom-oidc + securityConfig: + clientId: '{your_client_id}' + secretKey: '{your_client_secret}' + domain: '{your_domain}' +``` + + + +### 2. Run with the CLI + +First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run: + +```bash +metadata ingest -c +``` + +Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, +you will be able to extract metadata from different sources. diff --git a/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/index.md b/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/index.md new file mode 100644 index 00000000000..532aa59cdba --- /dev/null +++ b/openmetadata-docs/content/openmetadata/connectors/pipeline/dagster/index.md @@ -0,0 +1,201 @@ +--- +title: Dagster +slug: /openmetadata/connectors/pipeline/dagster +--- + +# Dagster + +In this section, we provide guides and references to use the Dagster connector. + +Configure and schedule Dagster metadata and profiler workflows from the OpenMetadata UI: +- [Requirements](#requirements) +- [Metadata Ingestion](#metadata-ingestion) + +If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check +the following docs to connect using Airflow SDK or with the CLI. + + + + + + +## Requirements + + +To deploy OpenMetadata, check the Deployment guides. + + +To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with +custom Airflow plugins to handle the workflow deployment. + +## Metadata Ingestion + +### 1. Visit the Services Page + +The first step is ingesting the metadata from your sources. Under +Settings, you will find a Services link an external source system to +OpenMetadata. 
Once a service is created, it can be used to configure +metadata, usage, and profiler workflows. + +To visit the Services page, select Services from the Settings menu. + + + +### 2. Create a New Service + +Click on the Add New Service button to start the Service creation. + + + +### 3. Select the Service Type + +Select Dagster as the service type and click Next. + +
+Select Service +

### 4. Name and Describe your Service

Provide a name and description for your service as illustrated below.

#### Service Name

OpenMetadata uniquely identifies services by their Service Name. Provide
a name that distinguishes your deployment from other services, including
any other Dagster services that you might be ingesting metadata
from.

+Add New Service +

### 5. Configure the Service Connection

In this step, we will configure the connection settings required for
this connector. Please follow the instructions below to ensure that
you've configured the connector to read from your Dagster service as
desired.
+Configure service connection +
+ + +Once the credentials have been added, click on `Test Connection` and Save +the changes. + +
+Test Connection +

#### Connection Options

- **Dagster API Key**: The API key used to authenticate with your Dagster instance.
- **Dagster API Secret**: The API secret used to authenticate with your Dagster instance.

### 6. Configure Metadata Ingestion

In this step we will configure the metadata ingestion pipeline. Please follow the instructions below.



#### Metadata Ingestion Options

- **Name**: This field refers to the name of the ingestion pipeline. You can customize the name or use the generated one.
- **Pipeline Filter Pattern (Optional)**: Use pipeline filter patterns to control whether or not to include pipelines as part of metadata ingestion.
  - **Include**: Explicitly include pipelines by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all pipelines with names matching one or more of the supplied regular expressions. All other pipelines will be excluded.
  - **Exclude**: Explicitly exclude pipelines by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all pipelines with names matching one or more of the supplied regular expressions. All other pipelines will be included.
- **Include lineage (toggle)**: Set the Include lineage toggle to control whether or not to include lineage between pipelines and data sources as part of metadata ingestion.
- **Enable Debug Log (toggle)**: Set the Enable Debug Log toggle to set the default log level to debug. These logs can be viewed later in Airflow.

### 7. Schedule the Ingestion and Deploy

Scheduling can be set up at an hourly, daily, or weekly cadence. The
timezone is in UTC. Select a Start Date to schedule for ingestion. It is
optional to add an End Date.

Review your configuration settings. If they match what you intended,
click Deploy to create the service and schedule metadata ingestion.

If something doesn't look right, click the Back button to return to the
+ + + +After configuring the workflow, you can click on Deploy to create the +pipeline. + +### 8. View the Ingestion Pipeline + +Once the workflow has been successfully deployed, you can view the +Ingestion Pipeline running from the Service Page. + + + +### 9. Workflow Deployment Error + +If there were any errors during the workflow deployment process, the +Ingestion Pipeline Entity will still be created, but no workflow will be +present in the Ingestion container. + +You can then edit the Ingestion Pipeline and Deploy it again. + + + +From the Connection tab, you can also Edit the Service if needed. diff --git a/openmetadata-docs/content/openmetadata/connectors/pipeline/index.md b/openmetadata-docs/content/openmetadata/connectors/pipeline/index.md index d3f6e490764..580ab7afe3c 100644 --- a/openmetadata-docs/content/openmetadata/connectors/pipeline/index.md +++ b/openmetadata-docs/content/openmetadata/connectors/pipeline/index.md @@ -9,3 +9,4 @@ slug: /openmetadata/connectors/pipeline - [Airflow](/openmetadata/connectors/pipeline/airflow) - [Glue](/openmetadata/connectors/pipeline/glue) - [Fivetran](/openmetadata/connectors/pipeline/fivetran) +- [Dagster](/openmetadata/connectors/pipeline/dagster) diff --git a/openmetadata-docs/images/openmetadata/connectors/dagster/add-new-service.png b/openmetadata-docs/images/openmetadata/connectors/dagster/add-new-service.png new file mode 100644 index 00000000000..9fce7a7d940 Binary files /dev/null and b/openmetadata-docs/images/openmetadata/connectors/dagster/add-new-service.png differ diff --git a/openmetadata-docs/images/openmetadata/connectors/dagster/select-service.png b/openmetadata-docs/images/openmetadata/connectors/dagster/select-service.png new file mode 100644 index 00000000000..b55bb40c7e2 Binary files /dev/null and b/openmetadata-docs/images/openmetadata/connectors/dagster/select-service.png differ diff --git a/openmetadata-docs/images/openmetadata/connectors/dagster/service-connection.png 
b/openmetadata-docs/images/openmetadata/connectors/dagster/service-connection.png new file mode 100644 index 00000000000..c543938f0e2 Binary files /dev/null and b/openmetadata-docs/images/openmetadata/connectors/dagster/service-connection.png differ