mirror of
https://github.com/open-metadata/OpenMetadata.git
synced 2025-07-12 19:48:26 +00:00
298 lines
8.7 KiB
Markdown
298 lines
8.7 KiB
Markdown
![]() |
---
|
||
|
title: Glue Pipeline
|
||
|
slug: /connectors/pipeline/glue-pipeline
|
||
|
---
|
||
|
|
||
|
# Glue Pipeline
|
||
|
|
||
|
In this section, we provide guides and references to use the Glue connector.
|
||
|
|
||
|
Configure and schedule Glue metadata and profiler workflows from the OpenMetadata UI:
|
||
|
|
||
|
- [Requirements](#requirements)
|
||
|
- [Metadata Ingestion](#metadata-ingestion)
|
||
|
|
||
|
If you don't want to use the OpenMetadata Ingestion container to configure the workflows via the UI, then you can check
|
||
|
the following docs to connect using Airflow SDK or with the CLI.
|
||
|
|
||
|
{% tilesContainer %}
|
||
|
{% tile
|
||
|
title="Ingest with Airflow"
|
||
|
description="Configure the ingestion using Airflow SDK"
|
||
|
link="/connectors/dashboard/glue/airflow"
|
||
|
/ %}
|
||
|
{% tile
|
||
|
title="Ingest with the CLI"
|
||
|
description="Run a one-time ingestion using the metadata CLI"
|
||
|
link="/connectors/dashboard/glue/cli"
|
||
|
/ %}
|
||
|
|
||
|
{% /tilesContainer %}
|
||
|
|
||
|
## Requirements
|
||
|
|
||
|
{%inlineCallout icon="description" bold="OpenMetadata 0.12 or later" href="/deployment"%}
|
||
|
To deploy OpenMetadata, check the Deployment guides.
|
||
|
{% /inlineCallout %}
|
||
|
|
||
|
To run the Ingestion via the UI you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
|
||
|
custom Airflow plugins to handle the workflow deployment.
|
||
|
|
||
|
|
||
|
## Metadata Ingestion
|
||
|
|
||
|
{% stepsContainer %}
|
||
|
|
||
|
{% step srNumber=1 %}
|
||
|
|
||
|
{% stepDescription title="1. Visit the Services Page" %}
|
||
|
|
||
|
The first step is ingesting the metadata from your sources. Under
|
||
|
Settings, you will find a Services link an external source system to
|
||
|
OpenMetadata. Once a service is created, it can be used to configure
|
||
|
metadata, usage, and profiler workflows.
|
||
|
|
||
|
To visit the Services page, select Services from the Settings menu.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/visit-services.png"
|
||
|
alt="Visit Services Page"
|
||
|
caption="Find Dashboard option on left panel of the settings page" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% step srNumber=2 %}
|
||
|
|
||
|
{% stepDescription title="2. Create a New Service" %}
|
||
|
|
||
|
Click on the 'Add New Service' button to start the Service creation.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/create-service.png"
|
||
|
alt="Create a new service"
|
||
|
caption="Add a new Service from the Dashboard Services page" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
|
||
|
|
||
|
{% step srNumber=3 %}
|
||
|
|
||
|
{% stepDescription title="3. Select the Service Type" %}
|
||
|
|
||
|
Select Glue Pipeline as the service type and click Next.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/glue/select-service.png"
|
||
|
alt="Select Service"
|
||
|
caption="Select your service from the list" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% step srNumber=4 %}
|
||
|
|
||
|
{% stepDescription title="4. Name and Describe your Service" %}
|
||
|
|
||
|
Provide a name and description for your service as illustrated below.
|
||
|
|
||
|
#### Service Name
|
||
|
|
||
|
OpenMetadata uniquely identifies services by their Service Name. Provide
|
||
|
a name that distinguishes your deployment from other services, including
|
||
|
the other {connector} services that you might be ingesting metadata
|
||
|
from.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/glue/add-new-service.png"
|
||
|
alt="Add New Service"
|
||
|
caption="Provide a Name and description for your Service" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% step srNumber=5 %}
|
||
|
|
||
|
{% stepDescription title="5. Configure the Service Connection" %}
|
||
|
|
||
|
In this step, we will configure the connection settings required for
|
||
|
this connector. Please follow the instructions below to ensure that
|
||
|
you've configured the connector to read from your glue service as
|
||
|
desired.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/glue/service-connection.png"
|
||
|
alt="Configure service connection"
|
||
|
caption="Configure the service connection by filling the form" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% extraContent parentTagName="stepsContainer" %}
|
||
|
|
||
|
#### Connection Options
|
||
|
|
||
|
- **AWS Access Key ID**: Enter your secure access key ID for your Glue connection. The specified key ID should be
|
||
|
authorized to read all databases you want to include in the metadata ingestion workflow.
|
||
|
- **AWS Secret Access Key**: Enter the Secret Access Key (the passcode key pair to the key ID from above).
|
||
|
- **AWS Region**: Enter the location of the amazon cluster that your data and account are associated with.
|
||
|
- **AWS Session Token (optional)**: The AWS session token is an optional parameter. If you want, enter the details of
|
||
|
your temporary session token.
|
||
|
- **Endpoint URL (optional)**: Your Glue connector will automatically determine the AWS Glue endpoint URL based on the
|
||
|
region. You may override this behavior by entering a value to the endpoint URL.
|
||
|
|
||
|
|
||
|
{% /extraContent %}
|
||
|
|
||
|
{% step srNumber=6 %}
|
||
|
|
||
|
{% stepDescription title="6. Test the Connection" %}
|
||
|
|
||
|
Once the credentials have been added, click on `Test Connection` and Save
|
||
|
the changes.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/test-connection.png"
|
||
|
alt="Test Connection"
|
||
|
caption="Test the connection and save the Service" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% step srNumber=7 %}
|
||
|
|
||
|
{% stepDescription title="7. Configure Metadata Ingestion" %}
|
||
|
|
||
|
In this step we will configure the metadata ingestion pipeline,
|
||
|
Please follow the instructions below
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/configure-metadata-ingestion-dashboard.png"
|
||
|
alt="Configure Metadata Ingestion"
|
||
|
caption="Configure Metadata Ingestion Page" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% extraContent parentTagName="stepsContainer" %}
|
||
|
|
||
|
#### Metadata Ingestion Options
|
||
|
|
||
|
- **Name**: This field refers to the name of ingestion pipeline, you can customize the name or use the generated name.
|
||
|
- **Pipeline Filter Pattern (Optional)**: Use to pipeline filter patterns to control whether or not to include pipeline as part of metadata ingestion.
|
||
|
- **Include**: Explicitly include pipeline by adding a list of comma-separated regular expressions to the Include field. OpenMetadata will include all pipeline with names matching one or more of the supplied regular expressions. All other schemas will be excluded.
|
||
|
- **Exclude**: Explicitly exclude pipeline by adding a list of comma-separated regular expressions to the Exclude field. OpenMetadata will exclude all pipeline with names matching one or more of the supplied regular expressions. All other schemas will be included.
|
||
|
- **Include lineage (toggle)**: Set the Include lineage toggle to control whether or not to include lineage between pipelines and data sources as part of metadata ingestion.
|
||
|
- **Enable Debug Log (toggle)**: Set the Enable Debug Log toggle to set the default log level to debug, these logs can be viewed later in Airflow.
|
||
|
- **Mark Deleted Pipelines (toggle)**: Set the Mark Deleted Pipelines toggle to flag pipelines as soft-deleted if they are not present anymore in the source system.
|
||
|
|
||
|
{% /extraContent %}
|
||
|
|
||
|
{% step srNumber=8 %}
|
||
|
|
||
|
{% stepDescription title="8. Schedule the Ingestion and Deploy" %}
|
||
|
|
||
|
Scheduling can be set up at an hourly, daily, or weekly cadence. The
|
||
|
timezone is in UTC. Select a Start Date to schedule for ingestion. It is
|
||
|
optional to add an End Date.
|
||
|
|
||
|
Review your configuration settings. If they match what you intended,
|
||
|
click Deploy to create the service and schedule metadata ingestion.
|
||
|
|
||
|
If something doesn't look right, click the Back button to return to the
|
||
|
appropriate step and change the settings as needed.
|
||
|
|
||
|
After configuring the workflow, you can click on Deploy to create the
|
||
|
pipeline.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/schedule.png"
|
||
|
alt="Schedule the Workflow"
|
||
|
caption="Schedule the Ingestion Pipeline and Deploy" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
|
||
|
{% step srNumber=9 %}
|
||
|
|
||
|
{% stepDescription title="9. View the Ingestion Pipeline" %}
|
||
|
|
||
|
Once the workflow has been successfully deployed, you can view the
|
||
|
Ingestion Pipeline running from the Service Page.
|
||
|
|
||
|
{% /stepDescription %}
|
||
|
|
||
|
{% stepVisualInfo %}
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/view-ingestion-pipeline.png"
|
||
|
alt="View Ingestion Pipeline"
|
||
|
caption="View the Ingestion Pipeline from the Service Page" /%}
|
||
|
|
||
|
{% /stepVisualInfo %}
|
||
|
|
||
|
{% /step %}
|
||
|
|
||
|
{% /stepsContainer %}
|
||
|
|
||
|
## Troubleshooting
|
||
|
|
||
|
### Workflow Deployment Error
|
||
|
|
||
|
If there were any errors during the workflow deployment process, the
|
||
|
Ingestion Pipeline Entity will still be created, but no workflow will be
|
||
|
present in the Ingestion container.
|
||
|
|
||
|
- You can then edit the Ingestion Pipeline and Deploy it again.
|
||
|
|
||
|
- From the Connection tab, you can also Edit the Service if needed.
|
||
|
|
||
|
{% image
|
||
|
src="/images/v1.0.0/openmetadata/connectors/workflow-deployment-error.png"
|
||
|
alt="Workflow Deployment Error"
|
||
|
caption="Edit and Deploy the Ingestion Pipeline" /%}
|
||
|
|