
---
title: Ingestion Framework Deployment
slug: /deployment/ingestion
---
# Ingestion Framework Deployment
The Ingestion Framework is the module that takes care of bringing metadata into OpenMetadata. It is used for any type of workflow supported by the platform: Metadata, Lineage, Usage, Profiler, Data Quality, and more.
## Manage & Schedule the Ingestion Framework
In this guide, we will present the different alternatives to run and manage your ingestion workflows. There are mainly two ways of running the ingestion:
- Internally, by managing the workflows from OpenMetadata.
- Externally, by using any other tool capable of running Python code.
### Option 1 - From OpenMetadata
If you want to learn how to configure your setup to run them from OpenMetadata, follow this guide:
{% inlineCalloutContainer %} {% inlineCallout color="violet-70" icon="10k" bold="OpenMetadata UI" href="/deployment/ingestion/openmetadata" %} Deploy, configure and manage the ingestion workflows directly from the OpenMetadata UI. {% /inlineCallout %} {% /inlineCalloutContainer %}
### Option 2 - Externally
If, instead, you want to manage them from any other system, you would need a bit more background:
- How does the Ingestion Framework work?
- Ingestion Configuration
## 1. How does the Ingestion Framework work?
The Ingestion Framework contains all the logic about how to connect to the sources, extract their metadata and send it to the OpenMetadata server. We have built it from scratch with the main idea of making it an independent component that can be run from - literally - anywhere.
In order to install it, you just need to get it from PyPI.
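For example, a local install for the MySQL connector used later in this guide could look like the command below. The `mysql` extra is only an illustration; pick the extras that match the sources you plan to ingest.

```shell
# Install the Ingestion Framework from PyPI.
# The "mysql" extra pulls in the dependencies for the MySQL connector
# used in the example below; swap it for the extras matching your sources.
pip install "openmetadata-ingestion[mysql]"
```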
We will show further examples later, but a piece of code is the best showcase for its simplicity. In order to run a full ingestion process, you just need to execute a single function:
```python
import yaml
from metadata.ingestion.api.workflow import Workflow


# CONFIG is a string holding the workflow YAML shown in the next section
def run():
    workflow_config = yaml.safe_load(CONFIG)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()
```
Where this function runs is completely up to you, and you can adapt it to what makes the most sense within your organization and engineering context.
## 2. Ingestion Configuration
In the example above, the `Workflow` class was created from a YAML configuration. Any Workflow that you execute (ingestion, profiler, lineage,...) will have its own YAML representation.
You can think of this configuration as the recipe you want to execute: where your source is, which pieces of metadata you extract, how they are processed, and where they are sent.
An example YAML config for extracting MySQL metadata looks like this:
```yaml
source:
  type: mysql
  serviceName: mysql
  serviceConnection:
    config:
      type: Mysql
      username: openmetadata_user
      authType:
        password: openmetadata_password
      hostPort: localhost:3306
      databaseSchema: openmetadata_db
  sourceConfig:
    config:
      type: DatabaseMetadata
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: 'http://localhost:8585/api'
    authProvider: openmetadata
    securityConfig:
      jwtToken: ...
```
If you need the YAML shape of any connector, you can pick it up from that connector's documentation page.
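Putting the two pieces together, a minimal standalone script could read the recipe from a file and execute it. The sketch below assumes the YAML above has been saved as a file named `mysql.yaml` (a hypothetical name) and that the `openmetadata-ingestion` package is installed:

```python
import yaml
from metadata.ingestion.api.workflow import Workflow

# mysql.yaml is a hypothetical file containing the YAML recipe shown above
with open("mysql.yaml") as config_file:
    workflow_config = yaml.safe_load(config_file)

workflow = Workflow.create(workflow_config)
workflow.execute()
workflow.raise_from_status()
workflow.print_status()
workflow.stop()
```

From here, scheduling is just a matter of having your orchestrator of choice invoke a script like this (or the `run()` function shown earlier) on the cadence you need, which is exactly what the examples below cover.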
## Examples
{% note %}
This is not an exhaustive list, and it will keep growing over time. If a given orchestrator is missing, it is not because it is unsupported, but simply because we have not had the time to add it here yet. If you'd like to chip in and help us expand these guides and examples, don't hesitate to reach out to us on Slack or open a PR directly on GitHub.
{% /note %}
{% inlineCalloutContainer %} {% inlineCallout color="violet-70" icon="10k" bold="Airflow" href="/deployment/ingestion/airflow" %} Run the ingestion process externally from Airflow {% /inlineCallout %} {% inlineCallout color="violet-70" icon="10k" bold="MWAA" href="/deployment/ingestion/mwaa" %} Run the ingestion process externally using AWS MWAA {% /inlineCallout %} {% inlineCallout color="violet-70" icon="10k" bold="GCS Composer" href="/deployment/ingestion/gcs-composer" %} Run the ingestion process externally from GCS Composer {% /inlineCallout %} {% inlineCallout color="violet-70" icon="10k" bold="GitHub Actions" href="/deployment/ingestion/github-actions" %} Run the ingestion process externally from GitHub Actions {% /inlineCallout %} {% /inlineCalloutContainer %}