Out of the box, OpenMetadata comes with this integration for Airflow. In this guide, we will show you how to manage
ingestions from OpenMetadata by linking it to an Airflow service.
{% note %}
Advanced note for developers: We have an [interface]( https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/java/org/openmetadata/sdk/PipelineServiceClient.java)
that can be extended to bring support to any other orchestrator. You can follow the implementation we have for [Airflow]( https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-service/src/main/java/org/openmetadata/service/clients/pipeline/airflow/AirflowRESTClient.java)
as a starting point.
{% /note %}
1. **If you do not have an Airflow service** up and running on your platform, we provide a custom
[Docker](https://hub.docker.com/r/openmetadata/ingestion) image, which already contains the OpenMetadata ingestion
packages and custom [Airflow APIs](https://github.com/open-metadata/openmetadata-airflow-apis) to
deploy Workflows from the UI as well. **This is the simplest approach**.
2. If you already have Airflow up and running and want to use it for the metadata ingestion, you will
need to install the ingestion modules on the host. You can find more information on how to do this
in the Custom Airflow Installation section.
## Airflow permissions
These are the permissions required by the user that will manage the communication between the OpenMetadata Server and Airflow.
A user with the `User` role is enough for these requirements.
You can find more information on Airflow's Access Control [here](https://airflow.apache.org/docs/apache-airflow/stable/security/access-control.html).
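For example, a minimal sketch of creating such a user with the Airflow CLI could look like the following; the username, email, and password are placeholders for your environment:

```bash
# Illustrative only: create a dedicated Airflow user with the "User" role that
# OpenMetadata can use to authenticate against the Airflow APIs.
# All values below are placeholders; adapt them to your environment.
airflow users create \
  --username openmetadata \
  --firstname OpenMetadata \
  --lastname Service \
  --role User \
  --email openmetadata@example.com \
  --password <secure-password>
```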
## Shared Volumes
{% note noteType="Warning" %}
The Airflow Webserver, Scheduler and Workers - if using a distributed setup - need to have access to the same shared volumes
with RWX permissions.
{% /note %}
We have specific instructions on how to set up the shared volumes in Kubernetes depending on your cloud deployment [here](/deployment/kubernetes).
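As an illustration, the shared DAGs volume in Kubernetes could be backed by a claim with the `ReadWriteMany` access mode; the claim name, size, and storage class below are placeholders that depend on your cluster and cloud provider:

```yaml
# Illustrative PersistentVolumeClaim: the RWX access mode lets the Webserver,
# Scheduler, and Workers mount the same DAGs volume.
# Name, size, and storageClassName are placeholders for your environment.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: nfs-client
```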
## Using the OpenMetadata Ingestion Image
If you are using our `openmetadata/ingestion` Docker image, there is just one thing to do: Configure the OpenMetadata server.
The OpenMetadata server takes all its configurations from a YAML file. You can find them in our [repo](https://github.com/open-metadata/OpenMetadata/tree/main/conf). In
`openmetadata.yaml`, update the `pipelineServiceClientConfiguration` section accordingly.
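As a simplified sketch, the Airflow-related part of that section could look like the following; check the `openmetadata.yaml` in the repo linked above for the exact keys in your version, and treat the hosts and credentials below as placeholders:

```yaml
# Simplified sketch of the pipelineServiceClientConfiguration section.
# Hosts and credentials are placeholders; verify the exact keys against the
# openmetadata.yaml shipped with your OpenMetadata version.
pipelineServiceClientConfiguration:
  enabled: true
  className: "org.openmetadata.service.clients.pipeline.airflow.AirflowRESTClient"
  apiEndpoint: http://<airflow-host>:8080        # where the OpenMetadata server reaches Airflow
  metadataApiEndpoint: http://<openmetadata-host>:8585/api
  parameters:
    username: <airflow-user>
    password: <airflow-password>
```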
- The supported Airflow versions for OpenMetadata include 2.3, 2.4, 2.5, 2.6, and 2.7. Starting from release 1.6, OpenMetadata supports Airflow versions up to 2.10.5: OpenMetadata 1.5 supports Airflow 2.9, 1.6.4 supports Airflow 2.9.3, and 1.6.5 supports Airflow 2.10.5. Ensure that your Airflow version aligns with your OpenMetadata deployment for optimal performance.
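If you are instead installing the ingestion modules on your own Airflow host, the installation is a pinned `pip` command; the `mysql` plugin below is only an example, so pick the plugin matching your source:

```bash
# Illustrative: install the ingestion package with the plugin for your source
# (mysql is just an example), pinned to your OpenMetadata server version.
pip3 install "openmetadata-ingestion[mysql]==x.y.z"
```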
Then run the DAG as explained in each [Connector](/connectors). Here, `x.y.z` is the same version as your
OpenMetadata server. For example, if you are on version 1.0.0, you can install `openmetadata-ingestion`
with versions `1.0.0.*`, e.g., `1.0.0.0`, `1.0.0.1`, etc., but not `1.0.1.x`.
{% note %}
You can also install `openmetadata-ingestion[all]==x.y.z`, which will bring the requirements to run any connector.
{% /note %}
You can check the [Connector Modules](/connectors) guide above to learn how to install the `openmetadata-ingestion` package with the
necessary plugins. The plugins are needed because, even if we install the APIs, the Airflow instance still needs the
required libraries to connect to each source.
### 2. Install the Airflow APIs
{% note %}
The `openmetadata-managed-apis` package has a dependency on `apache-airflow>=2.2.2`. Please make sure that
your host satisfies this requirement. Installing only the `openmetadata-managed-apis` package will not give you
a complete Airflow installation. For that, please follow the Airflow [docs](https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html).
{% /note %}
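For reference, a standard Airflow installation typically pins against the official constraints file; the Airflow and Python versions below are placeholders, so use the ones matching your environment:

```bash
# Illustrative only: install Airflow itself using the official constraints file.
# Replace the Airflow and Python versions with the ones you actually run.
AIRFLOW_VERSION=2.7.3
PYTHON_VERSION=3.9
pip3 install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
```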
The goal of this module is to add the HTTP endpoints that the UI calls to deploy the Airflow DAGs.
The installation can be done by running:
```bash
pip3 install "openmetadata-managed-apis==x.y.z"
```
Here, the same versioning logic applies: `x.y.z` is the same version of your
OpenMetadata server. For example, if you are on version 1.0.0, then you can install the `openmetadata-managed-apis`
with versions `1.0.0.*`, e.g., `1.0.0.0`, `1.0.0.1`, etc., but not `1.0.1.x`.
The ingestion image is built on Airflow's base image, ensuring it includes all necessary requirements to run Airflow. For Kubernetes deployments, the setup uses community Airflow charts with a modified base image, enabling it to function seamlessly as a **scheduler**, **webserver**, and **worker**.
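For instance, assuming you deploy with the official `apache-airflow` Helm chart, pointing the chart at the ingestion image could be a values override along these lines (the tag is a placeholder and should match your OpenMetadata version):

```yaml
# Illustrative values override for the apache-airflow Helm chart: run Airflow
# from the OpenMetadata ingestion image. The tag should match your OpenMetadata version.
images:
  airflow:
    repository: openmetadata/ingestion
    tag: x.y.z
    pullPolicy: IfNotPresent
```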
After installing the Airflow APIs, you will need to update your OpenMetadata Server.
The OpenMetadata server takes all its configurations from a YAML file. You can find them in our [repo](https://github.com/open-metadata/OpenMetadata/tree/main/conf). In
`openmetadata.yaml`, update the `pipelineServiceClientConfiguration` section accordingly.
One recurrent question when setting up Airflow is the possibility of using [git-sync](https://airflow.apache.org/docs/helm-chart/stable/manage-dags-files.html#mounting-dags-from-a-private-github-repo-using-git-sync-sidecar)
to manage the ingestion DAGs.
Let's go over the differences between `git-sync` and what we want to achieve by installing our custom API plugins:
1. `git-sync` will use Git as the source of truth for your DAGs. This means that any DAG you have in Git will eventually be picked up and scheduled in Airflow.
2. With the `openmetadata-managed-apis`, the OpenMetadata server is the source of truth. DAGs are created dynamically
in your Airflow instance every time you create a new Ingestion Workflow in OpenMetadata.
Then, should you use `git-sync`?
- If you have an existing Airflow instance, and you want to build and maintain your own ingestion DAGs then you can go for it. Check a DAG example [here](/deployment/ingestion/external/airflow#example).
- If instead you want to use the full deployment process from OpenMetadata, `git-sync` is not the right tool, since the DAGs won't be backed up by Git but rather created from OpenMetadata. Note that if anything
were to happen and you lost the Airflow volumes, you could simply redeploy the DAGs from OpenMetadata.
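If you do go the `git-sync` route with the official Airflow Helm chart, a minimal values sketch could look like this; the repository URL, branch, and path are placeholders for your own DAGs repo:

```yaml
# Illustrative git-sync values for the official apache-airflow Helm chart.
# Repository, branch, and subPath are placeholders for your own DAGs repo.
dags:
  gitSync:
    enabled: true
    repo: https://github.com/<your-org>/<your-dags-repo>.git
    branch: main
    subPath: dags
```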
## SSL
If you want to set up Airflow with SSL, you can learn more here:
{% inlineCalloutContainer %}
{% inlineCallout
color="violet-70"
icon="luggage"
bold="Airflow SSL"
href="/deployment/security/enable-ssl/airflow" %}
Learn how to configure Airflow with SSL.
{% /inlineCallout %}
{% /inlineCalloutContainer %}
# Troubleshooting
## Ingestion Pipeline deployment issues
### Airflow APIs Not Found
Validate the installation, making sure that from the OpenMetadata server you can reach the Airflow host, and the