This guide is currently unmaintained. As of version 0.10.0, the container described here is no longer published alongside the DataHub CLI. If you'd like to use it, please reach out to us on the [community Slack](docs/slack.md).
This document describes how you can run Airflow side-by-side with DataHub's quickstart docker images to test out Airflow lineage with DataHub.
This is a much easier way to try out Airflow with DataHub than configuring the containers by hand and setting up configuration and network connectivity between the two systems yourself.
- This docker-compose file is derived from the [official Airflow docker-compose file](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#docker-compose-yaml) but makes a few critical changes to make interoperability with DataHub seamless.
- The Airflow image in this docker compose file extends the [base Apache Airflow docker image](https://airflow.apache.org/docs/docker-stack/index.html) and is published [here](https://hub.docker.com/r/acryldata/airflow-datahub). It includes the latest `acryl-datahub` pip package installed by default so you don't need to install it yourself.
- This docker-compose file sets up the networking so that:
  - the Airflow containers can talk to the DataHub containers through the `datahub_network` bridge interface, and
  - the Airflow webserver's port `8080` is forwarded to port `58080` on the localhost (to avoid conflicts with the DataHub metadata service, which is mapped to port `8080` by default).
- This docker-compose file also sets the environment variables that configure Airflow's lineage backend to talk to DataHub (look for the `AIRFLOW__LINEAGE__BACKEND` and `AIRFLOW__LINEAGE__DATAHUB_KWARGS` variables); a sketch of what these look like follows this list.
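For orientation, here is a rough sketch of how those variables are typically wired up for DataHub's Airflow lineage backend. The exact backend class path and kwargs below are assumptions based on the `acryl-datahub` package's lineage plugin; the authoritative values live in the docker-compose file's environment section.

```shell
# Illustrative only -- the backend class path and kwargs are assumptions;
# consult the docker-compose file for the exact configuration.
export AIRFLOW__LINEAGE__BACKEND=datahub_provider.lineage.datahub.DatahubLineageBackend
export AIRFLOW__LINEAGE__DATAHUB_KWARGS='{"datahub_conn_id": "datahub_rest_default", "graceful_exceptions": true}'
```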
## Step 2: Bring up Airflow
First, you need to initialize Airflow to create the initial database tables and the initial Airflow user, as sketched below.
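Assuming the docker-compose file from this guide is in your working directory and follows the conventions of the official Airflow compose file (which defines an `airflow-init` service for exactly this purpose), initialization and startup look roughly like this:

```shell
# One-time initialization: creates the metadata database tables and the
# default Airflow user (service name per the official Airflow compose file).
docker-compose up airflow-init

# Then bring up the remaining Airflow services.
docker-compose up
```

As the services come up, you should see log output similar to the following: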
```
airflow-worker_1 | WARNING: This is a development server. Do not use it in a production deployment.
airflow-worker_1 | Use a production WSGI server instead.
airflow-worker_1 | * Debug mode: off
airflow-worker_1 | [2021-08-31 20:02:09,283] {_internal.py:113} INFO - * Running on http://0.0.0.0:8793/ (Press CTRL+C to quit)
flower_1 | BACKEND=redis
flower_1 | DB_HOST=redis
flower_1 | DB_PORT=6379
flower_1 |
```
Finally, Airflow should be healthy and listening on port 58080. Navigate to [http://localhost:58080](http://localhost:58080) to confirm that the Airflow webserver is up.
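If you prefer to check from the command line, the Airflow webserver exposes a `/health` endpoint; a quick probe (assuming the `58080` port mapping described above) looks like this:

```shell
# Returns a JSON blob reporting the health of the metadatabase and scheduler.
curl http://localhost:58080/health
```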
- Running the `airflow connections add ...` command inside that container to register a connection of type `datahub_rest` pointing at the `datahub-gms` host on port 8080 (see the sketch after this list).
- Note: this requires Airflow to be able to reach the `datahub-gms` host (the container running the datahub-gms image), which is why we needed to connect the Airflow containers to the `datahub_network` in our custom docker-compose file.
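As a concrete sketch, the registration command looks something like the following. The connection id `datahub_rest_default` and the way the scheduler container is looked up are assumptions; adjust both to match your setup.

```shell
# Find the Airflow scheduler container and register the DataHub connection
# inside it. The name filter and conn id below are assumptions.
docker exec -it $(docker ps --filter name=scheduler --format '{{.ID}}' | head -n 1) \
  airflow connections add 'datahub_rest_default' \
  --conn-type 'datahub_rest' \
  --conn-host 'http://datahub-gms:8080'
```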
In this case, clearly the connection `datahub-rest` has not been registered. Looks like we forgot to register the connection with Airflow!
Let's execute Step 4 to register the DataHub connection with Airflow.
If the connection was registered successfully but you are still seeing `Failed to establish a new connection`, check that the connection host is `http://datahub-gms:8080` and not `http://localhost:8080`: inside the Airflow containers, `localhost` refers to the Airflow container itself, so DataHub must be addressed by its hostname on the shared `datahub_network`.
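One way to verify (and fix) this from the command line, using the same hypothetical container lookup and connection id as above:

```shell
# List the registered connections and check the host column.
docker exec -it $(docker ps --filter name=scheduler --format '{{.ID}}' | head -n 1) \
  airflow connections list

# If the host is wrong, delete the connection and re-add it with the
# network-internal hostname (conn id is an assumption, as above).
docker exec -it $(docker ps --filter name=scheduler --format '{{.ID}}' | head -n 1) \
  airflow connections delete 'datahub_rest_default'
```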