docs(airflow): add docs on custom operators (#7913)

Co-authored-by: John Joyce <john@acryl.io>
matthew-coudert-cko 2023-06-15 08:16:25 +01:00 committed by GitHub
parent 66806a805e
commit 5e45db9e60


@@ -69,14 +69,41 @@ lazy_load_plugins = False
### How to validate installation
1. Go and check in Airflow at Admin -> Plugins menu if you can see the DataHub plugin
2. Run an Airflow DAG. In the task logs, you should see DataHub-related log messages like:
```
Emitting DataHub ...
```
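To produce such a log line on demand, you can trigger any DAG. The following is a minimal sketch of a throwaway DAG you could use for this check; the DAG id, schedule, and task are illustrative assumptions, not part of the plugin.
```python
# Minimal sketch of a throwaway DAG for validating the plugin.
# The DAG id, schedule, and task are illustrative assumptions; any existing DAG works too.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="datahub_plugin_smoke_test",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # trigger manually from the Airflow UI or CLI
    catchup=False,
):
    BashOperator(task_id="noop", bash_command="echo 'hello'")
```
After the run completes, check the task log for the `Emitting DataHub ...` line to confirm the plugin emitted metadata.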
### Emitting lineage via a custom operator to the Airflow Plugin
If you have created a custom Airflow operator ([docs](https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html)) that inherits from the `BaseOperator` class, set inlets and outlets inside the overridden `execute` function via `context['ti'].task.inlets` and `context['ti'].task.outlets`.
The DataHub Airflow plugin will then pick up those inlets and outlets after the task runs.
```python
from airflow.models import BaseOperator
from datahub_provider.entities import Dataset


class DbtOperator(BaseOperator):
    ...

    def execute(self, context):
        # do something
        inlets, outlets = self._get_lineage()
        # inlets/outlets are lists of either datahub_provider.entities.Dataset or datahub_provider.entities.Urn
        context['ti'].task.inlets = inlets
        context['ti'].task.outlets = outlets

    def _get_lineage(self):
        # Do some processing to get inlets/outlets; the values below are placeholders
        inlets = [Dataset("snowflake", "mydb.schema.upstream_table")]
        outlets = [Dataset("snowflake", "mydb.schema.downstream_table")]
        return inlets, outlets
```
If you override the `pre_execute` and `post_execute` functions, ensure they are decorated with `@prepare_lineage` and `@apply_lineage` respectively ([source](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/lineage.html#lineage)).
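As a rough sketch (the operator name and method bodies below are placeholders, not part of the DataHub plugin), the decorated overrides could look like this:
```python
# Sketch of pre_execute/post_execute overrides wrapped with Airflow's lineage decorators.
# The operator name and method bodies are placeholders for illustration.
from airflow.lineage import apply_lineage, prepare_lineage
from airflow.models import BaseOperator


class MyLineageAwareOperator(BaseOperator):
    @prepare_lineage
    def pre_execute(self, context):
        # custom pre-execution logic here
        pass

    @apply_lineage
    def post_execute(self, context, result=None):
        # custom post-execution logic here
        pass
```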
## Using DataHub's Airflow lineage backend (deprecated)
:::caution
@@ -145,6 +172,7 @@ Take a look at this sample DAG:
In order to use this example, you must first configure the DataHub hook. Like in ingestion, we support a DataHub REST hook and a Kafka-based hook. See step 1 above for details.
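For illustration only, the REST hook's connection can also be registered programmatically; the connection id, type, and host below are assumptions for a local DataHub deployment, and this is equivalent to configuring the connection through the Airflow UI or CLI.
```python
# Illustrative sketch: register a DataHub REST connection programmatically.
# conn_id, conn_type, and host are assumptions for a local deployment; adjust as needed.
from airflow import settings
from airflow.models import Connection

conn = Connection(
    conn_id="datahub_rest_default",
    conn_type="datahub_rest",
    host="http://localhost:8080",
)

session = settings.Session()
if session.query(Connection).filter(Connection.conn_id == conn.conn_id).one_or_none() is None:
    session.add(conn)
    session.commit()
```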
## Debugging
### Incorrect URLs