docs(airflow): update min version for plugin v2 (#11065)

Author: Harshal Sheth, 2024-08-01 10:05:49 -07:00 (committed by GitHub)
Parent: 66ecfae5e4
Commit: 2369032077


@@ -17,7 +17,7 @@ There's two actively supported implementations of the plugin, with different Air
| Approach | Airflow Version | Notes |
| --------- | --------------- | --------------------------------------------------------------------------- |
-| Plugin v2 | 2.3+ | Recommended. Requires Python 3.8+ |
+| Plugin v2 | 2.3.4+ | Recommended. Requires Python 3.8+ |
| Plugin v1 | 2.1+ | No automatic lineage extraction; may not extract lineage if the task fails. |
If you're using Airflow older than 2.1, it's possible to use the v1 plugin with older versions of `acryl-datahub-airflow-plugin`. See the [compatibility section](#compatibility) for more details.
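
As a quick sanity check against the table above, the running environment can be verified at runtime; a minimal sketch (the version thresholds are taken from the table, and `packaging` is assumed to be installed, as it is in typical Airflow environments):

```python
# Sanity-check the environment against the plugin v2 requirements above.
# (A sketch; the thresholds come from the compatibility table.)
import sys

import airflow
from packaging.version import Version

assert sys.version_info >= (3, 8), "Plugin v2 requires Python 3.8+"
assert Version(airflow.__version__) >= Version("2.3.4"), (
    "Plugin v2 requires Airflow 2.3.4+; consider plugin v1 for older versions"
)
```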
@@ -66,7 +66,7 @@ enabled = True # default
```
| Name | Default value | Description |
-|----------------------------|----------------------|------------------------------------------------------------------------------------------|
+| -------------------------- | -------------------- | ---------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub rest connection. |
| cluster | prod | name of the airflow cluster, this is equivalent to the `env` of the instance |
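
If it is unclear which values Airflow actually resolved, they can be read back through Airflow's config API; a minimal sketch (the section name `"datahub"` is an assumption here, and the fallbacks mirror the defaults from the table above):

```python
# Read back the resolved plugin settings through Airflow's config API.
# (A sketch; the "datahub" section name is an assumption, and the fallbacks
# mirror the default values documented in the table above.)
from airflow.configuration import conf

enabled = conf.getboolean("datahub", "enabled", fallback=True)
conn_id = conf.get("datahub", "conn_id", fallback="datahub_rest_default")
cluster = conf.get("datahub", "cluster", fallback="prod")
print(f"datahub plugin: enabled={enabled}, conn_id={conn_id}, cluster={cluster}")
```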
@@ -132,7 +132,7 @@ conn_id = datahub_rest_default # or datahub_kafka_default
```
| Name | Default value | Description |
-|----------------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| -------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub connection you set in step 1. |
| cluster | prod | name of the airflow cluster |
@@ -240,6 +240,7 @@ See this [example PR](https://github.com/datahub-project/datahub/pull/10452) whi
There might be a case where DAGs are removed from Airflow but the corresponding pipelines and tasks still exist in DataHub; we call these `obsolete pipelines and tasks`.
Follow these steps to clean them up from DataHub:
- create a DAG named `Datahub_Cleanup`, e.g.
```python
@@ -263,8 +264,8 @@ with DAG(
)
```
-- ingest this DAG, and it will remove all the obsolete pipelines and tasks from the Datahub based on the `cluster` value set in the `airflow.cfg`
+- ingest this DAG, and it will remove all the obsolete pipelines and tasks from the Datahub based on the `cluster` value set in the `airflow.cfg`
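
For reference, a complete version of the cleanup DAG shown partially in the hunk above might look like this (a minimal sketch: the dates and the `BashOperator` body are placeholders, since per the docs it is the reserved DAG name `Datahub_Cleanup` that triggers the cleanup on ingestion):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# The reserved DAG name is what matters; the task body is a placeholder.
with DAG(
    "Datahub_Cleanup",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    cleanup_task = BashOperator(
        task_id="cleanup_obsolete_data",
        bash_command="echo 'cleaning up obsolete pipelines and tasks from DataHub'",
    )
```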
## Get all dataJobs associated with a dataFlow
@@ -274,12 +275,7 @@ If you are looking to find all tasks (aka DataJobs) that belong to a specific pi
query {
dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
childJobs: relationships(
-      input: {
-        types: ["IsPartOf"],
-        direction: INCOMING,
-        start: 0,
-        count: 100
-      }
+      input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
) {
total
relationships {
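
To run the query above outside the UI, it can be POSTed to DataHub's GraphQL endpoint; a minimal sketch (the host and token are placeholders, and the `entity { urn }` selection stands in for the fields elided below this hunk):

```python
# POST the GraphQL query above to a DataHub instance.
# (A sketch; DATAHUB_HOST, TOKEN, and the dataFlow urn are placeholder
# assumptions, and `entity { urn }` approximates the elided field selection.)
import requests

DATAHUB_HOST = "http://localhost:8080"  # assumed local DataHub instance
TOKEN = "<personal-access-token>"  # generate one in the DataHub UI

QUERY = """
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
    ) {
      total
      relationships {
        entity { urn }
      }
    }
  }
}
"""

resp = requests.post(
    f"{DATAHUB_HOST}/api/graphql",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": QUERY},
)
resp.raise_for_status()
print(resp.json())
```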