mirror of https://github.com/datahub-project/datahub.git
synced 2025-12-29 19:07:33 +00:00

docs(airflow): update min version for plugin v2 (#11065)

This commit is contained in:
parent 66ecfae5e4
commit 2369032077
@@ -17,7 +17,7 @@ There's two actively supported implementations of the plugin, with different Air

| Approach  | Airflow Version | Notes                                                                       |
| --------- | --------------- | --------------------------------------------------------------------------- |
- | Plugin v2 | 2.3+ | Recommended. Requires Python 3.8+ |
+ | Plugin v2 | 2.3.4+ | Recommended. Requires Python 3.8+ |
| Plugin v1 | 2.1+ | No automatic lineage extraction; may not extract lineage if the task fails. |

If you're using Airflow older than 2.1, it's possible to use the v1 plugin with older versions of `acryl-datahub-airflow-plugin`. See the [compatibility section](#compatibility) for more details.
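The version floors in the table amount to a small compatibility rule. As an illustration only (the function name and return strings are invented here, not part of the plugin), the choice could be sketched as:

```python
def recommended_plugin(airflow_version: tuple, python_version: tuple) -> str:
    """Pick a plugin flavor from the version floors in the table above."""
    if airflow_version >= (2, 3, 4) and python_version >= (3, 8):
        return "v2"  # recommended path
    if airflow_version >= (2, 1):
        return "v1"  # no automatic lineage extraction
    # Older Airflow: v1 plugin with older acryl-datahub-airflow-plugin releases
    return "older releases (see compatibility section)"

print(recommended_plugin((2, 7, 0), (3, 10)))  # v2
print(recommended_plugin((2, 2, 0), (3, 7)))   # v1
```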
@@ -66,7 +66,7 @@ enabled = True # default
```

| Name | Default value | Description |
- |----------------------------|----------------------|------------------------------------------------------------------------------------------|
+ | -------------------------- | -------------------- | ---------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub rest connection. |
| cluster | prod | name of the airflow cluster, this is equivalent to the `env` of the instance |
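Put together, the options in this table live under a `[datahub]` section of `airflow.cfg`. A minimal sketch using the defaults shown above (values illustrative, matching the hunk context `enabled = True # default`):

```ini
[datahub]
# Whether the plugin is enabled
enabled = True
# Airflow connection pointing at your DataHub instance
conn_id = datahub_rest_default
# Name of the airflow cluster; equivalent to DataHub's `env`
cluster = prod
```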
@@ -132,7 +132,7 @@ conn_id = datahub_rest_default # or datahub_kafka_default
```

| Name | Default value | Description |
- |----------------------------|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | -------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enabled | true | If the plugin should be enabled. |
| conn_id | datahub_rest_default | The name of the datahub connection you set in step 1. |
| cluster | prod | name of the airflow cluster |
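Since `airflow.cfg` is standard ini syntax, a `[datahub]` section like the one described by this table can be sanity-checked with Python's stdlib `configparser`. The fragment below is illustrative, built from the defaults in the table, not copied from a real deployment:

```python
import configparser

# Illustrative airflow.cfg fragment using the defaults from the table above
cfg_text = """
[datahub]
enabled = True
conn_id = datahub_rest_default
cluster = prod
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)

datahub = parser["datahub"]
print(datahub.getboolean("enabled"))  # True
print(datahub["conn_id"])             # datahub_rest_default
print(datahub["cluster"])             # prod
```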
@@ -240,6 +240,7 @@ See this [example PR](https://github.com/datahub-project/datahub/pull/10452) whi

There might be a case where the DAGs are removed from Airflow but the corresponding pipelines and tasks are still present in DataHub; let's call such pipelines and tasks `obsolete pipelines and tasks`.

Following are the steps to clean them up from DataHub:

- create a DAG named `Datahub_Cleanup`, i.e.

```python
@@ -263,8 +264,8 @@ with DAG(
)

```

- ingest this DAG, and it will remove all the obsolete pipelines and tasks from DataHub based on the `cluster` value set in `airflow.cfg`
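Conceptually, the cleanup step diffs the DAG ids still present in Airflow against the pipelines DataHub knows about and removes the difference. A library-free sketch of that set logic (the function and sample ids below are illustrative, not the plugin's actual API):

```python
def find_obsolete(airflow_dag_ids, datahub_pipeline_ids):
    """Pipelines recorded in DataHub whose backing DAG no longer exists in Airflow."""
    return sorted(set(datahub_pipeline_ids) - set(airflow_dag_ids))

# Illustrative ids, not taken from the source document
airflow_dags = ["db_etl", "daily_report"]
datahub_pipelines = ["db_etl", "daily_report", "legacy_export"]

print(find_obsolete(airflow_dags, datahub_pipelines))  # ['legacy_export']
```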
## Get all dataJobs associated with a dataFlow

@@ -274,12 +275,7 @@ If you are looking to find all tasks (aka DataJobs) that belong to a specific pi
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
-       input: {
-         types: ["IsPartOf"],
-         direction: INCOMING,
-         start: 0,
-         count: 100
-       }
+       input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
    ) {
      total
      relationships {
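To run a query like this programmatically, it can be POSTed as JSON to a DataHub GraphQL endpoint. A hedged sketch that only builds the request payload (the inner `entity { urn }` selection completes the query shown above for illustration, and actually sending the request needs a live DataHub instance plus an auth token, both placeholders here):

```python
import json

# The query from the section above, with an illustrative inner selection
query = """
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
    ) {
      total
      relationships { entity { urn } }
    }
  }
}
"""

payload = json.dumps({"query": query})
# POST `payload` to e.g. http://localhost:8080/api/graphql with an
# `Authorization: Bearer <token>` header (urllib.request or requests).
print(json.loads(payload)["query"].strip().startswith("query {"))  # True
```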