mirror of
https://github.com/datahub-project/datahub.git
synced 2025-10-29 17:59:24 +00:00
docs(airflow): example query to get datajobs for a dataflow (#11034)
parent
27e1130586
commit
f73149a059
@@ -27,6 +27,7 @@ For more information on, please refer to the following links."
- [Querying for Domain of a Dataset](/docs/api/tutorials/domains.md#read-domains)
- [Querying for Glossary Terms of a Dataset](/docs/api/tutorials/terms.md#read-terms)
- [Querying for Deprecation of a dataset](/docs/api/tutorials/deprecation.md#read-deprecation)
- [Querying for all DataJobs that belong to a DataFlow](/docs/lineage/airflow.md#get-all-datajobs-associated-with-a-dataflow)

### Search

@@ -266,6 +266,34 @@ with DAG(
- ingest this DAG, and it will remove all the obsolete pipelines and tasks from the Datahub based on the `cluster` value set in the `airflow.cfg`

## Get all dataJobs associated with a dataFlow

If you are looking to find all tasks (aka DataJobs) that belong to a specific pipeline (aka DataFlow), you can use the following GraphQL query:

```graphql
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: {
        types: ["IsPartOf"],
        direction: INCOMING,
        start: 0,
        count: 100
      }
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
```
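
The query above can also be sent from a script. The following is a minimal sketch rather than an official client: it assumes the GraphQL endpoint is reachable at `http://localhost:8080/api/graphql` and that a personal access token, if required, is passed as a Bearer token. The names `GRAPHQL_URL`, `fetch_child_jobs`, and `extract_job_urns` are hypothetical helpers introduced only for illustration.

```python
import json
import urllib.request
from typing import List, Optional

# Assumption: adjust host and port to match your DataHub deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# Same query as above, with the example DataFlow URN hard-coded.
QUERY = """
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
"""


def extract_job_urns(response: dict) -> List[str]:
    """Collect DataJob URNs from a decoded GraphQL response (hypothetical helper)."""
    flow = (response.get("data") or {}).get("dataFlow")
    if not flow:
        # The DataFlow was not found, or the request returned no data.
        return []
    relationships = flow["childJobs"]["relationships"]
    # Skip entries whose entity is missing or is not a DataJob (no urn selected).
    return [
        rel["entity"]["urn"]
        for rel in relationships
        if rel.get("entity") and "urn" in rel["entity"]
    ]


def fetch_child_jobs(token: Optional[str] = None) -> List[str]:
    """POST the query to the assumed endpoint and return the DataJob URNs."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps({"query": QUERY}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return extract_job_urns(json.load(resp))
```

Calling `fetch_child_jobs()` returns at most 100 URNs because of `count: 100` in the query. If `total` in the response is larger, the request would need to be repeated with a higher `start` value.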

## Emit Lineage Directly

If you can't use the plugin or annotate inlets/outlets, you can also emit lineage using the `DatahubEmitterOperator`.
