docs(airflow): example query to get datajobs for a dataflow (#11034)

Ellie O'Neil 2024-07-31 04:31:09 -07:00 committed by GitHub
parent 27e1130586
commit f73149a059
2 changed files with 29 additions and 0 deletions


@ -27,6 +27,7 @@ For more information on, please refer to the following links."
- [Querying for Domain of a Dataset](/docs/api/tutorials/domains.md#read-domains)
- [Querying for Glossary Terms of a Dataset](/docs/api/tutorials/terms.md#read-terms)
- [Querying for Deprecation of a dataset](/docs/api/tutorials/deprecation.md#read-deprecation)
- [Querying for all DataJobs that belong to a DataFlow](/docs/lineage/airflow.md#get-all-datajobs-associated-with-a-dataflow)
### Search


@ -266,6 +266,34 @@ with DAG(
- ingest this DAG, and it will remove all the obsolete pipelines and tasks from DataHub based on the `cluster` value set in `airflow.cfg`
## Get all dataJobs associated with a dataFlow
To find all tasks (DataJobs) that belong to a specific pipeline (DataFlow), use the following GraphQL query. It returns up to `count` results starting at offset `start`, along with the `total` number of matching relationships:
```graphql
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: {
        types: ["IsPartOf"],
        direction: INCOMING,
        start: 0,
        count: 100
      }
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
```
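The query above returns at most `count` relationships per call, so a DataFlow with more than 100 tasks needs several requests. The following is a minimal Python sketch of that paging loop, not an official client. It assumes the GraphQL API is reachable at a URL such as `http://localhost:8080/api/graphql` and accepts a bearer token; both are placeholders for your deployment. The helper names `post_graphql` and `list_datajob_urns` are hypothetical, and the URN and paging values are passed as GraphQL variables instead of being hard-coded.

```python
import json
import urllib.request

# Same query as above, with the URN and paging values lifted into variables.
QUERY = """
query childJobs($urn: String!, $start: Int!, $count: Int!) {
  dataFlow(urn: $urn) {
    childJobs: relationships(
      input: {types: ["IsPartOf"], direction: INCOMING, start: $start, count: $count}
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
"""


def post_graphql(url, token, query, variables):
    """POST a GraphQL request and return the decoded JSON response.

    `url` and `token` are assumptions about your deployment.
    """
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_datajob_urns(flow_urn, post, page_size=100):
    """Collect the URNs of all DataJobs that are part of `flow_urn`.

    `post` is any callable taking (query, variables) and returning the
    decoded GraphQL response, which keeps this function easy to test.
    """
    urns = []
    start = 0
    while True:
        result = post(QUERY, {"urn": flow_urn, "start": start, "count": page_size})
        flow = result["data"]["dataFlow"]
        if flow is None:  # the DataFlow does not exist
            return urns
        page = flow["childJobs"]
        relationships = page["relationships"]
        for relationship in relationships:
            entity = relationship.get("entity") or {}
            if "urn" in entity:  # skip entities that are not DataJobs
                urns.append(entity["urn"])
        start += len(relationships)
        # Stop on an empty page or once every relationship has been read.
        if not relationships or start >= page["total"]:
            return urns
```

To run it against a real instance, wrap `post_graphql` with your own URL and token, for example `list_datajob_urns("urn:li:dataFlow:(airflow,db_etl,prod)", lambda q, v: post_graphql(url, token, q, v))`.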
## Emit Lineage Directly
If you can't use the plugin or annotate inlets/outlets, you can also emit lineage using the `DatahubEmitterOperator`.