mirror of
https://github.com/datahub-project/datahub.git
synced 2025-10-24 07:24:58 +00:00
1077 lines
26 KiB
Markdown
1077 lines
26 KiB
Markdown
import Tabs from '@theme/Tabs';
|
||
import TabItem from '@theme/TabItem';
|
||
|
||
# GraphQL Best Practices
|
||
|
||
## Introduction:
|
||
|
||
DataHub’s GraphQL API is designed to power the UI. The following guidelines are written with this use-case in mind.
|
||
|
||
## General Best Practices
|
||
|
||
### Query Optimizations
|
||
|
||
> One of GraphQL's biggest advantages over a traditional REST API is its support for **declarative data fetching**. Each component can (and should) query exactly the fields it requires to render, with no superfluous data sent over the network. If instead your root component executes a single, enormous query to obtain data for all of its children, it might query on behalf of components that _aren't even rendered_ given the current state. This can result in a delayed response, and it drastically reduces the likelihood that the query's result can be reused by a **server-side response cache**. [[ref](https://www.apolloGraphQL.com/docs/react/data/operation-best-practices#query-only-the-data-you-need-where-you-need-it)]
|
||
|
||
1. Minimize over-fetching by only requesting data needed to be displayed.
|
||
2. Limit result counts and use pagination (additionally see section below on `Deep Pagination`).
|
||
3. Avoid deeply nested queries and instead break out queries into separate requests for the nested objects.
|
||
|
||
### Client-side Caching
|
||
|
||
Clients, such as Apollo Client (javascript, python `apollo-client-python`), offer [client-side caching](https://www.apolloGraphQL.com/docs/react/caching/overview) to prevent requests to the service and are able to understand the content of the GraphQL query. This enables more advanced caching vs HTTP response caching.
|
||
|
||
### Reuse Pieces of Query Logic with Fragments
|
||
|
||
One powerful feature of GraphQL that we recommend you use is [fragments](https://hygraph.com/learn/GraphQL/fragments). Fragments allow you to define pieces of a query that you can reuse across any client-side query that you define. Basically, you can define a set of fields that you want to query, and reuse it in multiple places.
|
||
|
||
This technique makes maintaining your GraphQL queries much more doable. For example, if you want to request a new field for an entity type across many queries, you’re able to update it in one place if you’re leveraging fragments.
|
||
|
||
## Search Query Best Practices
|
||
|
||
### Deep Pagination: search* vs scroll* APIs
|
||
|
||
`search*` APIs such as [`searchAcrossEntities`](https://docs.datahub.com/docs/GraphQL/queries/#searchacrossentities) are designed for minimal pagination (< ~50). They do not perform well for deep pagination requests. Use the equivalent `scroll*` APIs such as [`scrollAcrossEntities`](https://docs.datahub.com/docs/GraphQL/queries/#scrollacrossentities) when expecting the need to paginate deeply into the result set.
|
||
|
||
:::note
|
||
It is impossible to use `search*` for paginating beyond 10k results.
|
||
:::
|
||
|
||
:::caution
|
||
In order to `scroll*` through the entire result set it is required to use a stable sort order. This means using `_score` as
|
||
the first sort order cannot be used. Use the `urn` field as the sort order instead.
|
||
:::
|
||
|
||
#### Examples
|
||
|
||
In the following examples we demonstrate pagination for both `scroll*` and `search*` requests. This particular request is searching for two entities, Datasets and Charts, that
|
||
contain `pet` in the entities' name or title. The results will only include the URN for the entities.
|
||
|
||
<Tab>
|
||
<TabItem value="Scroll" label="Scroll" default>
|
||
Page 1 Request:
|
||
|
||
```graphql
|
||
{
|
||
scrollAcrossEntities(
|
||
input: {
|
||
types: [DATASET, CHART]
|
||
count: 2
|
||
query: "*"
|
||
orFilters: [
|
||
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
|
||
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
|
||
]
|
||
sortInput: { sortCriteria: [{ field: "urn", sortOrder: ASCENDING }] }
|
||
}
|
||
) {
|
||
nextScrollId
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
... on Chart {
|
||
urn
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Page 1 Result:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"scrollAcrossEntities": {
|
||
"nextScrollId": "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0=",
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
|
||
}
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
Page 2 Request:
|
||
|
||
```graphql
|
||
{
|
||
scrollAcrossEntities(
|
||
input: {
|
||
scrollId: "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0="
|
||
types: [DATASET, CHART]
|
||
count: 2
|
||
query: "*"
|
||
orFilters: [
|
||
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
|
||
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
|
||
]
|
||
sortInput: { sortCriteria: [{ field: "urn", sortOrder: ASCENDING }] }
|
||
}
|
||
) {
|
||
nextScrollId
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
... on Chart {
|
||
urn
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Page 2 Result:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"scrollAcrossEntities": {
|
||
"nextScrollId": "eyJzb3J0IjpbMS43MTg3NSwidXJuOmxpOmRhdGFzZXQ6KHVybjpsaTpkYXRhUGxhdGZvcm06c25vd2ZsYWtlLGxvbmdfdGFpbF9jb21wYW5pb25zLmFkb3B0aW9uLnBldHMsUFJPRCkiXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ==",
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
|
||
}
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
</TabItem>
|
||
|
||
<TabItem value="Search" label="Search">
|
||
Page 1 Request:
|
||
|
||
```graphql
|
||
{
|
||
searchAcrossEntities(
|
||
input: {
|
||
types: [DATASET, CHART]
|
||
count: 2
|
||
start: 0
|
||
query: "*"
|
||
orFilters: [
|
||
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
|
||
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
|
||
]
|
||
}
|
||
) {
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
... on Chart {
|
||
urn
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Page 1 Response:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"searchAcrossEntities": {
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
|
||
}
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
Page 2 Request:
|
||
|
||
```graphql
|
||
{
|
||
searchAcrossEntities(
|
||
input: {
|
||
types: [DATASET, CHART]
|
||
count: 2
|
||
start: 2
|
||
query: "*"
|
||
orFilters: [
|
||
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
|
||
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
|
||
]
|
||
}
|
||
) {
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
... on Chart {
|
||
urn
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Page 2 Response:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"searchAcrossEntities": {
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
|
||
}
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
</TabItem>
|
||
</Tab>
|
||
|
||
### SearchFlags: Highlighting and Aggregation
|
||
|
||
When performing queries which accept [`searchFlags`](https://docs.datahub.com/docs/GraphQL/inputObjects#searchflags) and highlighting and aggregation is not needed, be sure to disable these flags.
|
||
|
||
- skipHighlighting: true
|
||
- skipAggregates: true
|
||
|
||
As a fallback, if only certain fields require highlighting use `customHighlightingFields` to limit highlighting to the specific fields.
|
||
|
||
<Tab>
|
||
<TabItem value="Skip" label="Skip Example" default>
|
||
|
||
Example for skipping highlighting and aggregates, typically used for scrolling search requests.
|
||
|
||
```graphql
|
||
{
|
||
scrollAcrossEntities(
|
||
input: {
|
||
types: [DATASET]
|
||
count: 2
|
||
query: "pet"
|
||
searchFlags: { skipAggregates: true, skipHighlighting: true }
|
||
sortInput: { sortCriteria: [{ field: "urn", sortOrder: ASCENDING }] }
|
||
}
|
||
) {
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
}
|
||
matchedFields {
|
||
name
|
||
value
|
||
}
|
||
}
|
||
facets {
|
||
displayName
|
||
aggregations {
|
||
value
|
||
count
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Response:
|
||
|
||
Note that a few `matchedFields` are still returned by default [`urn`, `customProperties`]
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"scrollAcrossEntities": {
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
|
||
},
|
||
"matchedFields": [
|
||
{
|
||
"name": "urn",
|
||
"value": ""
|
||
},
|
||
{
|
||
"name": "customProperties",
|
||
"value": ""
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
|
||
},
|
||
"matchedFields": [
|
||
{
|
||
"name": "urn",
|
||
"value": ""
|
||
},
|
||
{
|
||
"name": "customProperties",
|
||
"value": ""
|
||
}
|
||
]
|
||
}
|
||
],
|
||
"facets": []
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
</TabItem>
|
||
|
||
<TabItem value="Custom" label="Custom Highlighting">
|
||
|
||
Custom highlighting can be used for searchAcrossEntities when only a limited number of fields are useful for highlighting. In this example we specifically request highlighting for `description`.
|
||
|
||
```graphql
|
||
{
|
||
searchAcrossEntities(
|
||
input: {
|
||
types: [DATASET]
|
||
count: 2
|
||
query: "pet"
|
||
searchFlags: { customHighlightingFields: ["description"] }
|
||
}
|
||
) {
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
}
|
||
matchedFields {
|
||
name
|
||
value
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Response:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"searchAcrossEntities": {
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
|
||
},
|
||
"matchedFields": [
|
||
{
|
||
"name": "urn",
|
||
"value": ""
|
||
},
|
||
{
|
||
"name": "customProperties",
|
||
"value": ""
|
||
},
|
||
{
|
||
"name": "description",
|
||
"value": "Table with all pet-related details"
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
|
||
},
|
||
"matchedFields": [
|
||
{
|
||
"name": "urn",
|
||
"value": ""
|
||
},
|
||
{
|
||
"name": "customProperties",
|
||
"value": ""
|
||
}
|
||
]
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
</TabItem>
|
||
</Tab>
|
||
|
||
### Aggregation
|
||
|
||
When aggregation is required with `searchAcrossEntities`, it is possible to set the `count` to 0 to avoid fetching the top search hits, only returning the aggregations. Alternatively [aggregateAcrossEntities](https://docs.datahub.com/docs/GraphQL/queries#aggregateacrossentities) provides counts and can provide faster results from server-side caching.
|
||
|
||
Request:
|
||
|
||
```graphql
|
||
{
|
||
searchAcrossEntities(
|
||
input: {
|
||
types: [DATASET]
|
||
count: 0
|
||
query: "pet"
|
||
searchFlags: { skipHighlighting: true }
|
||
}
|
||
) {
|
||
searchResults {
|
||
entity {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
}
|
||
matchedFields {
|
||
name
|
||
value
|
||
}
|
||
}
|
||
facets {
|
||
displayName
|
||
aggregations {
|
||
value
|
||
count
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Response:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"searchAcrossEntities": {
|
||
"searchResults": [],
|
||
"facets": [
|
||
{
|
||
"displayName": "Container",
|
||
"aggregations": [
|
||
{
|
||
"value": "urn:li:container:b41c14bc5cb3ccfbb0433c8cbdef2992",
|
||
"count": 4
|
||
},
|
||
{
|
||
"value": "urn:li:container:701919de0ec93cb338fe9bac0b35403c",
|
||
"count": 3
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Sub Type",
|
||
"aggregations": [
|
||
{
|
||
"value": "table",
|
||
"count": 9
|
||
},
|
||
{
|
||
"value": "view",
|
||
"count": 6
|
||
},
|
||
{
|
||
"value": "explore",
|
||
"count": 5
|
||
},
|
||
{
|
||
"value": "source",
|
||
"count": 4
|
||
},
|
||
{
|
||
"value": "incremental",
|
||
"count": 1
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Type",
|
||
"aggregations": [
|
||
{
|
||
"value": "DATASET",
|
||
"count": 24
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Environment",
|
||
"aggregations": [
|
||
{
|
||
"value": "PROD",
|
||
"count": 24
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Glossary Term",
|
||
"aggregations": [
|
||
{
|
||
"value": "urn:li:glossaryTerm:Adoption.DaysInStatus",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:glossaryTerm:Ecommerce.HighRisk",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:glossaryTerm:Classification.Confidential",
|
||
"count": 1
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Domain",
|
||
"aggregations": [
|
||
{
|
||
"value": "urn:li:domain:094dc54b-0ebc-40a6-a4cf-e1b75e8b8089",
|
||
"count": 6
|
||
},
|
||
{
|
||
"value": "urn:li:domain:7d64d0fa-66c3-445c-83db-3a324723daf8",
|
||
"count": 2
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Owned By",
|
||
"aggregations": [
|
||
{
|
||
"value": "urn:li:corpGroup:Adoption",
|
||
"count": 5
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:shannon@longtail.com",
|
||
"count": 4
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:admin",
|
||
"count": 2
|
||
},
|
||
{
|
||
"value": "urn:li:corpGroup:Analytics Engineering",
|
||
"count": 2
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:avigdor@longtail.com",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:prentiss@longtail.com",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:tasha@longtail.com",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:ricca@longtail.com",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:corpuser:emilee@longtail.com",
|
||
"count": 1
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Platform",
|
||
"aggregations": [
|
||
{
|
||
"value": "urn:li:dataPlatform:looker",
|
||
"count": 8
|
||
},
|
||
{
|
||
"value": "urn:li:dataPlatform:dbt",
|
||
"count": 7
|
||
},
|
||
{
|
||
"value": "urn:li:dataPlatform:snowflake",
|
||
"count": 7
|
||
},
|
||
{
|
||
"value": "urn:li:dataPlatform:s3",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:dataPlatform:mongodb",
|
||
"count": 1
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Tag",
|
||
"aggregations": [
|
||
{
|
||
"value": "urn:li:tag:prod_model",
|
||
"count": 3
|
||
},
|
||
{
|
||
"value": "urn:li:tag:pii",
|
||
"count": 2
|
||
},
|
||
{
|
||
"value": "urn:li:tag:business critical",
|
||
"count": 2
|
||
},
|
||
{
|
||
"value": "urn:li:tag:business_critical",
|
||
"count": 2
|
||
},
|
||
{
|
||
"value": "urn:li:tag:Tier1",
|
||
"count": 1
|
||
},
|
||
{
|
||
"value": "urn:li:tag:prod",
|
||
"count": 1
|
||
}
|
||
]
|
||
},
|
||
{
|
||
"displayName": "Type",
|
||
"aggregations": [
|
||
{
|
||
"value": "DATASET",
|
||
"count": 24
|
||
}
|
||
]
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
### Limit Search Entity Types
|
||
|
||
When querying for specific entities, enumerate only the entity types required using `types` , for example [`DATASET` , `CHART`]
|
||
|
||
### Limit Results
|
||
|
||
Limit search results based on the amount of information being requested. For example, a minimal number of attributes can fetch 1,000 - 2,000 results in a single page, however as the number of attributes increases (especially nested objects) the `count` should be lowered, 20-25 for very complex requests.
|
||
|
||
## Lineage Query Best Practices
|
||
|
||
There are two primary ways to query lineage:
|
||
|
||
### Search Across Lineage
|
||
|
||
`searchAcrossLineage` / `scrollAcrossLineage` root query:
|
||
|
||
- Recommended for all lineage queries
|
||
- Only the shortest path is guaranteed to show up in `paths`
|
||
- Supports querying indirect lineage (depth > 1)
|
||
- Depending on the fanout of the lineage, 3+ hops may not return data, use 1-hop queries for the fastest response times.
|
||
- Specify using a filter with name `"degree"` and values `"1"` , `"2"`, and / or `"3+"`
|
||
|
||
The following examples are demonstrated using sample data for `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
|
||
|
||
<p align="center">
|
||
<img width="90%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/api/graphql/graphql-best-practices/sample_lineage.png"/>
|
||
</p>
|
||
|
||
<Tab>
|
||
<TabItem value="Upstream1" label="1-Hop Upstream">
|
||
|
||
The following example queries show UPSTREAM lineage with progressively higher degrees, first with degree `["1"]` and then `["1","2"]`.
|
||
|
||
1-Hop Upstreams:
|
||
|
||
Request:
|
||
|
||
```graphql
|
||
{
|
||
searchAcrossLineage(
|
||
input: {
|
||
urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
|
||
query: "*"
|
||
count: 10
|
||
start: 0
|
||
direction: UPSTREAM
|
||
orFilters: [
|
||
{ and: [{ field: "degree", condition: EQUAL, values: ["1"] }] }
|
||
]
|
||
searchFlags: { skipAggregates: true, skipHighlighting: true }
|
||
}
|
||
) {
|
||
start
|
||
count
|
||
total
|
||
searchResults {
|
||
entity {
|
||
urn
|
||
type
|
||
... on Dataset {
|
||
name
|
||
}
|
||
}
|
||
paths {
|
||
path {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
}
|
||
}
|
||
degree
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Response:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"searchAcrossLineage": {
|
||
"start": 0,
|
||
"count": 10,
|
||
"total": 1,
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
|
||
"type": "DATASET",
|
||
"name": "SampleHdfsDataset"
|
||
},
|
||
"paths": [
|
||
{
|
||
"path": [
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
|
||
},
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
|
||
}
|
||
]
|
||
}
|
||
],
|
||
"degree": 1
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
</TabItem>
|
||
|
||
<TabItem value="Upstream2" label="2-Hop Upstream">
|
||
1-Hop & 2-Hop Upstreams:
|
||
|
||
Request:
|
||
|
||
```graphql
|
||
{
|
||
searchAcrossLineage(
|
||
input: {
|
||
urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
|
||
query: "*"
|
||
count: 10
|
||
start: 0
|
||
direction: UPSTREAM
|
||
orFilters: [
|
||
{ and: [{ field: "degree", condition: EQUAL, values: ["1", "2"] }] }
|
||
]
|
||
searchFlags: { skipAggregates: true, skipHighlighting: true }
|
||
}
|
||
) {
|
||
start
|
||
count
|
||
total
|
||
searchResults {
|
||
entity {
|
||
urn
|
||
type
|
||
... on Dataset {
|
||
name
|
||
}
|
||
}
|
||
paths {
|
||
path {
|
||
... on Dataset {
|
||
urn
|
||
}
|
||
}
|
||
}
|
||
degree
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"searchAcrossLineage": {
|
||
"start": 0,
|
||
"count": 10,
|
||
"total": 2,
|
||
"searchResults": [
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
|
||
"type": "DATASET",
|
||
"name": "SampleHdfsDataset"
|
||
},
|
||
"paths": [
|
||
{
|
||
"path": [
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
|
||
},
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
|
||
}
|
||
]
|
||
}
|
||
],
|
||
"degree": 1
|
||
},
|
||
{
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
|
||
"type": "DATASET",
|
||
"name": "SampleKafkaDataset"
|
||
},
|
||
"paths": [
|
||
{
|
||
"path": [
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
|
||
},
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
|
||
},
|
||
{
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"
|
||
}
|
||
]
|
||
}
|
||
],
|
||
"degree": 2
|
||
}
|
||
]
|
||
}
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|
||
|
||
</TabItem>
|
||
</Tab>
|
||
|
||
### Lineage Subquery
|
||
|
||
The previous query requires a root or starting node in the lineage graph. The following request offers a way to request lineage for multiple nodes at once with a few limitations.
|
||
|
||
`lineage` query on `EntityWithRelationship` entities:
|
||
|
||
- A more direct reflection of the graph index
|
||
- 1-hop lineage only
|
||
- Multiple URNs
|
||
- Should not be requested too many times in a single request. 20 is a tested limit
|
||
|
||
The following examples are based on the sample lineage graph shown here:
|
||
|
||
<p align="center">
|
||
<img width="90%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/api/graphql/graphql-best-practices/sample_bulk_lineage.png"/>
|
||
</p>
|
||
|
||
Example Request:
|
||
|
||
```graphql
|
||
query getBulkEntityLineageV2(
|
||
$urns: [String!]! = [
|
||
"urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)"
|
||
"urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)"
|
||
]
|
||
) {
|
||
entities(urns: $urns) {
|
||
urn
|
||
type
|
||
... on DataJob {
|
||
jobId
|
||
dataFlow {
|
||
flowId
|
||
}
|
||
properties {
|
||
name
|
||
}
|
||
upstream: lineage(input: { direction: UPSTREAM, start: 0, count: 10 }) {
|
||
total
|
||
relationships {
|
||
type
|
||
entity {
|
||
urn
|
||
type
|
||
}
|
||
}
|
||
}
|
||
downstream: lineage(
|
||
input: { direction: DOWNSTREAM, start: 0, count: 10 }
|
||
) {
|
||
total
|
||
relationships {
|
||
type
|
||
entity {
|
||
urn
|
||
type
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Example Response:
|
||
|
||
```json
|
||
{
|
||
"data": {
|
||
"entities": [
|
||
{
|
||
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
|
||
"type": "DATA_JOB",
|
||
"jobId": "task_123",
|
||
"dataFlow": {
|
||
"flowId": "dag_abc"
|
||
},
|
||
"properties": {
|
||
"name": "User Creations"
|
||
},
|
||
"upstream": {
|
||
"total": 1,
|
||
"relationships": [
|
||
{
|
||
"type": "Consumes",
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
|
||
"type": "DATASET"
|
||
}
|
||
}
|
||
]
|
||
},
|
||
"downstream": {
|
||
"total": 2,
|
||
"relationships": [
|
||
{
|
||
"type": "DownstreamOf",
|
||
"entity": {
|
||
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
|
||
"type": "DATA_JOB"
|
||
}
|
||
},
|
||
{
|
||
"type": "Produces",
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
|
||
"type": "DATASET"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
},
|
||
{
|
||
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
|
||
"type": "DATA_JOB",
|
||
"jobId": "task_456",
|
||
"dataFlow": {
|
||
"flowId": "dag_abc"
|
||
},
|
||
"properties": {
|
||
"name": "User Deletions"
|
||
},
|
||
"upstream": {
|
||
"total": 2,
|
||
"relationships": [
|
||
{
|
||
"type": "DownstreamOf",
|
||
"entity": {
|
||
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
|
||
"type": "DATA_JOB"
|
||
}
|
||
},
|
||
{
|
||
"type": "Consumes",
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
|
||
"type": "DATASET"
|
||
}
|
||
}
|
||
]
|
||
},
|
||
"downstream": {
|
||
"total": 1,
|
||
"relationships": [
|
||
{
|
||
"type": "Produces",
|
||
"entity": {
|
||
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
|
||
"type": "DATASET"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
]
|
||
},
|
||
"extensions": {}
|
||
}
|
||
```
|