mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-30 18:26:58 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			1077 lines
		
	
	
		
			26 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| import Tabs from '@theme/Tabs';
 | ||
| import TabItem from '@theme/TabItem';
 | ||
| 
 | ||
| # GraphQL Best Practices
 | ||
| 
 | ||
| ## Introduction
 | ||
| 
 | ||
| DataHub’s GraphQL API is designed to power the UI. The following guidelines are written with this use-case in mind.
 | ||
| 
 | ||
| ## General Best Practices
 | ||
| 
 | ||
| ### Query Optimizations
 | ||
| 
 | ||
| > One of GraphQL's biggest advantages over a traditional REST API is its support for **declarative data fetching**. Each component can (and should) query exactly the fields it requires to render, with no superfluous data sent over the network. If instead your root component executes a single, enormous query to obtain data for all of its children, it might query on behalf of components that _aren't even rendered_ given the current state. This can result in a delayed response, and it drastically reduces the likelihood that the query's result can be reused by a **server-side response cache**. [[ref](https://www.apolloGraphQL.com/docs/react/data/operation-best-practices#query-only-the-data-you-need-where-you-need-it)]
 | ||
| 
 | ||
| 1. Minimize over-fetching by only requesting data needed to be displayed.
 | ||
| 2. Limit result counts and use pagination (see also the Deep Pagination section below).
 | ||
| 3. Avoid deeply nested queries and instead break out queries into separate requests for the nested objects.
 | ||
| 
 | ||
| ### Client-side Caching
 | ||
| 
 | ||
| Clients such as Apollo Client (JavaScript, or Python via `apollo-client-python`) offer [client-side caching](https://www.apolloGraphQL.com/docs/react/caching/overview), which avoids repeated requests to the service. Because these clients understand the content of a GraphQL query, they can provide more advanced caching than plain HTTP response caching.
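
As a rough illustration of the idea (this is not how Apollo Client's normalized cache works), the sketch below memoizes responses by query text and variables. `QueryCache` and the `execute` callable are hypothetical names introduced here; `execute` stands in for whatever function sends a query to the GraphQL endpoint.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: takes (query, variables) and returns the decoded JSON response.
Execute = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class QueryCache:
    """Tiny in-memory cache keyed by the query text plus its variables."""

    def __init__(self, execute: Execute) -> None:
        self._execute = execute
        self._store: Dict[str, Dict[str, Any]] = {}

    def query(
        self, query: str, variables: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        variables = variables or {}
        # sort_keys makes the cache key independent of dict insertion order.
        key = json.dumps({"q": query, "v": variables}, sort_keys=True)
        if key not in self._store:
            # Cache miss: call the service once and remember the response.
            self._store[key] = self._execute(query, variables)
        return self._store[key]
```

A cache like this never expires entries, so it only suits data that does not change during the client's lifetime.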
 | ||
| 
 | ||
| ### Reuse Pieces of Query Logic with Fragments
 | ||
| 
 | ||
| One powerful feature of GraphQL that we recommend you use is [fragments](https://hygraph.com/learn/GraphQL/fragments). Fragments allow you to define pieces of a query that you can reuse across any client-side query that you define. Basically, you can define a set of fields that you want to query, and reuse it in multiple places.
 | ||
| 
 | ||
| This technique makes your GraphQL queries much easier to maintain. For example, if you want to request a new field for an entity type across many queries, you can update it in one place if you are using fragments.
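
A minimal sketch of fragment reuse, written as Python string constants so the queries can be sent by any client. The fragment name `datasetFields` and its field selection are illustrative assumptions, not something DataHub ships.

```python
# Hypothetical fragment: the field selection is defined once and shared.
DATASET_FIELDS = """
fragment datasetFields on Dataset {
  urn
  name
}
"""

# Two queries that reuse the same selection via the `...datasetFields` spread.
SEARCH_QUERY = DATASET_FIELDS + """
{
  searchAcrossEntities(input: { types: [DATASET], query: "pet", count: 2 }) {
    searchResults {
      entity {
        ...datasetFields
      }
    }
  }
}
"""

SCROLL_QUERY = DATASET_FIELDS + """
{
  scrollAcrossEntities(input: { types: [DATASET], query: "pet", count: 2 }) {
    nextScrollId
    searchResults {
      entity {
        ...datasetFields
      }
    }
  }
}
"""
```

Adding a field to `DATASET_FIELDS` changes both queries at once, which is the maintenance benefit described above.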
 | ||
| 
 | ||
| ## Search Query Best Practices
 | ||
| 
 | ||
| ### Deep Pagination: search* vs scroll* APIs
 | ||
| 
 | ||
| `search*` APIs such as [`searchAcrossEntities`](https://docs.datahub.com/docs/GraphQL/queries/#searchacrossentities) are designed for minimal pagination (< ~50). They do not perform well for deep pagination requests. Use the equivalent `scroll*` APIs such as [`scrollAcrossEntities`](https://docs.datahub.com/docs/GraphQL/queries/#scrollacrossentities) when expecting the need to paginate deeply into the result set.
 | ||
| 
 | ||
| :::note
 | ||
| It is impossible to use `search*` for paginating beyond 10k results.
 | ||
| :::
 | ||
| 
 | ||
| :::caution
 | ||
| To `scroll*` through the entire result set, a stable sort order is required. This means `_score` cannot be used as the first sort criterion. Sort by the `urn` field instead.
 | ||
| :::
 | ||
| 
 | ||
| #### Examples
 | ||
| 
 | ||
| In the following examples we demonstrate pagination for both `scroll*` and `search*` requests. Each request searches two entity types, Datasets and Charts, for entities that contain `pet` in their name or title. The results include only the URN of each entity.
 | ||
| 
 | ||
| <Tab>
 | ||
| <TabItem value="Scroll" label="Scroll" default>
 | ||
| Page 1 Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   scrollAcrossEntities(
 | ||
|     input: {
 | ||
|       types: [DATASET, CHART]
 | ||
|       count: 2
 | ||
|       query: "*"
 | ||
|       orFilters: [
 | ||
|         { and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
 | ||
|         { and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
 | ||
|       ]
 | ||
|       sortInput: { sortCriteria: [{ field: "urn", sortOrder: ASCENDING }] }
 | ||
|     }
 | ||
|   ) {
 | ||
|     nextScrollId
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|         ... on Chart {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Page 1 Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "scrollAcrossEntities": {
 | ||
|       "nextScrollId": "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0=",
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
 | ||
|           }
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
 | ||
|           }
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Page 2 Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   scrollAcrossEntities(
 | ||
|     input: {
 | ||
|       scrollId: "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0="
 | ||
|       types: [DATASET, CHART]
 | ||
|       count: 2
 | ||
|       query: "*"
 | ||
|       orFilters: [
 | ||
|         { and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
 | ||
|         { and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
 | ||
|       ]
 | ||
|       sortInput: { sortCriteria: [{ field: "urn", sortOrder: ASCENDING }] }
 | ||
|     }
 | ||
|   ) {
 | ||
|     nextScrollId
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|         ... on Chart {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Page 2 Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "scrollAcrossEntities": {
 | ||
|       "nextScrollId": "eyJzb3J0IjpbMS43MTg3NSwidXJuOmxpOmRhdGFzZXQ6KHVybjpsaTpkYXRhUGxhdGZvcm06c25vd2ZsYWtlLGxvbmdfdGFpbF9jb21wYW5pb25zLmFkb3B0aW9uLnBldHMsUFJPRCkiXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ==",
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
 | ||
|           }
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
 | ||
|           }
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| </TabItem>
 | ||
| 
 | ||
| <TabItem value="Search" label="Search">
 | ||
| Page 1 Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   searchAcrossEntities(
 | ||
|     input: {
 | ||
|       types: [DATASET, CHART]
 | ||
|       count: 2
 | ||
|       start: 0
 | ||
|       query: "*"
 | ||
|       orFilters: [
 | ||
|         { and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
 | ||
|         { and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
 | ||
|       ]
 | ||
|     }
 | ||
|   ) {
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|         ... on Chart {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Page 1 Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "searchAcrossEntities": {
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
 | ||
|           }
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
 | ||
|           }
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Page 2 Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   searchAcrossEntities(
 | ||
|     input: {
 | ||
|       types: [DATASET, CHART]
 | ||
|       count: 2
 | ||
|       start: 2
 | ||
|       query: "*"
 | ||
|       orFilters: [
 | ||
|         { and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] }
 | ||
|         { and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
 | ||
|       ]
 | ||
|     }
 | ||
|   ) {
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|         ... on Chart {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Page 2 Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "searchAcrossEntities": {
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
 | ||
|           }
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
 | ||
|           }
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| </TabItem>
 | ||
| </Tab>
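
The scroll flow above can be automated with a small loop. This is a sketch under stated assumptions: `execute` is a hypothetical callable that sends a query with variables and returns the decoded JSON response, the input type name `ScrollAcrossEntitiesInput` is assumed from the API naming, and a missing or null `nextScrollId` is assumed to mark the last page.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: sends (query, variables) and returns the decoded JSON response.
Execute = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumption: the input type is named ScrollAcrossEntitiesInput.
SCROLL_QUERY = """
query scroll($input: ScrollAcrossEntitiesInput!) {
  scrollAcrossEntities(input: $input) {
    nextScrollId
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""


def scroll_urns(execute: Execute, base_input: Dict[str, Any]) -> Iterator[str]:
    """Yield every URN by following nextScrollId until it is absent."""
    scroll_id: Optional[str] = None
    while True:
        page_input = dict(base_input)
        if scroll_id is not None:
            # Every page after the first passes the previous page's nextScrollId.
            page_input["scrollId"] = scroll_id
        response = execute(SCROLL_QUERY, {"input": page_input})
        page = response["data"]["scrollAcrossEntities"]
        for result in page["searchResults"]:
            yield result["entity"]["urn"]
        scroll_id = page.get("nextScrollId")
        # Assumption: a missing or null nextScrollId marks the last page.
        if not scroll_id:
            break
```

The `base_input` dictionary would carry the same `types`, `query`, `count`, `orFilters`, and `sortInput` values as the requests shown in the Scroll tab, with the sort on `urn` to keep the order stable.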
 | ||
| 
 | ||
| ### SearchFlags: Highlighting and Aggregation
 | ||
| 
 | ||
| When a query accepts [`searchFlags`](https://docs.datahub.com/docs/GraphQL/inputObjects#searchflags) and neither highlighting nor aggregation is needed, disable both by setting the following flags:
 | ||
| 
 | ||
| - `skipHighlighting: true`
 | ||
| - `skipAggregates: true`
 | ||
| 
 | ||
| As a fallback, if only certain fields require highlighting, use `customHighlightingFields` to limit highlighting to those fields.
 | ||
| 
 | ||
| <Tab>
 | ||
| <TabItem value="Skip" label="Skip Example" default>
 | ||
| 
 | ||
| Example for skipping highlighting and aggregates, typically used for scrolling search requests.
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   scrollAcrossEntities(
 | ||
|     input: {
 | ||
|       types: [DATASET]
 | ||
|       count: 2
 | ||
|       query: "pet"
 | ||
|       searchFlags: { skipAggregates: true, skipHighlighting: true }
 | ||
|       sortInput: { sortCriteria: [{ field: "urn", sortOrder: ASCENDING }] }
 | ||
|     }
 | ||
|   ) {
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|       matchedFields {
 | ||
|         name
 | ||
|         value
 | ||
|       }
 | ||
|     }
 | ||
|     facets {
 | ||
|       displayName
 | ||
|       aggregations {
 | ||
|         value
 | ||
|         count
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Response:
 | ||
| 
 | ||
| Note that a few `matchedFields` (`urn` and `customProperties`) are still returned by default.
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "scrollAcrossEntities": {
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
 | ||
|           },
 | ||
|           "matchedFields": [
 | ||
|             {
 | ||
|               "name": "urn",
 | ||
|               "value": ""
 | ||
|             },
 | ||
|             {
 | ||
|               "name": "customProperties",
 | ||
|               "value": ""
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
 | ||
|           },
 | ||
|           "matchedFields": [
 | ||
|             {
 | ||
|               "name": "urn",
 | ||
|               "value": ""
 | ||
|             },
 | ||
|             {
 | ||
|               "name": "customProperties",
 | ||
|               "value": ""
 | ||
|             }
 | ||
|           ]
 | ||
|         }
 | ||
|       ],
 | ||
|       "facets": []
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| </TabItem>
 | ||
| 
 | ||
| <TabItem value="Custom" label="Custom Highlighting">
 | ||
| 
 | ||
| Custom highlighting can be used with `searchAcrossEntities` when only a limited number of fields are useful for highlighting. In this example we specifically request highlighting for `description`.
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   searchAcrossEntities(
 | ||
|     input: {
 | ||
|       types: [DATASET]
 | ||
|       count: 2
 | ||
|       query: "pet"
 | ||
|       searchFlags: { customHighlightingFields: ["description"] }
 | ||
|     }
 | ||
|   ) {
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|       matchedFields {
 | ||
|         name
 | ||
|         value
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "searchAcrossEntities": {
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
 | ||
|           },
 | ||
|           "matchedFields": [
 | ||
|             {
 | ||
|               "name": "urn",
 | ||
|               "value": ""
 | ||
|             },
 | ||
|             {
 | ||
|               "name": "customProperties",
 | ||
|               "value": ""
 | ||
|             },
 | ||
|             {
 | ||
|               "name": "description",
 | ||
|               "value": "Table with all pet-related details"
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
 | ||
|           },
 | ||
|           "matchedFields": [
 | ||
|             {
 | ||
|               "name": "urn",
 | ||
|               "value": ""
 | ||
|             },
 | ||
|             {
 | ||
|               "name": "customProperties",
 | ||
|               "value": ""
 | ||
|             }
 | ||
|           ]
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| </TabItem>
 | ||
| </Tab>
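
One way to make the cheap settings the default on the client side is a small helper that builds the `searchFlags` value. `build_search_flags` is a hypothetical helper, not part of any DataHub client; it only uses the three flag names shown above.

```python
from typing import Any, Dict, List, Optional


def build_search_flags(
    highlight_fields: Optional[List[str]] = None,
    need_aggregates: bool = False,
) -> Dict[str, Any]:
    """Build a searchFlags dict that skips work unless it is explicitly requested."""
    flags: Dict[str, Any] = {"skipAggregates": not need_aggregates}
    if highlight_fields:
        # Restrict highlighting to the named fields only.
        flags["customHighlightingFields"] = list(highlight_fields)
    else:
        # No highlighting requested, so skip it entirely.
        flags["skipHighlighting"] = True
    return flags
```

Calling `build_search_flags()` gives the flags used in the Skip example, and `build_search_flags(["description"], need_aggregates=True)` gives flags that restrict highlighting to `description` while keeping aggregations.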
 | ||
| 
 | ||
| ### Aggregation
 | ||
| 
 | ||
| When aggregation is required with `searchAcrossEntities`, set `count` to 0 to return only the aggregations and avoid fetching the top search hits. Alternatively, [`aggregateAcrossEntities`](https://docs.datahub.com/docs/GraphQL/queries#aggregateacrossentities) provides counts and can return results faster thanks to server-side caching.
 | ||
| 
 | ||
| Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   searchAcrossEntities(
 | ||
|     input: {
 | ||
|       types: [DATASET]
 | ||
|       count: 0
 | ||
|       query: "pet"
 | ||
|       searchFlags: { skipHighlighting: true }
 | ||
|     }
 | ||
|   ) {
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         ... on Dataset {
 | ||
|           urn
 | ||
|         }
 | ||
|       }
 | ||
|       matchedFields {
 | ||
|         name
 | ||
|         value
 | ||
|       }
 | ||
|     }
 | ||
|     facets {
 | ||
|       displayName
 | ||
|       aggregations {
 | ||
|         value
 | ||
|         count
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "searchAcrossEntities": {
 | ||
|       "searchResults": [],
 | ||
|       "facets": [
 | ||
|         {
 | ||
|           "displayName": "Container",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "urn:li:container:b41c14bc5cb3ccfbb0433c8cbdef2992",
 | ||
|               "count": 4
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:container:701919de0ec93cb338fe9bac0b35403c",
 | ||
|               "count": 3
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Sub Type",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "table",
 | ||
|               "count": 9
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "view",
 | ||
|               "count": 6
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "explore",
 | ||
|               "count": 5
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "source",
 | ||
|               "count": 4
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "incremental",
 | ||
|               "count": 1
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Type",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "DATASET",
 | ||
|               "count": 24
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Environment",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "PROD",
 | ||
|               "count": 24
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Glossary Term",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "urn:li:glossaryTerm:Adoption.DaysInStatus",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:glossaryTerm:Ecommerce.HighRisk",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:glossaryTerm:Classification.Confidential",
 | ||
|               "count": 1
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Domain",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "urn:li:domain:094dc54b-0ebc-40a6-a4cf-e1b75e8b8089",
 | ||
|               "count": 6
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:domain:7d64d0fa-66c3-445c-83db-3a324723daf8",
 | ||
|               "count": 2
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Owned By",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "urn:li:corpGroup:Adoption",
 | ||
|               "count": 5
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:shannon@longtail.com",
 | ||
|               "count": 4
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:admin",
 | ||
|               "count": 2
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpGroup:Analytics Engineering",
 | ||
|               "count": 2
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:avigdor@longtail.com",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:prentiss@longtail.com",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:tasha@longtail.com",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:ricca@longtail.com",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:corpuser:emilee@longtail.com",
 | ||
|               "count": 1
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Platform",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "urn:li:dataPlatform:looker",
 | ||
|               "count": 8
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:dataPlatform:dbt",
 | ||
|               "count": 7
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:dataPlatform:snowflake",
 | ||
|               "count": 7
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:dataPlatform:s3",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:dataPlatform:mongodb",
 | ||
|               "count": 1
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Tag",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "urn:li:tag:prod_model",
 | ||
|               "count": 3
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:tag:pii",
 | ||
|               "count": 2
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:tag:business critical",
 | ||
|               "count": 2
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:tag:business_critical",
 | ||
|               "count": 2
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:tag:Tier1",
 | ||
|               "count": 1
 | ||
|             },
 | ||
|             {
 | ||
|               "value": "urn:li:tag:prod",
 | ||
|               "count": 1
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         {
 | ||
|           "displayName": "Type",
 | ||
|           "aggregations": [
 | ||
|             {
 | ||
|               "value": "DATASET",
 | ||
|               "count": 24
 | ||
|             }
 | ||
|           ]
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
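
A response of this shape is easier to consume once the facets are flattened into a lookup table. The sketch below assumes the response structure shown above; `facet_counts` is a hypothetical helper name.

```python
from typing import Any, Dict


def facet_counts(response: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Map each facet displayName to a {value: count} dictionary."""
    facets = response["data"]["searchAcrossEntities"]["facets"]
    counts: Dict[str, Dict[str, int]] = {}
    for facet in facets:
        # Facets that share a displayName are merged into one bucket.
        bucket = counts.setdefault(facet["displayName"], {})
        for aggregation in facet["aggregations"]:
            bucket[aggregation["value"]] = aggregation["count"]
    return counts
```

Merging by `displayName` matters for the response above, where the `Type` facet appears twice with identical contents.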
 | ||
| 
 | ||
| ### Limit Search Entity Types
 | ||
| 
 | ||
| When querying for specific entities, enumerate only the entity types required using `types`, for example [`DATASET`, `CHART`].
 | ||
| 
 | ||
| ### Limit Results
 | ||
| 
 | ||
| Limit search results based on the amount of information being requested. For example, a request for a minimal number of attributes can fetch 1,000-2,000 results in a single page. As the number of attributes increases (especially nested objects), the `count` should be lowered, down to 20-25 for very complex requests.
 | ||
| 
 | ||
| ## Lineage Query Best Practices
 | ||
| 
 | ||
| There are two primary ways to query lineage:
 | ||
| 
 | ||
| ### Search Across Lineage
 | ||
| 
 | ||
| `searchAcrossLineage` / `scrollAcrossLineage` root query:
 | ||
| 
 | ||
| - Recommended for all lineage queries
 | ||
| - Only the shortest path is guaranteed to show up in `paths`
 | ||
| - Supports querying indirect lineage (depth > 1)
 | ||
|   - Depending on the fanout of the lineage, queries for 3+ hops may not return data; use 1-hop queries for the fastest response times.
 | ||
|   - Specify the hops with a filter on the field `"degree"` and values `"1"`, `"2"`, and/or `"3+"`.
 | ||
| 
 | ||
| The following examples are demonstrated using sample data for `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
 | ||
| 
 | ||
| <p align="center">
 | ||
|   <img width="90%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/api/graphql/graphql-best-practices/sample_lineage.png"/>
 | ||
| </p>
 | ||
| 
 | ||
| <Tab>
 | ||
| <TabItem value="Upstream1" label="1-Hop Upstream">
 | ||
| 
 | ||
| The following example queries show UPSTREAM lineage with progressively higher degrees, first with degree `["1"]` and then `["1","2"]`.
 | ||
| 
 | ||
| 1-Hop Upstreams:
 | ||
| 
 | ||
| Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   searchAcrossLineage(
 | ||
|     input: {
 | ||
|       urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
 | ||
|       query: "*"
 | ||
|       count: 10
 | ||
|       start: 0
 | ||
|       direction: UPSTREAM
 | ||
|       orFilters: [
 | ||
|         { and: [{ field: "degree", condition: EQUAL, values: ["1"] }] }
 | ||
|       ]
 | ||
|       searchFlags: { skipAggregates: true, skipHighlighting: true }
 | ||
|     }
 | ||
|   ) {
 | ||
|     start
 | ||
|     count
 | ||
|     total
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         urn
 | ||
|         type
 | ||
|         ... on Dataset {
 | ||
|           name
 | ||
|         }
 | ||
|       }
 | ||
|       paths {
 | ||
|         path {
 | ||
|           ... on Dataset {
 | ||
|             urn
 | ||
|           }
 | ||
|         }
 | ||
|       }
 | ||
|       degree
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "searchAcrossLineage": {
 | ||
|       "start": 0,
 | ||
|       "count": 10,
 | ||
|       "total": 1,
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
 | ||
|             "type": "DATASET",
 | ||
|             "name": "SampleHdfsDataset"
 | ||
|           },
 | ||
|           "paths": [
 | ||
|             {
 | ||
|               "path": [
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
 | ||
|                 },
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
 | ||
|                 }
 | ||
|               ]
 | ||
|             }
 | ||
|           ],
 | ||
|           "degree": 1
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| </TabItem>
 | ||
| 
 | ||
| <TabItem value="Upstream2" label="2-Hop Upstream">
 | ||
| 1-Hop & 2-Hop Upstreams:
 | ||
| 
 | ||
| Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| {
 | ||
|   searchAcrossLineage(
 | ||
|     input: {
 | ||
|       urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
 | ||
|       query: "*"
 | ||
|       count: 10
 | ||
|       start: 0
 | ||
|       direction: UPSTREAM
 | ||
|       orFilters: [
 | ||
|         { and: [{ field: "degree", condition: EQUAL, values: ["1", "2"] }] }
 | ||
|       ]
 | ||
|       searchFlags: { skipAggregates: true, skipHighlighting: true }
 | ||
|     }
 | ||
|   ) {
 | ||
|     start
 | ||
|     count
 | ||
|     total
 | ||
|     searchResults {
 | ||
|       entity {
 | ||
|         urn
 | ||
|         type
 | ||
|         ... on Dataset {
 | ||
|           name
 | ||
|         }
 | ||
|       }
 | ||
|       paths {
 | ||
|         path {
 | ||
|           ... on Dataset {
 | ||
|             urn
 | ||
|           }
 | ||
|         }
 | ||
|       }
 | ||
|       degree
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "searchAcrossLineage": {
 | ||
|       "start": 0,
 | ||
|       "count": 10,
 | ||
|       "total": 2,
 | ||
|       "searchResults": [
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
 | ||
|             "type": "DATASET",
 | ||
|             "name": "SampleHdfsDataset"
 | ||
|           },
 | ||
|           "paths": [
 | ||
|             {
 | ||
|               "path": [
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
 | ||
|                 },
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
 | ||
|                 }
 | ||
|               ]
 | ||
|             }
 | ||
|           ],
 | ||
|           "degree": 1
 | ||
|         },
 | ||
|         {
 | ||
|           "entity": {
 | ||
|             "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
 | ||
|             "type": "DATASET",
 | ||
|             "name": "SampleKafkaDataset"
 | ||
|           },
 | ||
|           "paths": [
 | ||
|             {
 | ||
|               "path": [
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
 | ||
|                 },
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
 | ||
|                 },
 | ||
|                 {
 | ||
|                   "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"
 | ||
|                 }
 | ||
|               ]
 | ||
|             }
 | ||
|           ],
 | ||
|           "degree": 2
 | ||
|         }
 | ||
|       ]
 | ||
|     }
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| </TabItem>
 | ||
| </Tab>
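
When the lineage input is sent as JSON variables, the `degree` filter used in both requests can be built by a small helper. This is a sketch: `degree_filter` is a hypothetical name, and the accepted values are limited to the three listed earlier in this section.

```python
from typing import Any, Dict, List

# Degree values listed in this section.
ALLOWED_DEGREES = ("1", "2", "3+")


def degree_filter(degrees: List[str]) -> List[Dict[str, Any]]:
    """Build the orFilters value that restricts lineage results to the given degrees."""
    unknown = [d for d in degrees if d not in ALLOWED_DEGREES]
    if unknown:
        raise ValueError(f"unsupported degree values: {unknown}")
    return [
        {"and": [{"field": "degree", "condition": "EQUAL", "values": list(degrees)}]}
    ]
```

For example, `degree_filter(["1", "2"])` produces the filter from the 2-hop request, with the enum `EQUAL` written as a string because it travels as a JSON variable.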
 | ||
| 
 | ||
| ### Lineage Subquery
 | ||
| 
 | ||
| The previous query requires a root or starting node in the lineage graph. The following request offers a way to request lineage for multiple nodes at once with a few limitations.
 | ||
| 
 | ||
| `lineage` query on `EntityWithRelationship` entities:
 | ||
| 
 | ||
| - A more direct reflection of the graph index
 | ||
| - 1-hop lineage only
 | ||
| - Multiple URNs
 | ||
| - Should not be requested too many times in a single request; 20 is a tested limit.
 | ||
| 
 | ||
| The following examples are based on the sample lineage graph shown here:
 | ||
| 
 | ||
| <p align="center">
 | ||
|   <img width="90%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/api/graphql/graphql-best-practices/sample_bulk_lineage.png"/>
 | ||
| </p>
 | ||
| 
 | ||
| Example Request:
 | ||
| 
 | ||
| ```graphql
 | ||
| query getBulkEntityLineageV2(
 | ||
|   $urns: [String!]! = [
 | ||
|     "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)"
 | ||
|     "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)"
 | ||
|   ]
 | ||
| ) {
 | ||
|   entities(urns: $urns) {
 | ||
|     urn
 | ||
|     type
 | ||
|     ... on DataJob {
 | ||
|       jobId
 | ||
|       dataFlow {
 | ||
|         flowId
 | ||
|       }
 | ||
|       properties {
 | ||
|         name
 | ||
|       }
 | ||
|       upstream: lineage(input: { direction: UPSTREAM, start: 0, count: 10 }) {
 | ||
|         total
 | ||
|         relationships {
 | ||
|           type
 | ||
|           entity {
 | ||
|             urn
 | ||
|             type
 | ||
|           }
 | ||
|         }
 | ||
|       }
 | ||
|       downstream: lineage(
 | ||
|         input: { direction: DOWNSTREAM, start: 0, count: 10 }
 | ||
|       ) {
 | ||
|         total
 | ||
|         relationships {
 | ||
|           type
 | ||
|           entity {
 | ||
|             urn
 | ||
|             type
 | ||
|           }
 | ||
|         }
 | ||
|       }
 | ||
|     }
 | ||
|   }
 | ||
| }
 | ||
| ```
 | ||
| 
 | ||
| Example Response:
 | ||
| 
 | ||
| ```json
 | ||
| {
 | ||
|   "data": {
 | ||
|     "entities": [
 | ||
|       {
 | ||
|         "urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
 | ||
|         "type": "DATA_JOB",
 | ||
|         "jobId": "task_123",
 | ||
|         "dataFlow": {
 | ||
|           "flowId": "dag_abc"
 | ||
|         },
 | ||
|         "properties": {
 | ||
|           "name": "User Creations"
 | ||
|         },
 | ||
|         "upstream": {
 | ||
|           "total": 1,
 | ||
|           "relationships": [
 | ||
|             {
 | ||
|               "type": "Consumes",
 | ||
|               "entity": {
 | ||
|                 "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
 | ||
|                 "type": "DATASET"
 | ||
|               }
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         "downstream": {
 | ||
|           "total": 2,
 | ||
|           "relationships": [
 | ||
|             {
 | ||
|               "type": "DownstreamOf",
 | ||
|               "entity": {
 | ||
|                 "urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
 | ||
|                 "type": "DATA_JOB"
 | ||
|               }
 | ||
|             },
 | ||
|             {
 | ||
|               "type": "Produces",
 | ||
|               "entity": {
 | ||
|                 "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
 | ||
|                 "type": "DATASET"
 | ||
|               }
 | ||
|             }
 | ||
|           ]
 | ||
|         }
 | ||
|       },
 | ||
|       {
 | ||
|         "urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
 | ||
|         "type": "DATA_JOB",
 | ||
|         "jobId": "task_456",
 | ||
|         "dataFlow": {
 | ||
|           "flowId": "dag_abc"
 | ||
|         },
 | ||
|         "properties": {
 | ||
|           "name": "User Deletions"
 | ||
|         },
 | ||
|         "upstream": {
 | ||
|           "total": 2,
 | ||
|           "relationships": [
 | ||
|             {
 | ||
|               "type": "DownstreamOf",
 | ||
|               "entity": {
 | ||
|                 "urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
 | ||
|                 "type": "DATA_JOB"
 | ||
|               }
 | ||
|             },
 | ||
|             {
 | ||
|               "type": "Consumes",
 | ||
|               "entity": {
 | ||
|                 "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
 | ||
|                 "type": "DATASET"
 | ||
|               }
 | ||
|             }
 | ||
|           ]
 | ||
|         },
 | ||
|         "downstream": {
 | ||
|           "total": 1,
 | ||
|           "relationships": [
 | ||
|             {
 | ||
|               "type": "Produces",
 | ||
|               "entity": {
 | ||
|                 "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
 | ||
|                 "type": "DATASET"
 | ||
|               }
 | ||
|             }
 | ||
|           ]
 | ||
|         }
 | ||
|       }
 | ||
|     ]
 | ||
|   },
 | ||
|   "extensions": {}
 | ||
| }
 | ||
| ```
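
To stay within the tested limit mentioned above, a client can split a long URN list into batches before issuing the bulk lineage query. This sketch assumes the limit of 20 applies to the number of entities per request; `batch_urns` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence


def batch_urns(urns: Sequence[str], batch_size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size URNs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urns), batch_size):
        yield list(urns[start : start + batch_size])
```

Each yielded batch can be passed as the `$urns` variable of the `getBulkEntityLineageV2` query shown above.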
 | 
