# Working with Metadata Entities

Learn how to find, retrieve & update entities comprising your Metadata Graph programmatically.

## Reading an Entity: Queries

DataHub provides the following GraphQL queries for retrieving entities in your Metadata Graph.

### Getting a Metadata Entity

To retrieve a Metadata Entity by primary key (urn), simply use the `<entityName>(urn: String!)` GraphQL Query.

For example, to retrieve a `dataset` entity, you can issue the following GraphQL Query:

*As GraphQL*

```graphql
{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)") {
    urn
    properties {
      name
    }
  }
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query":"{ dataset(urn: \"urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)\") { urn properties { name } } }", "variables":{}}'
```

In the following examples, we'll look at how to fetch specific types of metadata for an asset.

#### Querying for Owners of an asset

As GraphQL:

```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    ownership {
      owners {
        owner {
          ... on CorpUser {
            urn
            type
          }
          ... on CorpGroup {
            urn
            type
          }
        }
      }
    }
  }
}
```

#### Querying for Tags of an asset

As GraphQL:

```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    tags {
      tags {
        tag {
          name
        }
      }
    }
  }
}
```

#### Querying for Domain of an asset

As GraphQL:

```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    domain {
      domain {
        urn
      }
    }
  }
}
```

#### Querying for Glossary Terms of an asset

As GraphQL:

```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    glossaryTerms {
      terms {
        term {
          urn
        }
      }
    }
  }
}
```

#### Querying for Deprecation of an asset

As GraphQL:

```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    deprecation {
      deprecated
      decommissionTime
    }
  }
}
```

#### Relevant Queries

- [dataset](../../../graphql/queries.md#dataset)
- [container](../../../graphql/queries.md#container)
- [dashboard](../../../graphql/queries.md#dashboard)
- [chart](../../../graphql/queries.md#chart)
- [dataFlow](../../../graphql/queries.md#dataflow)
- [dataJob](../../../graphql/queries.md#datajob)
- [domain](../../../graphql/queries.md#domain)
- [glossaryTerm](../../../graphql/queries.md#glossaryterm)
- [glossaryNode](../../../graphql/queries.md#glossarynode)
- [tag](../../../graphql/queries.md#tag)
- [notebook](../../../graphql/queries.md#notebook)
- [corpUser](../../../graphql/queries.md#corpuser)
- [corpGroup](../../../graphql/queries.md#corpgroup)

### Searching for a Metadata Entity

To perform full-text search against an Entity of a particular type, use the `search(input: SearchInput!)` GraphQL Query.

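
If you call `search` from a script, the `start` and `count` inputs can be used to walk through results page by page. The following is a minimal Python sketch of that loop, not an official client: `search_payload`, `iter_search_results`, and the caller-supplied `send` function (which should POST the payload to your GraphQL endpoint and return the parsed JSON response) are hypothetical names introduced here for illustration.

```python
import json


def search_payload(query_text, start=0, count=10, entity_type="DATASET"):
    """Build the JSON body for one page of a `search` query."""
    # json.dumps gives a double-quoted, escaped literal that is safe to embed
    # as a GraphQL string value.
    query = (
        "{ search(input: { type: %s, query: %s, start: %d, count: %d }) "
        "{ start count total searchResults { entity { urn type } } } }"
    ) % (entity_type, json.dumps(query_text), start, count)
    return {"query": query, "variables": {}}


def iter_search_results(send, query_text, count=10):
    """Yield entities page by page.

    `send` is a hypothetical, caller-supplied function that posts the payload
    to the GraphQL endpoint and returns the decoded JSON response.
    """
    start = 0
    while True:
        response = send(search_payload(query_text, start, count))
        page = response["data"]["search"]
        for result in page["searchResults"]:
            yield result["entity"]
        start += count
        # Stop when every result has been read or the server returns an empty page.
        if start >= page["total"] or not page["searchResults"]:
            break
```

Keeping the transport behind `send` lets you plug in whichever HTTP client and authentication setup you already use.
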
As GraphQL:

```graphql
{
  search(input: { type: DATASET, query: "my sql dataset", start: 0, count: 10 }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
        ...on Dataset {
          name
        }
      }
    }
  }
}
```

As CURL:

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query":"{ search(input: { type: DATASET, query: \"my sql dataset\", start: 0, count: 10 }) { start count total searchResults { entity { urn type ...on Dataset { name } } } } }", "variables":{}}'
```

> **Note** that by default Elasticsearch only allows pagination through 10,000 entities via the search API.
> If you need to paginate through more, you can change the default value for the `index.max_result_window` setting in Elasticsearch,
> or use the [scroll API](https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html) to read from the index directly.

#### Relevant Queries

- [search](../../../graphql/queries.md#search)
- [searchAcrossEntities](../../../graphql/queries.md#searchacrossentities)
- [searchAcrossLineage](../../../graphql/queries.md#searchacrosslineage)
- [browse](../../../graphql/queries.md#browse)
- [browsePaths](../../../graphql/queries.md#browsepaths)

## Modifying an Entity: Mutations

### Authorization

Mutations which change Entity metadata are subject to [DataHub Access Policies](../../authorization/policies.md). This means that DataHub's server will check whether the requesting actor is authorized to perform the action.

### Updating a Metadata Entity

To update an existing Metadata Entity, simply use the `update<entityName>(urn: String!, input: <entityName>UpdateInput!)` GraphQL mutation.

For example, to update a Dashboard entity, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation updateDashboard {
  updateDashboard(
    urn: "urn:li:dashboard:(looker,baz)",
    input: {
      editableProperties: {
        description: "My new description"
      }
    }
  ) {
    urn
  }
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDashboard { updateDashboard(urn:\"urn:li:dashboard:(looker,baz)\", input: { editableProperties: { description: \"My new description\" } } ) { urn } }", "variables":{}}'
```

**Be careful**: these APIs allow you to make significant changes to a Metadata Entity, often including updating the entire set of Owners & Tags.

#### Relevant Mutations

- [updateDataset](../../../graphql/mutations.md#updatedataset)
- [updateChart](../../../graphql/mutations.md#updatechart)
- [updateDashboard](../../../graphql/mutations.md#updatedashboard)
- [updateDataFlow](../../../graphql/mutations.md#updatedataflow)
- [updateDataJob](../../../graphql/mutations.md#updatedatajob)
- [updateNotebook](../../../graphql/mutations.md#updatenotebook)

### Adding & Removing Tags

To attach Tags to a Metadata Entity, you can use the `addTags` or `batchAddTags` mutations.
To remove them, you can use the `removeTag` or `batchRemoveTags` mutations.

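
If you generate these mutations from a script, a small helper can build the mutation text for you. The sketch below is illustrative only: `add_tags_mutation` is a hypothetical helper, not part of any DataHub client. It assumes that setting `subResourceType` to `DATASET_FIELD` and `subResource` to a field path targets a single Dataset column rather than the whole entity.

```python
import json


def add_tags_mutation(tag_urns, resource_urn, field_path=None):
    """Return the text of an `addTags` mutation.

    When `field_path` is given, the Tags are attached to that Dataset Schema
    Field (column) instead of the entity as a whole.
    """
    # json.dumps yields double-quoted, escaped literals usable as GraphQL strings.
    parts = [
        "tagUrns: " + json.dumps(list(tag_urns)),
        "resourceUrn: " + json.dumps(resource_urn),
    ]
    if field_path is not None:
        parts.append("subResourceType: DATASET_FIELD")
        parts.append("subResource: " + json.dumps(field_path))
    return "mutation addTags { addTags(input: { " + ", ".join(parts) + " }) }"


# Tag a whole Pipeline entity.
pipeline_mutation = add_tags_mutation(
    ["urn:li:tag:NewTag"], "urn:li:dataFlow:(airflow,dag_abc,PROD)"
)

# Tag only the `user_name` column of a Dataset.
column_mutation = add_tags_mutation(
    ["urn:li:tag:NewTag"],
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
    field_path="user_name",
)
```

The returned string goes into the `query` field of the JSON request body, exactly as in the CURL examples.
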
For example, to add a Tag to a Pipeline entity, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation addTags {
  addTags(input: { tagUrns: ["urn:li:tag:NewTag"], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTags { addTags(input: { tagUrns: [\"urn:li:tag:NewTag\"], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```

> **Pro-Tip**! You can also add or remove Tags from Dataset Schema Fields (or *Columns*) by
> providing 2 additional fields in your mutation input:
>
> - subResourceType
> - subResource
>
> Where `subResourceType` is set to `DATASET_FIELD` and `subResource` is the field path of the column
> to change.

#### Relevant Mutations

- [addTags](../../../graphql/mutations.md#addtags)
- [batchAddTags](../../../graphql/mutations.md#batchaddtags)
- [removeTag](../../../graphql/mutations.md#removetag)
- [batchRemoveTags](../../../graphql/mutations.md#batchremovetags)

### Adding & Removing Glossary Terms

To attach Glossary Terms to a Metadata Entity, you can use the `addTerms` or `batchAddTerms` mutations.
To remove them, you can use the `removeTerm` or `batchRemoveTerms` mutations.

For example, to add a Glossary Term to a Pipeline entity, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation addTerms {
  addTerms(input: { termUrns: ["urn:li:glossaryTerm:NewTerm"], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTerms { addTerms(input: { termUrns: [\"urn:li:glossaryTerm:NewTerm\"], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```

> **Pro-Tip**! You can also add or remove Glossary Terms from Dataset Schema Fields (or *Columns*) by
> providing 2 additional fields in your mutation input:
>
> - subResourceType
> - subResource
>
> Where `subResourceType` is set to `DATASET_FIELD` and `subResource` is the field path of the column
> to change.

#### Relevant Mutations

- [addTerms](../../../graphql/mutations.md#addterms)
- [batchAddTerms](../../../graphql/mutations.md#batchaddterms)
- [removeTerm](../../../graphql/mutations.md#removeterm)
- [batchRemoveTerms](../../../graphql/mutations.md#batchremoveterms)

### Adding & Removing Domain

To add an entity to a Domain, you can use the `setDomain` or `batchSetDomain` mutations.
To remove entities from a Domain, you can use the `unsetDomain` mutation or the `batchSetDomain` mutation.

For example, to add a Pipeline entity to the "Marketing" Domain, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation setDomain {
  setDomain(domainUrn: "urn:li:domain:Marketing", entityUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)")
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation setDomain { setDomain(domainUrn: \"urn:li:domain:Marketing\", entityUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\") }", "variables":{}}'
```

#### Relevant Mutations

- [setDomain](../../../graphql/mutations.md#setdomain)
- [batchSetDomain](../../../graphql/mutations.md#batchsetdomain)
- [unsetDomain](../../../graphql/mutations.md#unsetdomain)

### Adding & Removing Owners

To attach Owners to a Metadata Entity, you can use the `addOwners` or `batchAddOwners` mutations.
To remove them, you can use the `removeOwner` or `batchRemoveOwners` mutations.

For example, to add an Owner to a Pipeline entity, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation addOwners {
  addOwners(input: {
    owners: [ { ownerUrn: "urn:li:corpuser:datahub", ownerEntityType: CORP_USER, type: TECHNICAL_OWNER } ],
    resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)"
  })
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addOwners { addOwners(input: { owners: [ { ownerUrn: \"urn:li:corpuser:datahub\", ownerEntityType: CORP_USER, type: TECHNICAL_OWNER } ], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```

#### Relevant Mutations

- [addOwners](../../../graphql/mutations.md#addowners)
- [batchAddOwners](../../../graphql/mutations.md#batchaddowners)
- [removeOwner](../../../graphql/mutations.md#removeowner)
- [batchRemoveOwners](../../../graphql/mutations.md#batchremoveowners)

### Updating Deprecation

To update deprecation for a Metadata Entity, you can use the `updateDeprecation` or `batchUpdateDeprecation` mutations.

For example, to mark a Pipeline entity as deprecated, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation updateDeprecation {
  updateDeprecation(input: { urn: "urn:li:dataFlow:(airflow,dag_abc,PROD)", deprecated: true })
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDeprecation { updateDeprecation(input: { urn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\", deprecated: true }) }", "variables":{}}'
```

> **Note** that deprecation is NOT currently supported for assets of type `container`.

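
If you prefer to send the `updateDeprecation` mutation from Python instead of curl, the following is a minimal sketch using only the standard library. It assumes the same local endpoint and Bearer-token header as the CURL examples; `build_request`, `extract_data`, `deprecate_pipeline`, and `GraphQLError` are hypothetical names introduced here. Because a GraphQL failure can arrive with a 200 status, the sketch treats a non-empty top-level `errors` field as a failure.

```python
import json
import urllib.request

# Endpoint taken from the curl examples; adjust for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

DEPRECATE_PIPELINE = (
    "mutation updateDeprecation { updateDeprecation(input: "
    '{ urn: "urn:li:dataFlow:(airflow,dag_abc,PROD)", deprecated: true }) }'
)


class GraphQLError(Exception):
    """Raised when the response carries a non-empty top-level `errors` field."""


def build_request(query, token, url=GRAPHQL_URL):
    """Build the same POST request that the curl examples send."""
    body = json.dumps({"query": query, "variables": {}}).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_data(raw_body):
    """Return the `data` field, or raise if the server reported errors."""
    payload = json.loads(raw_body)
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    return payload.get("data")


def deprecate_pipeline(token):
    """Send the mutation. This performs a real HTTP call."""
    with urllib.request.urlopen(build_request(DEPRECATE_PIPELINE, token)) as resp:
        return extract_data(resp.read())
```

Calling `deprecate_pipeline("<my-access-token>")` against a running instance should either return the `data` object or raise `GraphQLError` with the server's messages.
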
#### Relevant Mutations

- [updateDeprecation](../../../graphql/mutations.md#updatedeprecation)
- [batchUpdateDeprecation](../../../graphql/mutations.md#batchupdatedeprecation)

### Editing Description (i.e. Documentation)

> Notice that this API is currently evolving and in an experimental state. It supports the following entities today:
> - dataset
> - container
> - domain
> - glossary term
> - glossary node
> - tag
> - group
> - notebook
> - all ML entities

To edit the documentation for an entity, you can use the `updateDescription` mutation. `updateDescription` currently supports Dataset Schema Fields and Containers.

For example, to edit the documentation for a Dataset Schema Field, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation updateDescription {
  updateDescription(
    input: {
      description: "Name of the user who was deleted. This description is updated via GraphQL.",
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
      subResource: "user_name",
      subResourceType: DATASET_FIELD
    }
  )
}
```

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDescription { updateDescription ( input: { description: \"Name of the user who was deleted. This description is updated via GraphQL.\", resourceUrn: \"urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)\", subResource: \"user_name\", subResourceType:DATASET_FIELD }) }", "variables":{}}'
```

#### Relevant Mutations

- [updateDescription](../../../graphql/mutations.md#updatedescription)

### Soft Deleting

DataHub allows you to soft-delete entities. This will effectively hide them from the search, browse, and lineage experiences.

To mark an entity as soft-deleted, you can use the `batchUpdateSoftDeleted` mutation.

For example, to mark a Pipeline as soft-deleted, you can issue the following GraphQL mutation:

*As GraphQL*

```graphql
mutation batchUpdateSoftDeleted {
  batchUpdateSoftDeleted(input: { urns: ["urn:li:dataFlow:(airflow,dag_abc,PROD)"], deleted: true })
}
```

Similarly, you can "un-delete" an entity by setting `deleted` to `false`.

*As CURL*

```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation batchUpdateSoftDeleted { batchUpdateSoftDeleted(input: { deleted: true, urns: [\"urn:li:dataFlow:(airflow,dag_abc,PROD)\"] }) }", "variables":{}}'
```

#### Relevant Mutations

- [batchUpdateSoftDeleted](../../../graphql/mutations.md#batchupdatesoftdeleted)

## Handling Errors

In GraphQL, requests that have errors do not always result in a non-200 HTTP status code. Instead, errors will be present in the response body inside a top-level `errors` field.

This enables situations in which the client is able to deal gracefully with partial data returned by the application server.
To verify that no error has been returned after making a GraphQL request, make sure you check *both* the `data` and `errors` fields that are returned.

To catch a GraphQL error, simply check the `errors` field inside the GraphQL response. It will contain a message, a path, and a set of extensions which contain a standard error code.

```json
{
  "errors": [
    {
      "message": "Failed to change ownership for resource urn:li:dataFlow:(airflow,dag_abc,PROD). Expected a corp user urn.",
      "locations": [
        {
          "line": 1,
          "column": 22
        }
      ],
      "path": [
        "addOwners"
      ],
      "extensions": {
        "code": 400,
        "type": "BAD_REQUEST",
        "classification": "DataFetchingException"
      }
    }
  ]
}
```

The following error codes are officially supported:

| Code | Type         | Description                                                                                    |
|------|--------------|------------------------------------------------------------------------------------------------|
| 400  | BAD_REQUEST  | The query or mutation was malformed.                                                           |
| 403  | UNAUTHORIZED | The current actor is not authorized to perform the requested action.                           |
| 404  | NOT_FOUND    | The resource is not found.                                                                     |
| 500  | SERVER_ERROR | An internal error has occurred. Check your server logs or contact your DataHub administrator.  |

## Feedback, Feature Requests, & Support

Visit our [Slack channel](https://slack.datahubproject.io) to ask questions, tell us what we can do better, & make requests for what you'd like to see in the future. Or just stop by to say 'Hi'.