# Working with Metadata Entities
Learn how to find, retrieve & update entities comprising your Metadata Graph programmatically.
## Reading an Entity: Queries
DataHub provides the following GraphQL queries for retrieving entities in your Metadata Graph.
### Getting a Metadata Entity
To retrieve a Metadata Entity by primary key (urn), simply use the `<entityName>(urn: String!)` GraphQL Query.
For example, to retrieve a `dataset` entity, you can issue the following GraphQL Query:
*As GraphQL*
```graphql
{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)") {
    urn
    properties {
      name
    }
  }
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query":"{ dataset(urn: \"urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)\") { urn properties { name } } }", "variables":{}}'
```
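
The same request can also be prepared from a script. The following is a minimal Python sketch using only the standard library. It assumes the local endpoint and the token placeholder from the curl example, and it passes the urn as a GraphQL variable instead of escaping it into the query string. The helper names `dataset_urn` and `build_graphql_request` are hypothetical and are not part of any DataHub client.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8080/api/graphql"  # endpoint from the curl example

DATASET_QUERY = """
query getDataset($urn: String!) {
  dataset(urn: $urn) {
    urn
    properties {
      name
    }
  }
}
"""


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset urn in the format used by the examples in this guide."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def build_graphql_request(query: str, variables: dict, token: str,
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying a GraphQL payload."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_graphql_request(
    DATASET_QUERY,
    {"urn": dataset_urn("kafka", "SampleKafkaDataset")},
    token="<my-access-token>",
)
# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```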
In the following examples, we'll look at how to fetch specific types of metadata for an asset.
#### Querying for Owners of an entity
As GraphQL:
```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    ownership {
      owners {
        owner {
          ... on CorpUser {
            urn
            type
          }
          ... on CorpGroup {
            urn
            type
          }
        }
      }
    }
  }
}
```
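
Because a GraphQL response mirrors the shape of the query, the owners can be read back by walking the same nesting. The sketch below is one way to do that in Python, assuming the response has already been parsed from JSON into a dictionary. The function name `owner_urns` and the sample response (including the group urn) are made up for illustration, and the code treats a missing or null `ownership` as "no owners".

```python
from typing import Any, Dict, List


def owner_urns(response: Dict[str, Any]) -> List[str]:
    """Collect owner urns from a response to the ownership query above."""
    dataset = (response.get("data") or {}).get("dataset") or {}
    ownership = dataset.get("ownership") or {}
    owners = ownership.get("owners") or []
    return [entry["owner"]["urn"] for entry in owners if entry.get("owner")]


# Hand-written response used for illustration only (not real server output)
sample = {
    "data": {
        "dataset": {
            "ownership": {
                "owners": [
                    {"owner": {"urn": "urn:li:corpuser:datahub", "type": "CORP_USER"}},
                    {"owner": {"urn": "urn:li:corpGroup:example-group", "type": "CORP_GROUP"}},
                ]
            }
        }
    }
}

print(owner_urns(sample))
```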
#### Querying for Tags of an asset
As GraphQL:
```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    tags {
      tags {
        tag {
          name
        }
      }
    }
  }
}
```
#### Querying for Domain of an asset
As GraphQL:
```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    domain {
      domain {
        urn
      }
    }
  }
}
```
#### Querying for Glossary Terms of an asset
As GraphQL:
```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    glossaryTerms {
      terms {
        term {
          urn
        }
      }
    }
  }
}
```
#### Querying for Deprecation of an asset
As GraphQL:
```graphql
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)") {
    deprecation {
      deprecated
      decommissionTime
    }
  }
}
```
#### Relevant Queries
- [dataset](../../../graphql/queries.md#dataset)
- [container](../../../graphql/queries.md#container)
- [dashboard](../../../graphql/queries.md#dashboard)
- [chart](../../../graphql/queries.md#chart)
- [dataFlow](../../../graphql/queries.md#dataflow)
- [dataJob](../../../graphql/queries.md#datajob)
- [domain](../../../graphql/queries.md#domain)
- [glossaryTerm](../../../graphql/queries.md#glossaryterm)
- [glossaryNode](../../../graphql/queries.md#glossarynode)
- [tag](../../../graphql/queries.md#tag)
- [notebook](../../../graphql/queries.md#notebook)
- [corpUser](../../../graphql/queries.md#corpuser)
- [corpGroup](../../../graphql/queries.md#corpgroup)
### Searching for a Metadata Entity
To perform full-text search against an Entity of a particular type, use the `search(input: SearchInput!)` GraphQL Query.
As GraphQL:
```graphql
{
  search(input: { type: DATASET, query: "my sql dataset", start: 0, count: 10 }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
        ...on Dataset {
          name
        }
      }
    }
  }
}
```
As CURL:
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query":"{ search(input: { type: DATASET, query: \"my sql dataset\", start: 0, count: 10 }) { start count total searchResults { entity { urn type ...on Dataset { name } } } } }", "variables":{}}'
```
> **Note** that by default Elasticsearch only allows pagination through 10,000 entities via the search API.
> If you need to paginate through more, you can change the default value for the `index.max_result_window` setting in Elasticsearch,
> or use the [scroll API](https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html) to read from the index directly.
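
To make the limit concrete, here is a small Python sketch that computes the `start` and `count` values a client could pass to `search` without reaching past the window. It assumes the limit applies to `start + count`, which is how Elasticsearch enforces `index.max_result_window` for `from + size`. The function name `page_windows` is hypothetical, and `total` stands for the `total` field returned by a search response.

```python
from typing import Iterator, Tuple

MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def page_windows(total: int, page_size: int,
                 max_window: int = MAX_RESULT_WINDOW) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs that never reach past max_window."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    reachable = min(total, max_window)
    start = 0
    while start < reachable:
        count = min(page_size, reachable - start)
        yield start, count
        start += count


# 25 results in pages of 10 need three requests
print(list(page_windows(total=25, page_size=10)))
# [(0, 10), (10, 10), (20, 5)]
```

With a `total` above the window, the generator simply stops at the cap, so results beyond it still need one of the two options described in the note.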
#### Relevant Queries
- [search](../../../graphql/queries.md#search)
- [searchAcrossEntities](../../../graphql/queries.md#searchacrossentities)
- [searchAcrossLineage](../../../graphql/queries.md#searchacrosslineage)
- [browse](../../../graphql/queries.md#browse)
- [browsePaths](../../../graphql/queries.md#browsepaths)
## Modifying an Entity: Mutations
### Authorization
Mutations which change Entity metadata are subject to [DataHub Access Policies](../../authorization/policies.md). This means that DataHub's server
will check whether the requesting actor is authorized to perform the action.
### Updating a Metadata Entity
To update an existing Metadata Entity, simply use the `update<entityName>(urn: String!, input: EntityUpdateInput!)` GraphQL mutation.
For example, to update a Dashboard entity, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation updateDashboard {
  updateDashboard(
    urn: "urn:li:dashboard:(looker,baz)",
    input: {
      editableProperties: {
        description: "My new description"
      }
    }
  ) {
    urn
  }
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDashboard { updateDashboard(urn:\"urn:li:dashboard:(looker,baz)\", input: { editableProperties: { description: \"My new description\" } } ) { urn } }", "variables":{}}'
```
**Be careful**: these APIs allow you to make significant changes to a Metadata Entity, often including
updating the entire set of Owners & Tags.
#### Relevant Mutations
- [updateDataset](../../../graphql/mutations.md#updatedataset)
- [updateChart](../../../graphql/mutations.md#updatechart)
- [updateDashboard](../../../graphql/mutations.md#updatedashboard)
- [updateDataFlow](../../../graphql/mutations.md#updatedataflow)
- [updateDataJob](../../../graphql/mutations.md#updatedatajob)
- [updateNotebook](../../../graphql/mutations.md#updatenotebook)
### Adding & Removing Tags
To attach Tags to a Metadata Entity, you can use the `addTags` or `batchAddTags` mutations.
To remove them, you can use the `removeTag` or `batchRemoveTags` mutations.
For example, to add a Tag to a Pipeline entity, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation addTags {
  addTags(input: { tagUrns: ["urn:li:tag:NewTag"], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTags { addTags(input: { tagUrns: [\"urn:li:tag:NewTag\"], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```
> **Pro-Tip**! You can also add or remove Tags from Dataset Schema Fields (or *Columns*) by
> providing 2 additional fields in your Query input:
>
> - subResourceType
> - subResource
>
> Where `subResourceType` is set to `DATASET_FIELD` and `subResource` is the field path of the column
> to change.
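
As a rough illustration of the tip above, the following Python sketch builds the `input` object for `addTags`, adding the two extra fields only when a column is targeted. The function name `add_tags_input` and the field path `user_id` are hypothetical. The keys `tagUrns`, `resourceUrn`, `subResourceType`, and `subResource` are the ones named in this guide.

```python
import json
from typing import Any, Dict, List, Optional


def add_tags_input(tag_urns: List[str], resource_urn: str,
                   field_path: Optional[str] = None) -> Dict[str, Any]:
    """Build the `input` object for addTags, optionally targeting one column."""
    payload: Dict[str, Any] = {"tagUrns": tag_urns, "resourceUrn": resource_urn}
    if field_path is not None:
        # Column-level tagging: both fields are supplied together
        payload["subResourceType"] = "DATASET_FIELD"
        payload["subResource"] = field_path
    return payload


column_input = add_tags_input(
    ["urn:li:tag:NewTag"],
    "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
    field_path="user_id",  # hypothetical column name
)
print(json.dumps(column_input, indent=2))
```

The resulting dictionary can be sent as the `input` variable of the mutation. When `field_path` is omitted, the tag applies to the entity itself.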
#### Relevant Mutations
- [addTags](../../../graphql/mutations.md#addtags)
- [batchAddTags](../../../graphql/mutations.md#batchaddtags)
- [removeTag](../../../graphql/mutations.md#removetag)
- [batchRemoveTags](../../../graphql/mutations.md#batchremovetags)
### Adding & Removing Glossary Terms
To attach Glossary Terms to a Metadata Entity, you can use the `addTerms` or `batchAddTerms` mutations.
To remove them, you can use the `removeTerm` or `batchRemoveTerms` mutations.
For example, to add a Glossary Term to a Pipeline entity, you could issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation addTerms {
  addTerms(input: { termUrns: ["urn:li:glossaryTerm:NewTerm"], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTerms { addTerms(input: { termUrns: [\"urn:li:glossaryTerm:NewTerm\"], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```
> **Pro-Tip**! You can also add or remove Glossary Terms from Dataset Schema Fields (or *Columns*) by
> providing 2 additional fields in your Query input:
>
> - subResourceType
> - subResource
>
> Where `subResourceType` is set to `DATASET_FIELD` and `subResource` is the field path of the column
> to change.
#### Relevant Mutations
- [addTerms](../../../graphql/mutations.md#addterms)
- [batchAddTerms](../../../graphql/mutations.md#batchaddterms)
- [removeTerm](../../../graphql/mutations.md#removeterm)
- [batchRemoveTerms](../../../graphql/mutations.md#batchremoveterms)
### Adding & Removing Domain
To add an entity to a Domain, you can use the `setDomain` and `batchSetDomain` mutations.
To remove entities from a Domain, you can use the `unsetDomain` mutation or the `batchSetDomain` mutation.
For example, to add a Pipeline entity to the "Marketing" Domain, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation setDomain {
  setDomain(domainUrn: "urn:li:domain:Marketing", entityUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)")
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation setDomain { setDomain(domainUrn: \"urn:li:domain:Marketing\", entityUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\") }", "variables":{}}'
```
#### Relevant Mutations
- [setDomain](../../../graphql/mutations.md#setdomain)
- [batchSetDomain](../../../graphql/mutations.md#batchsetdomain)
- [unsetDomain](../../../graphql/mutations.md#unsetdomain)
### Adding & Removing Owners
To attach Owners to a Metadata Entity, you can use the `addOwners` or `batchAddOwners` mutations.
To remove them, you can use the `removeOwner` or `batchRemoveOwners` mutations.
For example, to add an Owner to a Pipeline entity, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation addOwners {
  addOwners(input: { owners: [ { ownerUrn: "urn:li:corpuser:datahub", ownerEntityType: CORP_USER, type: TECHNICAL_OWNER } ], resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addOwners { addOwners(input: { owners: [ { ownerUrn: \"urn:li:corpuser:datahub\", ownerEntityType: CORP_USER, type: TECHNICAL_OWNER } ], resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```
#### Relevant Mutations
- [addOwners](../../../graphql/mutations.md#addowners)
- [batchAddOwners](../../../graphql/mutations.md#batchaddowners)
- [removeOwner](../../../graphql/mutations.md#removeowner)
- [batchRemoveOwners](../../../graphql/mutations.md#batchremoveowners)
### Updating Deprecation
To update deprecation for a Metadata Entity, you can use the `updateDeprecation` or `batchUpdateDeprecation` mutations.
For example, to mark a Pipeline entity as deprecated, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation updateDeprecation {
  updateDeprecation(input: { urn: "urn:li:dataFlow:(airflow,dag_abc,PROD)", deprecated: true })
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDeprecation { updateDeprecation(input: { urn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\", deprecated: true }) }", "variables":{}}'
```
> **Note** that deprecation is NOT currently supported for assets of type `container`.
#### Relevant Mutations
- [updateDeprecation](../../../graphql/mutations.md#updatedeprecation)
- [batchUpdateDeprecation](../../../graphql/mutations.md#batchupdatedeprecation)
### Editing Description (i.e. Documentation)
> Notice that this API is currently evolving and in an experimental state. It supports the following entities today:
> - dataset
> - container
> - domain
> - glossary term
> - glossary node
> - tag
> - group
> - notebook
> - all ML entities
To edit the documentation for an entity, you can use the `updateDescription` mutation.
For example, to edit the documentation for a Pipeline, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation updateDescription {
  updateDescription(input: { description: "The new description!", resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDescription { updateDescription(input: { description: \"The new description!\", resourceUrn: \"urn:li:dataFlow:(airflow,dag_abc,PROD)\" }) }", "variables":{}}'
```
> **Pro-Tip**! You can also edit Documentation for Dataset Schema Fields (or *Columns*) by
> providing 2 additional fields in your Query input:
>
> - subResourceType
> - subResource
>
> Where `subResourceType` is set to `DATASET_FIELD` and `subResource` is the field path of the column
> to change.
>
#### Relevant Mutations
- [updateDescription](../../../graphql/mutations.md#updatedescription)
### Soft Deleting
DataHub allows you to soft-delete entities. This will effectively hide them from the search,
browse, and lineage experiences.
To mark an entity as soft-deleted, you can use the `batchUpdateSoftDeleted` mutation.
For example, to mark a Pipeline as soft-deleted, you can issue the following GraphQL mutation:
*As GraphQL*
```graphql
mutation batchUpdateSoftDeleted {
  batchUpdateSoftDeleted(input: { urns: ["urn:li:dataFlow:(airflow,dag_abc,PROD)"], deleted: true })
}
```
Similarly, you can "un-delete" an entity by setting `deleted` to `false`.
*As CURL*
```curl
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation batchUpdateSoftDeleted { batchUpdateSoftDeleted(input: { deleted: true, urns: [\"urn:li:dataFlow:(airflow,dag_abc,PROD)\"] }) }", "variables":{}}'
```
#### Relevant Mutations
- [batchUpdateSoftDeleted](../../../graphql/mutations.md#batchupdatesoftdeleted)
## Handling Errors
In GraphQL, requests that have errors do not always result in a non-200 HTTP status code. Instead, errors will be
present in the response body inside a top-level `errors` field.
This enables situations in which the client is able to deal gracefully with partial data returned by the application server.
To verify that no error has been returned after making a GraphQL request, make sure you check *both* the `data` and `errors` fields that are returned.
To catch a GraphQL error, simply check the `errors` field inside the GraphQL response. It will contain a message, a path, and a set of extensions
which contain a standard error code.
```json
{
  "errors": [
    {
      "message": "Failed to change ownership for resource urn:li:dataFlow:(airflow,dag_abc,PROD). Expected a corp user urn.",
      "locations": [
        {
          "line": 1,
          "column": 22
        }
      ],
      "path": [
        "addOwners"
      ],
      "extensions": {
        "code": 400,
        "type": "BAD_REQUEST",
        "classification": "DataFetchingException"
      }
    }
  ]
}
```
The following error codes are officially supported:
| Code | Type | Description |
|------|--------------|------------------------------------------------------------------------------------------------|
| 400 | BAD_REQUEST | The query or mutation was malformed. |
| 403 | UNAUTHORIZED | The current actor is not authorized to perform the requested action. |
| 404 | NOT_FOUND | The resource is not found. |
| 500 | SERVER_ERROR | An internal error has occurred. Check your server logs or contact your DataHub administrator. |
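
A client can turn this convention into an exception. The sketch below is one strict way to do so in Python, assuming the response body has already been parsed from JSON. It raises on the first entry in `errors` and exposes the `code`, `type`, and `path` values from the error format shown above. The names `GraphQLRequestError` and `unwrap` are hypothetical.

```python
from typing import Any, Dict


class GraphQLRequestError(Exception):
    """Raised when a GraphQL response carries a top-level `errors` field."""

    def __init__(self, message: str, code: Any, error_type: Any, path: Any):
        super().__init__(message)
        self.code = code
        self.error_type = error_type
        self.path = path


def unwrap(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return `data`, raising on the first reported error."""
    errors = response.get("errors") or []
    if errors:
        first = errors[0]
        extensions = first.get("extensions") or {}
        raise GraphQLRequestError(
            first.get("message", "Unknown GraphQL error"),
            extensions.get("code"),
            extensions.get("type"),
            first.get("path"),
        )
    data = response.get("data")
    if data is None:
        raise GraphQLRequestError(
            "Response contained neither data nor errors", None, None, None
        )
    return data
```

This version discards any partial `data` that arrives alongside `errors`. A more lenient client could return both and let the caller decide.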
## Feedback, Feature Requests, & Support
Visit our [Slack channel](https://slack.datahubproject.io) to ask questions, tell us what we can do better, & make requests for what you'd like to see in the future. Or just
stop by to say 'Hi'.