datahub/docs/api/tutorials/modifying-dataset-descriptions.md
Hyejin Yoon e5d06733f2
feat(docs): consolidate api guides (#7857)
Co-authored-by: socar-dini <dini@socar.kr>
2023-04-20 12:17:11 +09:00


import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Modifying Description

Why Would You Add a Description to a Dataset?

Adding a description and a related link to a dataset can provide important information about the data, such as its source, collection methods, and potential uses. This helps others understand the context of the data and how it may be relevant to their own work or research. A related link can also point to additional resources or related datasets, further enriching the information available to users.

Goal of This Guide

This guide will show you how to:

  • Add a dataset description: add a description and a link to the dataset `fct_users_deleted`.
  • Add a column description: add a description to the `user_name` column of the dataset `fct_users_deleted`.

Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed steps, please refer to the DataHub Quickstart Guide.

:::note
Before adding a description, make sure the target dataset already exists in your DataHub instance. If you attempt to modify entities that do not exist, the operation will fail. This guide uses data from the sample ingestion.
:::

In this example, we will add a description to the dataset `fct_users_deleted` and to its `user_name` column.
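The GraphQL examples in this guide identify `fct_users_deleted` by its dataset URN, which combines the data platform, the dataset name, and the environment. The helper below is a hypothetical sketch that assembles that string from its parts, assuming the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout used throughout this guide; if you use the DataHub Python SDK, `make_dataset_urn` in `datahub.emitter.mce_builder` serves the same purpose.

```python
# Hypothetical helper: build a dataset URN in the layout used by this guide,
# urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>).


def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Return the dataset URN for the given platform, dataset name, and environment."""
    platform_urn = f"urn:li:dataPlatform:{platform}"
    return f"urn:li:dataset:({platform_urn},{name},{env})"


if __name__ == "__main__":
    # prints urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)
    print(build_dataset_urn("hive", "fct_users_deleted"))
```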

Add Description on Dataset

🚫 Adding a description to a dataset via GraphQL is currently not supported. Please check out the API feature comparison table for more information.

{{ inline /metadata-ingestion/examples/library/dataset_add_documentation.py show_path_as_comment }}

Expected Outcomes of Adding Description on Dataset

You can now see that the description has been added to `fct_users_deleted`.

(Screenshot: dataset description added to `fct_users_deleted`)

Add Description on Column

Run the following mutation to add a description to the `user_name` column:

```graphql
mutation updateDescription {
  updateDescription(
    input: {
      description: "Name of the user who was deleted. This description is updated via GraphQL.",
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
      subResource: "user_name",
      subResourceType: DATASET_FIELD
    }
  )
}
```
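If you would rather script this call than type the mutation by hand, it can be assembled from its parameters. The sketch below uses only the Python standard library; `build_update_description_body` is a hypothetical helper, and it assumes the endpoint accepts a JSON body of the form `{"query": ..., "variables": {}}`, the same shape the curl example in this guide sends.

```python
import json

# Hypothetical helper: assemble the updateDescription mutation for a dataset
# column and wrap it in a {"query": ..., "variables": {}} request body.


def build_update_description_body(description: str, resource_urn: str, field_path: str) -> str:
    # json.dumps produces a double-quoted, escaped string literal; its escapes
    # (backslash-quote, backslash-n, backslash-uXXXX, ...) are also valid
    # inside a GraphQL string literal, so user text cannot break the query.
    query = (
        "mutation updateDescription { updateDescription(input: { "
        f"description: {json.dumps(description)}, "
        f"resourceUrn: {json.dumps(resource_urn)}, "
        f"subResource: {json.dumps(field_path)}, "
        "subResourceType: DATASET_FIELD "
        "}) }"
    )
    return json.dumps({"query": query, "variables": {}})
```

For example, calling it with the description text, the URN `urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)`, and `"user_name"` yields a body equivalent to the mutation above, ready to POST to `http://localhost:8080/api/graphql`.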

Note that you can use general Markdown in descriptions. For example:

```graphql
mutation updateDescription {
  updateDescription(
    input: {
      description: """
      ### User Name
      The `user_name` column is a primary key column that contains the name of the user who was deleted.
      """,
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
      subResource: "user_name",
      subResourceType: DATASET_FIELD
    }
  )
}
```
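The triple-quoted value above is a GraphQL block string, which strips the common indentation of its lines before the text is stored. If you build the same Markdown in Python instead, you need to handle that indentation yourself, because four or more leading spaces render as a code block in Markdown. A minimal standard-library sketch:

```python
import json
import textwrap

# Remove the shared leading indentation so the heading and paragraph render
# as Markdown rather than as an indented code block.
markdown_description = textwrap.dedent(
    """\
    ### User Name
    The `user_name` column is a primary key column that contains the name of the user who was deleted.
    """
)

# Escape the embedded newlines so the text fits in an ordinary
# (single-line) GraphQL string literal.
graphql_literal = json.dumps(markdown_description)
print(graphql_literal)
```

The resulting `graphql_literal` can be placed after `description:` in place of the block string.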

`updateDescription` currently supports only dataset schema fields and containers. For more information about the `updateDescription` mutation, please refer to the `updateDescription` GraphQL reference.
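Because of this limitation, a script may want to reject unsupported targets before sending a request. The guard below is a hypothetical sketch: it assumes dataset URNs start with `urn:li:dataset:` (as in this guide) and container URNs with `urn:li:container:`, and it assumes a container is described as a whole, without a sub-resource.

```python
from typing import Optional

DATASET_PREFIX = "urn:li:dataset:"
# Assumption: DataHub container URNs use this prefix.
CONTAINER_PREFIX = "urn:li:container:"


def is_supported_description_target(
    resource_urn: str,
    sub_resource: Optional[str] = None,
    sub_resource_type: Optional[str] = None,
) -> bool:
    """Return True if the target is one of the cases updateDescription supports."""
    if resource_urn.startswith(DATASET_PREFIX):
        # Dataset-level descriptions are not supported via GraphQL;
        # only a schema field (column) of the dataset is.
        return sub_resource_type == "DATASET_FIELD" and bool(sub_resource)
    if resource_urn.startswith(CONTAINER_PREFIX):
        # Assumption: containers are described without a sub-resource.
        return sub_resource is None and sub_resource_type is None
    return False
```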

If you see the following response, the operation was successful:

```json
{
  "data": {
    "updateDescription": true
  },
  "extensions": {}
}
```
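To check this result from a script, a minimal sketch is shown below. It relies on the general GraphQL convention that failures are reported in a top-level `errors` list; `update_description_succeeded` is a hypothetical helper name.

```python
import json
from typing import Any, Dict


def update_description_succeeded(response_body: str) -> bool:
    """Return True only if the response carries no errors and updateDescription is true."""
    payload: Dict[str, Any] = json.loads(response_body)
    if payload.get("errors"):
        return False
    # "data" may be null when the request fails, so fall back to an empty dict.
    data = payload.get("data") or {}
    return data.get("updateDescription") is True
```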
You can also send the same mutation with curl:

```shell
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation updateDescription { updateDescription ( input: { description: \"Name of the user who was deleted. This description is updated via GraphQL.\", resourceUrn: \"urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)\", subResource: \"user_name\", subResourceType:DATASET_FIELD }) }", "variables":{}}'
```
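The curl call can be reproduced with Python's standard library. This sketch only builds the request; the endpoint and the `<my-access-token>` placeholder mirror the curl example, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

GMS_GRAPHQL_URL = "http://localhost:8080/api/graphql"

# Same mutation as the curl example.
QUERY = (
    "mutation updateDescription { updateDescription ( input: { "
    'description: "Name of the user who was deleted. This description is updated via GraphQL.", '
    'resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)", '
    'subResource: "user_name", subResourceType: DATASET_FIELD }) }'
)


def build_request(token: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl flags (headers and JSON body)."""
    body = json.dumps({"query": QUERY, "variables": {}}).encode("utf-8")
    return urllib.request.Request(
        GMS_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Against a running DataHub instance, `urllib.request.urlopen(build_request("<my-access-token>"))` sends the request; on success the body should contain `"updateDescription": true`.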

Expected Response:

```json
{ "data": { "updateDescription": true }, "extensions": {} }
```

To do the same with the Python SDK:

{{ inline /metadata-ingestion/examples/library/dataset_add_column_documentation.py show_path_as_comment }}

Expected Outcomes of Adding Description on Column

You can now see that the description has been added to the `user_name` column of `fct_users_deleted`.

(Screenshot: column description added to `user_name` in `fct_users_deleted`)