docs: add guides on forms & structured properties (#10340)

Hyejin Yoon 2024-05-21 18:12:10 +09:00 committed by GitHub
parent 634a486d81
commit 7f37c6f17a
6 changed files with 990 additions and 333 deletions


@ -106,6 +106,7 @@ module.exports = {
type: "doc",
id: "docs/features/dataset-usage-and-query-history",
},
"docs/features/feature-guides/documentation-forms",
{
label: "Domains",
type: "doc",
@ -162,6 +163,7 @@ module.exports = {
type: "doc",
id: "docs/posts",
},
"docs/features/feature-guides/properties",
{
label: "Schema history",
type: "doc",
@ -676,11 +678,6 @@ module.exports = {
label: "OpenAPI",
id: "docs/api/openapi/openapi-usage-guide",
},
{
type: "doc",
label: "Structured Properties",
id: "docs/api/openapi/openapi-structured-properties",
},
],
},
"docs/dev-guides/timeline",
@ -810,6 +807,8 @@ module.exports = {
"docs/api/tutorials/descriptions",
"docs/api/tutorials/custom-properties",
"docs/api/tutorials/ml",
"docs/api/tutorials/structured-properties",
"docs/api/tutorials/forms",
],
},
{


@ -1,328 +0,0 @@
# Structured Properties - DataHub OpenAPI v2 Guide
This guide walks through the process of creating and using a Structured Property using the `v2` version
of the DataHub OpenAPI implementation. Note that this refers to DataHub's OpenAPI version and not the version of OpenAPI itself.
Requirements:
* curl
* jq
## Structured Property Definition
Before a structured property can be added to an entity it must first be defined. Here is an example
structured property being created against a local quickstart instance.
### Create Property Definition
Example Request:
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"qualifiedName": "my.test.MyProperty01",
"displayName": "MyProperty01",
"valueType": "urn:li:dataType:datahub.string",
"allowedValues": [
{
"value": {"string": "foo"},
"description": "test foo value"
},
{
"value": {"string": "bar"},
"description": "test bar value"
}
],
"cardinality": "SINGLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
],
"description": "test description"
}' | jq
```
### Read Property Definition
Example Request:
```shell
curl -X 'GET' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/propertyDefinition' \
-H 'accept: application/json' | jq
```
Example Response:
```json
{
"value": {
"allowedValues": [
{
"value": {
"string": "foo"
},
"description": "test foo value"
},
{
"value": {
"string": "bar"
},
"description": "test bar value"
}
],
"qualifiedName": "my.test.MyProperty01",
"displayName": "MyProperty01",
"valueType": "urn:li:dataType:datahub.string",
"description": "test description",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
],
"cardinality": "SINGLE"
}
}
```
### Delete Property Definition
There are two types of deletion present in DataHub: `hard` and `soft` delete. As of the current release only the `soft` delete
is supported for Structured Properties. See the subsections below for more details.
#### Soft Delete
A `soft` deleted Structured Property does not remove any underlying data on the Structured Property entity
or the Structured Property's values written to other entities. The `soft` delete is 100% reversible with zero data loss.
When a Structured Property is `soft` deleted, a few operations are not available.
Structured Property Soft Delete Effects:
* Entities with a `soft` deleted Structured Property value will not return the `soft` deleted properties
* Updates to a `soft` deleted Structured Property's definition are denied
* Adding a `soft` deleted Structured Property's value to an entity is denied
* Search filters using a `soft` deleted Structured Property will be denied
The following command will `soft` delete the test property `MyProperty01` created in this guide by writing
to the `status` aspect.
```shell
curl -X 'POST' \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"removed": true
}' | jq
```
Removing the `soft` delete from the Structured Property can be done by either `hard` deleting the `status` aspect or
changing the `removed` boolean to `false`.
```shell
curl -X 'POST' \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"removed": false
}' | jq
```
#### Hard Delete
**Not Implemented**
## Applying Structured Properties
Structured Properties can now be added to entities that have the `structuredProperties` aspect. In the following
example we'll attach properties to, and remove them from, an example dataset entity with urn `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
### Set Structured Property Values
This will set/replace all structured properties on the entity. See `PATCH` operations to add/remove a single property.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01",
"values": [
{"string": "foo"}
]
}
]
}' | jq
```
### Patch Structured Property Value
For this example, we'll create a second structured property and apply both properties to the same
dataset used previously. After this your system should include both `my.test.MyProperty01` and `my.test.MyProperty02`.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty02/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"qualifiedName": "my.test.MyProperty02",
"displayName": "MyProperty02",
"valueType": "urn:li:dataType:datahub.string",
"allowedValues": [
{
"value": {"string": "foo2"},
"description": "test foo2 value"
},
{
"value": {"string": "bar2"},
"description": "test bar2 value"
}
],
"cardinality": "SINGLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
]
}' | jq
```
This command will attach one of each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01",
"values": [
{"string": "foo"}
]
},
{
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty02",
"values": [
{"string": "bar2"}
]
}
]
}' | jq
```
#### Remove Structured Property Value
The expected state of our test dataset includes two structured properties. We'd like to remove the first one and preserve
the second property.
```shell
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "remove",
"path": "/properties/urn:li:structuredProperty:my.test.MyProperty01"
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
The response will show that the expected property has been removed.
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"aspects": {
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty02"
}
]
}
}
}
}
```
#### Add Structured Property Value
In this example, we'll add the property back with a different value, preserving the existing property.
```shell
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "add",
"path": "/properties/urn:li:structuredProperty:my.test.MyProperty01",
"value": {
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01",
"values": [
{
"string": "bar"
}
]
}
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
The response shows that the property was re-added with the new value `bar` instead of the previous value `foo`.
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"aspects": {
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty02"
},
{
"values": [
{
"string": "bar"
}
],
"propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01"
}
]
}
}
}
}
```

docs/api/tutorials/forms.md

@ -0,0 +1,148 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Documentation Forms
## Why Would You Use Documentation Forms?
Documentation Forms are a way for end-users to fill out all mandatory attributes associated with a data asset. The form will be dynamically generated based on the definitions provided by administrators and stewards and matching rules.
Learn more about forms in the [Documentation Forms Feature Guide](../../../docs/features/feature-guides/documentation-forms.md).
### Goal Of This Guide
This guide will show you how to create and read forms.
## Prerequisites
For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed information, please refer to the [DataHub Quickstart Guide](/docs/quickstart.md).
<Tabs>
<TabItem value="CLI" label="CLI">
Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding SaaS release version is `v0.2.16.5`.
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):
1. Run `datahub init` to update the instance you want to load into
2. Set the server to your sandbox instance, `https://{your-instance-address}/gms`
3. Set the token to your access token
</TabItem>
</Tabs>
## Create a Form
<Tabs>
<TabItem value="CLI" label="CLI">
Create a YAML file representing the forms you'd like to load.
For example, the file below represents a form with id `123456`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/forms/forms.yaml).
```yaml
- id: 123456
# urn: "urn:li:form:123456" # optional if id is provided
type: VERIFICATION # Supported Types: DOCUMENTATION, VERIFICATION
name: "Metadata Initiative 2023"
description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out"
prompts:
- id: "123"
title: "Retention Time"
description: "Apply Retention Time structured property to form"
type: STRUCTURED_PROPERTY
structured_property_id: io.acryl.privacy.retentionTime
required: True # optional, will default to True
entities: # Either pass a list of urns or a group of filters. This example shows a list of urns
urns:
- urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
# optionally assign the form to a specific set of users and/or groups
# when omitted, form will be assigned to Asset owners
actors:
users:
- urn:li:corpuser:jane@email.com # note: these should be urns
- urn:li:corpuser:john@email.com
groups:
- urn:li:corpGroup:team@email.com # note: these should be urns
```
:::note
Note that the structured properties and related entities should be created before you create the form.
Please refer to the [Structured Properties Tutorial](/docs/api/tutorials/structured-properties.md) for more information.
:::
You can apply forms to either a list of entity urns, or a list of filters. For a list of entity urns, use this structure:
```
entities:
urns:
- urn:li:dataset:...
```
For a list of filters, use this structure:
```
entities:
filters:
types:
- dataset # you can use entity type name or urn
platforms:
- snowflake # you can use platform name or urn
domains:
- urn:li:domain:finance # you must use domain urn
containers:
- urn:li:container:my_container # you must use container urn
```
Note that you can filter to entity types, platforms, domains, and/or containers.
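The two `entities` shapes above can be sanity-checked before the file is handed to the CLI. The following is a minimal sketch and not part of the DataHub CLI: the helper names `form_urn` and `entity_selectors` are hypothetical, and it assumes the YAML has already been parsed into a plain dictionary. The set of filter keys is taken from the example above; whether the CLI rejects other keys is an assumption.

```python
# Filter keys shown in the example above (assumed to be the complete set).
KNOWN_FILTER_KEYS = {"types", "platforms", "domains", "containers"}


def form_urn(form: dict) -> str:
    """Return the form URN, deriving it from `id` when `urn` is not provided."""
    if "urn" in form:
        return form["urn"]
    return f"urn:li:form:{form['id']}"


def entity_selectors(form: dict) -> list:
    """Return which selectors ('urns', 'filters') the form uses to pick assets."""
    entities = form.get("entities") or {}
    selectors = [key for key in ("urns", "filters") if entities.get(key)]
    if not selectors:
        raise ValueError("a form needs entities.urns or entities.filters")
    if "filters" in selectors:
        unknown = set(entities["filters"]) - KNOWN_FILTER_KEYS
        if unknown:
            raise ValueError(f"unknown filter keys: {sorted(unknown)}")
    return selectors


form = {
    "id": 123456,
    "entities": {
        "filters": {"types": ["dataset"], "platforms": ["snowflake"]},
    },
}
print(form_urn(form))
print(entity_selectors(form))
```

Running a check like this first surfaces a misspelled filter key locally instead of after an upsert attempt.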
Use the CLI to create your forms:
```commandline
datahub forms upsert -f {forms_yaml}
```
If successful, you should see `Created form urn:li:form:...`
</TabItem>
</Tabs>
## Read Forms
<Tabs>
<TabItem value="CLI" label="CLI">
You can see the forms you created by running the following command:
```commandline
datahub forms get --urn {urn}
```
For example, you can run `datahub forms get --urn urn:li:form:123456`.
If successful, you should see metadata about your form returned like below.
```json
{
"urn": "urn:li:form:123456",
"name": "Metadata Initiative 2023",
"description": "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out",
"prompts": [
{
"id": "123",
"title": "Retention Time",
"description": "Apply Retention Time structured property to form",
"type": "STRUCTURED_PROPERTY",
"structured_property_urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
],
"type": "VERIFICATION"
}
```
</TabItem>
</Tabs>


@ -0,0 +1,567 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Structured Properties
## Why Would You Use Structured Properties?
Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc.
Structured properties have typed values. Conceptually, they are like “field definitions”.
Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties.md).
### Goal Of This Guide
This guide will show you how to execute the following actions with structured properties.
- Create structured properties
- Read structured properties
- Delete structured properties (soft delete)
- Add structured properties to a dataset
- Patch structured properties (add / remove / update a single property)
## Prerequisites
For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed information, please refer to the [DataHub Quickstart Guide](/docs/quickstart.md).
Additionally, you need to have the following tools installed according to the method you choose to interact with DataHub:
<Tabs>
<TabItem value="CLI" label="CLI" default>
Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding SaaS release version is `v0.2.16.5`.
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):
- Run `datahub init` to update the instance you want to load into.
- Set the server to your sandbox instance, `https://{your-instance-address}/gms`.
- Set the token to your access token.
</TabItem>
<TabItem value="OpenAPI" label="OpenAPI">
Requirements for OpenAPI are:
* curl
* jq
</TabItem>
</Tabs>
## Create Structured Properties
The following code will create a structured property `io.acryl.privacy.retentionTime`.
<Tabs>
<TabItem value="CLI" label="CLI" default>
Create a YAML file representing the properties you'd like to load.
For example, the file below represents a property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml).
```yaml
- id: io.acryl.privacy.retentionTime
# - urn: urn:li:structuredProperty:io.acryl.privacy.retentionTime # optional if id is provided
qualified_name: io.acryl.privacy.retentionTime # required if urn is provided
type: number
cardinality: MULTIPLE
display_name: Retention Time
entity_types:
- dataset # or urn:li:entityType:datahub.dataset
- dataFlow
description: "Retention Time is used to figure out how long to retain records in a dataset"
allowed_values:
- value: 30
description: 30 days, usually reserved for datasets that are ephemeral and contain pii
- value: 90
description: Use this for datasets that drive monthly reporting but contain pii
- value: 365
description: Use this for non-sensitive data that can be retained for longer
```
Use the CLI to create your properties:
```commandline
datahub properties upsert -f {properties_yaml}
```
If successful, you should see `Created structured property urn:li:structuredProperty:...`
</TabItem>
<TabItem value="OpenAPI" label="OpenAPI">
```commandline
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"displayName": "Retention Time",
"cardinality": "MULTIPLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"allowedValues": [
{
"value": {"double": 30},
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": {"double": 90},
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": {"double": 365},
"description": "Use this for non-sensitive data that can be retained for longer"
}
]
}' | jq
```
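The URN in the request path above is percent-encoded. As a small sketch of how such a URL can be assembled, the helper below (the name `aspect_url` is hypothetical) uses the standard library's `urllib.parse.quote`; the base address is the quickstart address used throughout this guide.

```python
from urllib.parse import quote

# Quickstart address used throughout this guide.
BASE = "http://localhost:8080/openapi/v2/entity"


def aspect_url(entity_type: str, urn: str, aspect: str) -> str:
    """Build an OpenAPI v2 aspect URL with the URN percent-encoded."""
    # safe="" makes quote() encode ':', '(', ')' and ',' as well.
    return f"{BASE}/{entity_type}/{quote(urn, safe='')}/{aspect}"


print(
    aspect_url(
        "structuredProperty",
        "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
        "propertyDefinition",
    )
)
```

The same helper reproduces the dataset URLs used later in this guide, where the parentheses and commas in the dataset URN also need encoding.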
</TabItem>
</Tabs>
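The CLI file and the OpenAPI body describe the same definition with different field names. The sketch below illustrates that correspondence as it appears in the two examples above; it is not DataHub code. The function name `to_openapi_definition` is hypothetical, and sending numbers as `double` and everything else as `string` is an assumption drawn from the examples.

```python
import json


def to_openapi_definition(prop: dict) -> dict:
    """Translate a CLI-style property (snake_case) into an OpenAPI-style body (camelCase)."""

    def entity_type_urn(name: str) -> str:
        # The YAML accepts either a short name or a full entity type URN.
        prefix = "urn:li:entityType:"
        return name if name.startswith(prefix) else f"{prefix}datahub.{name}"

    # Assumption: numbers travel as {"double": ...}, other values as {"string": ...}.
    value_key = "double" if prop["type"] == "number" else "string"
    body = {
        "qualifiedName": prop.get("qualified_name", prop["id"]),
        "valueType": f"urn:li:dataType:datahub.{prop['type']}",
        "displayName": prop["display_name"],
        "cardinality": prop["cardinality"],
        "entityTypes": [entity_type_urn(t) for t in prop["entity_types"]],
    }
    if "description" in prop:
        body["description"] = prop["description"]
    if "allowed_values" in prop:
        body["allowedValues"] = [
            {"value": {value_key: item["value"]}, "description": item["description"]}
            for item in prop["allowed_values"]
        ]
    return body


example = {
    "id": "io.acryl.privacy.retentionTime",
    "type": "number",
    "cardinality": "MULTIPLE",
    "display_name": "Retention Time",
    "entity_types": ["dataset", "dataFlow"],
    "allowed_values": [{"value": 30, "description": "30 days"}],
}
print(json.dumps(to_openapi_definition(example), indent=2))
```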
## Read Structured Properties
You can see the properties you created by running the following command:
<Tabs>
<TabItem value="CLI" label="CLI" default>
```commandline
datahub properties get --urn {urn}
```
For example, you can run `datahub properties get --urn urn:li:structuredProperty:io.acryl.privacy.retentionTime`.
If successful, you should see metadata about your properties returned.
```commandline
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"qualified_name": "io.acryl.privacy.retentionTime",
"type": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"display_name": "Retention Time",
"entity_types": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE",
"allowed_values": [
{
"value": "30",
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": "90",
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": "365",
"description": "Use this for non-sensitive data that can be retained for longer"
}
]
}
```
</TabItem>
<TabItem value="OpenAPI" label="OpenAPI">
Example Request:
```
curl -X 'GET' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json' | jq
```
Example Response:
```commandline
{
"value": {
"allowedValues": [
{
"value": {
"double": 30.0
},
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": {
"double": 90.0
},
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": {
"double": 365.0
},
"description": "Use this for non-sensitive data that can be retained for longer"
}
],
"qualifiedName": "io.acryl.privacy.retentionTime",
"displayName": "Retention Time",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE"
}
}
```
</TabItem>
</Tabs>
## Set Structured Property To a Dataset
This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property.
<Tabs>
<TabItem value="CLI" label="CLI" default>
You can set structured properties on a dataset by creating a dataset YAML file with structured properties. For example, below is a dataset YAML file with structured properties at both the field and dataset level.
Please refer to the [full example here.](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/datasets.yaml)
```yaml
- id: user_clicks_snowflake
platform: snowflake
schema:
fields:
- id: user_id
structured_properties:
io.acryl.dataManagement.deprecationDate: "2023-01-01"
structured_properties:
io.acryl.dataManagement.replicationSLA: 90
```
Use the CLI to upsert your dataset yaml file:
```commandline
datahub dataset upsert -f {dataset_yaml}
```
If successful, you should see `Update succeeded for urn:li:dataset:...`
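To make the two levels in the dataset file concrete, the sketch below walks a dictionary shaped like that YAML and lists each structured property with the scope it applies to. It is an illustration only: the function name `collect_structured_properties` and the `field:` scope label are hypothetical, and the dictionary is assumed to be the already-parsed file.

```python
def collect_structured_properties(dataset: dict) -> list:
    """Return (scope, property id, value) tuples for one dataset entry."""
    found = []
    # Dataset-level properties sit directly under the dataset entry.
    for prop, value in dataset.get("structured_properties", {}).items():
        found.append(("dataset", prop, value))
    # Field-level properties sit under each schema field.
    for field in dataset.get("schema", {}).get("fields", []):
        for prop, value in field.get("structured_properties", {}).items():
            found.append((f"field:{field['id']}", prop, value))
    return found


dataset = {
    "id": "user_clicks_snowflake",
    "platform": "snowflake",
    "schema": {
        "fields": [
            {
                "id": "user_id",
                "structured_properties": {
                    "io.acryl.dataManagement.deprecationDate": "2023-01-01"
                },
            }
        ]
    },
    "structured_properties": {"io.acryl.dataManagement.replicationSLA": 90},
}
for row in collect_structured_properties(dataset):
    print(row)
```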
</TabItem>
<TabItem value="OpenAPI" label="OpenAPI">
The following command sets the structured property `retentionTime` to `90` on the dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Please note that the structured property and the dataset must exist before executing this command. (You can create sample datasets using `datahub docker ingest-sample-data`.)
```commandline
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{"string": "90"}
]
}
]
}' | jq
```
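Before sending a request like the one above, it can help to check locally that the values fit the property definition. The sketch below is a client-side illustration under stated assumptions, not DataHub's server-side validation: it assumes that `SINGLE` cardinality means at most one value and that a value must match one of the allowed values when any are defined. The names `validate_values` and `_normalize` are hypothetical.

```python
def _normalize(value):
    """Compare numbers numerically so that "90", 90 and 90.0 are treated alike."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def validate_values(definition: dict, values: list) -> None:
    """Raise ValueError if `values` do not fit the property definition."""
    if definition.get("cardinality") == "SINGLE" and len(values) > 1:
        raise ValueError("property accepts a single value")
    allowed = definition.get("allowed_values")
    if allowed:
        permitted = {_normalize(item["value"]) for item in allowed}
        for value in values:
            if _normalize(value) not in permitted:
                raise ValueError(f"{value!r} is not an allowed value")


retention_time = {
    "cardinality": "MULTIPLE",
    "allowed_values": [{"value": 30}, {"value": 90}, {"value": 365}],
}
validate_values(retention_time, ["90"])  # passes silently
```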
</TabItem>
</Tabs>
#### Expected Outcomes
Once your datasets are uploaded, you can view them in the UI and view the properties associated with them under the Properties tab.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-set.png"/>
</p>
Or you can run the following command to view the properties associated with the dataset:
```commandline
datahub dataset get --urn {urn}
```
## Patch Structured Property Value
This section will show you how to patch a structured property value - either by removing, adding, or upserting a single property.
### Add Structured Property Value
For this example, we'll create a second structured property and apply both properties to the same dataset used previously.
After this your system should include both `io.acryl.privacy.retentionTime` and `io.acryl.privacy.retentionTime02`.
<Tabs>
<TabItem value="OpenAPI" label="OpenAPI">
Let's start by creating the second structured property.
```
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime02/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"qualifiedName": "io.acryl.privacy.retentionTime02",
"displayName": "Retention Time 02",
"valueType": "urn:li:dataType:datahub.string",
"allowedValues": [
{
"value": {"string": "foo2"},
"description": "test foo2 value"
},
{
"value": {"string": "bar2"},
"description": "test bar2 value"
}
],
"cardinality": "SINGLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
]
}' | jq
```
This command will attach one value for each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Specifically, this will set `io.acryl.privacy.retentionTime` to `90` and `io.acryl.privacy.retentionTime02` to `bar2`.
```
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{"string": "90"}
]
},
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"values": [
{"string": "bar2"}
]
}
]
}' | jq
```
</TabItem>
</Tabs>
#### Expected Outcomes
You can see that the dataset now has two structured properties attached to it.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-add.png"/>
</p>
### Remove Structured Property Value
The expected state of our test dataset includes two structured properties.
We'd like to remove the first one (`io.acryl.privacy.retentionTime`) and preserve the second property (`io.acryl.privacy.retentionTime02`).
<Tabs>
<TabItem value="OpenAPI" label="OpenAPI">
```
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "remove",
"path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
The response will show that the expected property has been removed.
```
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"aspects": {
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
}
]
}
}
}
}
```
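The patch path above addresses an array element by its `propertyUrn` rather than by a numeric index, which is what the `arrayPrimaryKeys` block declares. The sketch below is a simplified local model of that behavior, written only to illustrate the request and response shown here; it is not DataHub's implementation, and the function name `apply_patch` is hypothetical.

```python
import copy


def apply_patch(aspect: dict, request: dict) -> dict:
    """Apply 'add'/'remove' operations whose last path segment is a primary-key value."""
    result = copy.deepcopy(aspect)
    for operation in request["patch"]:
        # "/properties/<urn>" -> array name "properties", key value "<urn>"
        _, array_name, key_value = operation["path"].split("/", 2)
        key_field = request["arrayPrimaryKeys"][array_name][0]
        # Drop any existing element with the same key.
        items = [i for i in result.get(array_name, []) if i[key_field] != key_value]
        if operation["op"] == "add":
            items.append(operation["value"])
        elif operation["op"] != "remove":
            raise ValueError(f"unsupported op: {operation['op']}")
        result[array_name] = items
    return result


before = {
    "properties": [
        {
            "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
            "values": [{"string": "90"}],
        },
        {
            "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
            "values": [{"string": "bar2"}],
        },
    ]
}
request = {
    "patch": [
        {
            "op": "remove",
            "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime",
        }
    ],
    "arrayPrimaryKeys": {"properties": ["propertyUrn"]},
}
print(apply_patch(before, request))
```

In this model only the element whose `propertyUrn` matches the path is dropped, which mirrors the response above where `retentionTime02` is preserved.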
</TabItem>
</Tabs>
#### Expected Outcomes
You can see that the first property has been removed and the second property is still present.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-remove.png"/>
</p>
### Upsert Structured Property Value
In this example, we'll add the property back with a different value, preserving the existing property.
<Tabs>
<TabItem value="OpenAPI" label="OpenAPI">
```
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "add",
"path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"value": {
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{
"string": "365"
}
]
}
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
Below is the expected response:
```
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"aspects": {
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
},
{
"values": [
{
"string": "365"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
]
}
}
}
}
```
The response shows that the property was re-added with the new value `365` instead of the previous value `90`.
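The remove and upsert requests share the same envelope and differ only in the operation. As a sketch, a small builder for that body could look like the following; the function name `build_property_patch` is hypothetical, and wrapping each value as a `string` simply mirrors the requests in this guide.

```python
import json


def build_property_patch(op: str, property_urn: str, values=None) -> dict:
    """Build the PATCH body for adding or removing one structured property."""
    if op not in ("add", "remove"):
        raise ValueError("op must be 'add' or 'remove'")
    operation = {"op": op, "path": f"/properties/{property_urn}"}
    if op == "add":
        operation["value"] = {
            "propertyUrn": property_urn,
            # Mirrors the requests in this guide, which send values as strings.
            "values": [{"string": str(v)} for v in (values or [])],
        }
    return {
        "patch": [operation],
        "arrayPrimaryKeys": {"properties": ["propertyUrn"]},
    }


body = build_property_patch(
    "add", "urn:li:structuredProperty:io.acryl.privacy.retentionTime", [365]
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to `curl` with `-d` and the `application/json-patch+json` content type shown above.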
</TabItem>
</Tabs>
#### Expected Outcomes
You can see that the first property has been added back with a new value and the second property is still present.
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-upsert.png"/>
</p>
## Delete Structured Properties
There are two types of deletion present in DataHub: hard and soft delete. As of the current release only the soft delete is supported for Structured Properties.
:::note SOFT DELETE
A soft deleted Structured Property does not remove any underlying data on the Structured Property entity or the Structured Property's values written to other entities. The soft delete is 100% reversible with zero data loss. When a Structured Property is soft deleted, a few operations are not available.
Structured Property Soft Delete Effects:
- Entities with a soft deleted Structured Property value will not return the soft deleted properties
- Updates to a soft deleted Structured Property's definition are denied
- Adding a soft deleted Structured Property's value to an entity is denied
- Search filters using a soft deleted Structured Property will be denied
:::
<Tabs>
<TabItem value="CLI" label="CLI (Soft Delete)" default>
The following command will soft delete the test property.
```commandline
datahub delete --urn {urn}
```
</TabItem>
<TabItem value="OpenAPI" label="OpenAPI (Soft Delete)">
The following command will soft delete the test property by writing to the status aspect.
```
curl -X 'POST' \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"removed": true
}' | jq
```
If you want to **remove the soft delete**, you can do so by either hard deleting the status aspect or changing the removed boolean to `false` like below.
```
curl -X 'POST' \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"removed": false
}' | jq
```
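The soft delete effects listed in the note can be pictured with a toy in-memory model. This is only an illustration of the described behavior (values hidden and writes denied while `removed` is true, with no data lost), not how DataHub stores anything; the class `PropertyStore` and its methods are hypothetical.

```python
class PropertyStore:
    """Toy model of soft-deleting structured properties."""

    def __init__(self):
        self.removed = {}  # property URN -> value of the status aspect's `removed` flag
        self.values = {}   # entity URN -> {property URN: values}

    def set_status(self, property_urn: str, removed: bool) -> None:
        self.removed[property_urn] = removed

    def is_soft_deleted(self, property_urn: str) -> bool:
        return self.removed.get(property_urn, False)

    def set_value(self, entity_urn: str, property_urn: str, values: list) -> None:
        # Adding a soft deleted property's value to an entity is denied.
        if self.is_soft_deleted(property_urn):
            raise PermissionError(f"{property_urn} is soft deleted")
        self.values.setdefault(entity_urn, {})[property_urn] = values

    def get_values(self, entity_urn: str) -> dict:
        # Soft deleted properties are hidden, but their stored values are kept.
        stored = self.values.get(entity_urn, {})
        return {p: v for p, v in stored.items() if not self.is_soft_deleted(p)}


store = PropertyStore()
prop = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
dataset = "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
store.set_value(dataset, prop, [90])
store.set_status(prop, True)    # soft delete
print(store.get_values(dataset))
store.set_status(prop, False)   # undo the soft delete
print(store.get_values(dataset))
```

Setting `removed` back to `false` makes the original value visible again, which is the reversibility the note describes.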
</TabItem>
</Tabs>


@ -0,0 +1,113 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';
# About DataHub Documentation Forms
<FeatureAvailability/>
DataHub Documentation Forms streamline the process of setting documentation requirements and delegating annotation responsibilities to the relevant data asset owners, stewards, and subject matter experts.
Forms are highly configurable, making it easy to ask the right questions of the right people, for a specific set of assets.
## What are Documentation Forms?
You can think of Documentation Forms as a survey for your data assets: a set of questions that must be answered in order for an asset to be considered properly documented.
Verification Forms are an extension of Documentation Forms, requiring a final verification, or sign-off, on all responses before the asset can be considered Verified. This is useful for compliance and/or governance annotation initiatives where you want assignees to provide a final acknowledgement that the information provided is correct.
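The difference between the two form types can be sketched as a small status function. This is a conceptual illustration only: the function name `form_status` and the status labels are hypothetical and do not correspond to DataHub identifiers, and the sketch assumes a prompt is optional unless it is marked `required`.

```python
def form_status(form_type: str, prompts: list, answered: set, verified: bool = False) -> str:
    """Return a coarse status for one asset's form."""
    # Assumption: prompts are optional unless explicitly marked required.
    required = {p["id"] for p in prompts if p.get("required", False)}
    if not required <= set(answered):
        return "INCOMPLETE"
    if form_type == "VERIFICATION":
        # Verification Forms need a final sign-off on top of the answers.
        return "VERIFIED" if verified else "AWAITING_VERIFICATION"
    return "COMPLETE"


prompts = [{"id": "123", "required": True}, {"id": "456"}]
print(form_status("DOCUMENTATION", prompts, {"123"}))
print(form_status("VERIFICATION", prompts, {"123"}))
print(form_status("VERIFICATION", prompts, {"123"}, verified=True))
```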
## Creating and Assigning Documentation Forms
Documentation Forms are defined via YAML with the following details:
- Name and Description to help end-users understand the scope and use case
- Form Type, either Documentation or Verification
- Verification Forms require a final signoff, i.e. Verification, of all required questions before the Form can be considered complete
- Form Questions (aka "prompts") for end-users to complete
- Questions can be assigned at the asset-level and/or the field-level
- Asset-level questions can be configured to be required; by default, all questions are optional
- Assigned Assets, defined by:
- A set of specific asset URNs, OR
- Assets related to a set of filters, such as Type (Datasets, Dashboards, etc.), Platform (Snowflake, Looker, etc.), Domain (Product, Marketing, etc.), or Container (Schema, Folder, etc.)
- Optional: Form Assignees
- Optionally assign specific DataHub users/groups to complete the Form for all relevant assets
- If omitted, any Owner of an Asset can complete Forms assigned to that Asset
Here's an example of defining a Documentation Form via YAML:
```yaml
- id: 123456
  # urn: "urn:li:form:123456" # optional if id is provided
  type: VERIFICATION # Supported Types: DOCUMENTATION, VERIFICATION
  name: "Metadata Initiative 2024"
  description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out"
  prompts: # Questions for Form assignees to complete
    - id: "123"
      title: "Data Retention Time"
      description: "Apply Retention Time structured property to form"
      type: STRUCTURED_PROPERTY
      structured_property_id: io.acryl.privacy.retentionTime
      required: True # optional; default value is False
  entities: # Either pass a list of urns or a group of filters. This example shows a list of urns
    urns:
      - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)
  # optionally assign the form to a specific set of users and/or groups
  # when omitted, form will be assigned to Asset owners
  actors:
    users:
      - urn:li:corpuser:jane@email.com # note: these should be URNs
      - urn:li:corpuser:john@email.com
    groups:
      - urn:li:corpGroup:team@email.com # note: these should be URNs
```
:::note
Documentation Forms currently support only Structured Properties as Form Questions.
:::
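
Before submitting a Form definition, it can help to sanity-check it against the rules described above. The following is a minimal sketch in plain Python, not part of the DataHub CLI or SDK: the helper names (`form_urn`, `required_prompt_ids`, `check_form`) are hypothetical, the dict mirrors an abbreviated version of the YAML example, and the `filters` key name is an assumption, since the example above only shows `urns`.

```python
from typing import Any, Dict, List

SUPPORTED_FORM_TYPES = {"DOCUMENTATION", "VERIFICATION"}

# The example Form above, abbreviated and written as a Python dict.
form: Dict[str, Any] = {
    "id": 123456,
    "type": "VERIFICATION",
    "prompts": [
        {
            "id": "123",
            "title": "Data Retention Time",
            "type": "STRUCTURED_PROPERTY",
            "structured_property_id": "io.acryl.privacy.retentionTime",
            "required": True,
        },
    ],
    "entities": {
        "urns": ["urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"],
    },
}


def form_urn(form: Dict[str, Any]) -> str:
    """Use the explicit urn if one is given, otherwise derive it from the id."""
    return form.get("urn") or f"urn:li:form:{form['id']}"


def required_prompt_ids(form: Dict[str, Any]) -> List[str]:
    """Prompts are optional unless they set `required: True`."""
    return [p["id"] for p in form.get("prompts", []) if p.get("required", False)]


def check_form(form: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if form.get("type") not in SUPPORTED_FORM_TYPES:
        problems.append(f"unsupported form type: {form.get('type')!r}")
    entities = form.get("entities") or {}
    # "filters" is an assumed key name for the filter-based alternative to urns.
    if not entities.get("urns") and not entities.get("filters"):
        problems.append("entities needs a list of urns or a group of filters")
    return problems


print(form_urn(form))             # urn:li:form:123456
print(required_prompt_ids(form))  # ['123']
print(check_form(form))           # []
```

A check like this only catches structural mistakes; it does not confirm that the referenced Structured Property or assets exist in DataHub.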
<!-- ## Completing Documentation Forms -->
<!-- Plain-language instructions of how to use the feature
Provide a step-by-step guide to use feature, including relevant screenshots and/or GIFs
* Where/how do you access it?
* What best practices exist?
* What are common code snippets?
-->
## Additional Resources
### Videos
**Asset Verification in Acryl Cloud**
<p align="center">
<iframe width="560" height="315" src="https://www.loom.com/embed/dd834d3cb8f041fca001cea19b2b4071?sid=7073dcd4-407c-41ec-b41d-c99f26dd6a2f" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</p>
## FAQ and Troubleshooting
**What is the difference between Documentation and Verification Forms?**
Both form types are a way to configure a set of optional and/or required questions for DataHub users to complete. When using Verification Forms, users will be presented with a final verification step once all required questions have been completed; you can think of this as a final acknowledgement of the accuracy of information submitted.
**Who is able to complete Forms in DataHub?**
By default, any owner of an Asset will be able to respond to questions assigned via a Form.
When assigning a Form to an Asset, you can optionally assign specific DataHub users/groups to fill them out.
**Can I assign multiple Forms to a single asset?**
You sure can! Please keep in mind that an Asset will only be considered Documented or Verified if all required questions are completed on all assigned Forms.
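
The completion rule above, combined with the final sign-off that Verification Forms require, can be sketched as follows. This is an illustrative model only, not DataHub's implementation: the `AssignedForm` class, its fields, and the prompt IDs are hypothetical, and treating an asset with no assigned Forms as "not complete" is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AssignedForm:
    """Hypothetical stand-in for one Form assigned to an asset."""

    form_type: str  # "DOCUMENTATION" or "VERIFICATION"
    required_prompts: Set[str]
    answered_prompts: Set[str] = field(default_factory=set)
    signed_off: bool = False  # final verification; only used by VERIFICATION forms

    def is_complete(self) -> bool:
        # Optional prompts never block completion; only required ones do.
        all_required_answered = self.required_prompts <= self.answered_prompts
        if self.form_type == "VERIFICATION":
            return all_required_answered and self.signed_off
        return all_required_answered


def asset_is_complete(forms: List[AssignedForm]) -> bool:
    """True only when at least one form is assigned and every form is complete."""
    return bool(forms) and all(form.is_complete() for form in forms)


docs = AssignedForm("DOCUMENTATION", {"owner_team"}, {"owner_team"})
audit = AssignedForm("VERIFICATION", {"retention"}, {"retention"})
print(asset_is_complete([docs, audit]))  # False: the Verification Form lacks sign-off
audit.signed_off = True
print(asset_is_complete([docs, audit]))  # True
```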
### API Tutorials
- [Create a Documentation Form](../../../docs/api/tutorials/forms.md)
:::note
You must create a Structured Property before including it in a Documentation Form.
To learn more about creating Structured Properties via CLI, please see the [Create Structured Properties](/docs/api/tutorials/structured-properties.md) tutorial.
:::
### Related Features
- [DataHub Properties](/docs/features/feature-guides/properties.md)

View File

@ -0,0 +1,158 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# About DataHub Properties
<FeatureAvailability/>
DataHub Custom Properties and Structured Properties are powerful tools to collect meaningful metadata for Assets that might not perfectly fit into other Aspects within DataHub, such as Glossary Terms, Tags, etc. Both types can be found in an Asset's Properties tab:
<p align="center">
<img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/properties/custom_and_structured_properties.png"/>
</p>
This guide will explain the differences and use cases of each property type.
## What are Custom Properties and Structured Properties?
Here are the differences between the two property types at a glance:
| Custom Properties | Structured Properties |
| --- | --- |
| Map of key-value pairs stored as strings | Validated namespaces and data types |
| Added to assets during ingestion and via API | Defined via YAML; created and added to assets via CLI |
| No support for UI-based Edits | Support for UI-based edits |
**Custom Properties** are key-value pairs of strings that capture additional information about assets that is not readily available in standard metadata fields. Custom Properties can be added to assets automatically during ingestion or programmatically via API and *cannot* be edited via the UI.
<p align="center">
<img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/properties/custom_properties_highlight.png"/>
</p>
<p align="center"><em>Example of Custom Properties assigned to a Dataset</em></p>
**Structured Properties** are an extension of Custom Properties, providing a structured and validated way to attach metadata to DataHub Assets. Available as of v0.13.1, Structured Properties have a pre-defined type (Date, Integer, URN, String, etc.). They can be configured to only accept a specific set of allowed values, making it easier to ensure high levels of data quality and consistency. Structured Properties are defined via YAML, added to assets via CLI, and can be edited via the UI.
<p align="center">
<img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/properties/structured_properties_highlight.png"/>
</p>
<p align="center"><em>Example of Structured Properties assigned to a Dataset</em></p>
## Use Cases for Custom Properties and Structured Properties
**Custom Properties** are useful for capturing raw metadata from source systems during ingestion or programmatically via API. Some examples include:
- GitHub file location of code which generated a dataset
- Data encoding type
- Account ID, cluster size, and region where a dataset is stored
**Structured Properties** are useful for setting and enforcing standards of metadata collection, particularly in support of compliance and governance initiatives. Values can be added programmatically via API and then edited manually via the DataHub UI as necessary. Some examples include:
- Deprecation Date
  - Type: Date, Single Select
  - Validation: Must be formatted as 'YYYY-MM-DD'
- Data Retention Period
  - Type: String, Single Select
  - Validation: Adheres to allowed values "30 Days", "90 Days", "365 Days", or "Indefinite"
- Consulted Compliance Officer, chosen from a list of DataHub users
  - Type: DataHub User, Multi-Select
  - Validation: Must be valid DataHub User URN
By using Structured Properties, compliance and governance officers can ensure consistency in data collection across assets.
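
To make the first two validation rules above concrete, here is a small sketch of what such checks amount to. It is written in plain Python for illustration and is not how DataHub performs validation internally; the function names are hypothetical.

```python
import re
from datetime import date

ALLOWED_RETENTION_PERIODS = {"30 Days", "90 Days", "365 Days", "Indefinite"}
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_deprecation_date(value: str) -> bool:
    """Accept only real calendar dates written as YYYY-MM-DD."""
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # right shape, but not a real date (e.g. February 30)
        return False
    return True


def is_valid_retention_period(value: str) -> bool:
    """Accept only one of the allowed values listed above."""
    return value in ALLOWED_RETENTION_PERIODS


print(is_valid_deprecation_date("2024-12-31"))  # True
print(is_valid_deprecation_date("12/31/2024"))  # False
print(is_valid_retention_period("90 Days"))     # True
print(is_valid_retention_period("60 Days"))     # False
```

Free-form Custom Properties carry no such guarantees, which is why values like "60 Days" or "12/31/2024" could otherwise end up on assets unnoticed.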
## Creating, Assigning, and Editing Structured Properties
Structured Properties are defined via YAML, then created and assigned to DataHub Assets via the DataHub CLI.
Here's how we would define the above examples in YAML:
<Tabs>
<TabItem value="deprecationDate" label="Deprecation Date" default>
```yaml
- id: deprecation_date
  qualified_name: deprecation_date
  type: date # Supported types: date, string, number, urn, rich_text
  cardinality: SINGLE # Supported options: SINGLE, MULTIPLE
  display_name: Deprecation Date
  description: "Scheduled date when resource will be deprecated in the source system"
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset
```
</TabItem>
<TabItem value="dataRetentionPeriod" label="Data Retention Period">
```yaml
- id: retention_period
  qualified_name: retention_period
  type: string # Supported types: date, string, number, urn, rich_text
  cardinality: SINGLE # Supported options: SINGLE, MULTIPLE
  display_name: Data Retention Period
  description: "Predetermined storage duration before being deleted or archived
    based on legal, regulatory, or organizational requirements"
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset
  allowed_values:
    - value: "30 Days"
      description: "Use this for datasets that are ephemeral and contain PII"
    - value: "90 Days"
      description: "Use this for datasets that drive monthly reporting but contain PII"
    - value: "365 Days"
      description: "Use this for non-sensitive data that can be retained for longer"
    - value: "Indefinite"
      description: "Use this for non-sensitive data that can be retained indefinitely"
```
</TabItem>
<TabItem value="consultedComplianceOfficer" label="Consulted Compliance Officer(s)">
```yaml
- id: compliance_officer
  qualified_name: compliance_officer
  type: urn # Supported types: date, string, number, urn, rich_text
  cardinality: MULTIPLE # Supported options: SINGLE, MULTIPLE
  display_name: Consulted Compliance Officer(s)
  description: "Member(s) of the Compliance Team consulted/informed during audit"
  type_qualifier: # Define the type of Asset URNs to allow
    - corpuser
    - corpGroup
  entity_types: # Define which types of DataHub Assets the Property can be assigned to
    - dataset
```
</TabItem>
</Tabs>
:::note
To learn more about creating and assigning Structured Properties via CLI, please see the [Create Structured Properties](/docs/api/tutorials/structured-properties.md) tutorial.
:::
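
The `cardinality` and `type_qualifier` fields in the definitions above constrain which values a property will accept. The sketch below illustrates the intent of those two fields in plain Python. It is not DataHub's validation code: `entity_type_of` and `check_values` are hypothetical helpers, and the dict is an abbreviated copy of the Consulted Compliance Officer(s) definition.

```python
from typing import Any, Dict, List, Optional

# Abbreviated copy of the compliance_officer definition shown above.
COMPLIANCE_OFFICER: Dict[str, Any] = {
    "id": "compliance_officer",
    "type": "urn",
    "cardinality": "MULTIPLE",
    "type_qualifier": ["corpuser", "corpGroup"],
}


def entity_type_of(urn: str) -> Optional[str]:
    """Return the entity type of a URN such as 'urn:li:corpuser:jane', else None."""
    parts = urn.split(":", 3)
    if len(parts) == 4 and parts[:2] == ["urn", "li"] and parts[2] and parts[3]:
        return parts[2]
    return None


def check_values(definition: Dict[str, Any], values: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the values look acceptable."""
    problems = []
    if definition.get("cardinality") == "SINGLE" and len(values) > 1:
        problems.append("cardinality is SINGLE but more than one value was given")
    allowed_types = definition.get("type_qualifier")
    if definition.get("type") == "urn" and allowed_types:
        for value in values:
            if entity_type_of(value) not in allowed_types:
                problems.append(f"{value!r} is not a URN of an allowed type")
    return problems


print(check_values(
    COMPLIANCE_OFFICER,
    ["urn:li:corpuser:jane@email.com", "urn:li:corpGroup:team@email.com"],
))  # []
print(len(check_values(
    COMPLIANCE_OFFICER,
    ["urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"],
)))  # 1
```

This checks only the shape of each URN; whether the user or group actually exists is something only a DataHub instance can confirm.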
Once a Structured Property is assigned to an Asset, Users with the `Edit Properties` Metadata Privilege will be able to change Structured Property values via the DataHub UI.
<p align="center">
<img width="80%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/properties/edit_structured_properties_modal.png"/>
</p>
<p align="center"><em>Example of editing the value of a Structured Property via the UI</em></p>
### Videos
**Deep Dive: UI-Editable Properties**
<p align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/06zaQyKxJYk?si=H_YiwQty25m2xzaP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
</p>
### API
Please see the following API guides related to Custom and Structured Properties:
- [Custom Properties API Guide](/docs/api/tutorials/custom-properties.md)
- [Structured Properties API Guide](/docs/api/tutorials/structured-properties.md)
## FAQ and Troubleshooting
**Why can't I edit the value of a Structured Property from the DataHub UI?**
1. Your version of DataHub does not support UI-based edits of Structured Properties. Confirm you are running DataHub v0.13.1 or later.
2. You are attempting to edit a Custom Property, not a Structured Property. Confirm you are trying to edit a Structured Property, which will have an "Edit" button visible. Please note that Custom Properties are not eligible for UI-based edits to minimize overwrites during recurring ingestion.
3. You do not have the necessary privileges. Confirm with your Admin that you have the `Edit Properties` Metadata Privilege.
### Related Features
- [Documentation Forms](/docs/features/feature-guides/documentation-forms.md)