import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Structured Properties
## Why Would You Use Structured Properties?
Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc.
Structured properties have typed values. Conceptually, they are like “field definitions”.
Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties.md).
### Goal Of This Guide
This guide will show you how to execute the following actions with structured properties.
- Create structured properties
- Read structured properties
- Delete structured properties
- Add structured properties to a dataset
- Patch structured properties (add / remove / update a single property)
- Update structured property with breaking schema changes
- Search & aggregations using structured properties
## Prerequisites
For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed information, please refer to the [DataHub Quickstart Guide](/docs/quickstart.md).
Additionally, you need to have the following tools installed according to the method you choose to interact with DataHub:
Install the relevant CLI version. Structured properties are available as of CLI version `0.13.1`. The corresponding DataHub Cloud release version is `v0.2.16.5`.
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):
- Run `datahub init` to update the instance you want to load into.
- Set the server to your sandbox instance, `https://{your-instance-address}/gms`.
- Set the token to your access token.
Requirements for OpenAPI are:
* curl
* jq
## Create Structured Properties
The following code will create a structured property `io.acryl.privacy.retentionTime`.
```graphql
mutation createStructuredProperty {
  createStructuredProperty(
    input: {
      id: "retentionTime",
      qualifiedName: "retentionTime",
      displayName: "Retention Time",
      description: "Retention Time is used to figure out how long to retain records in a dataset",
      valueType: "urn:li:dataType:datahub.number",
      allowedValues: [
        {numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
        {numberValue: 90, description: "Use this for datasets that drive monthly reporting but contain pii"},
        {numberValue: 365, description: "Use this for non-sensitive data that can be retained for longer"}
      ],
      cardinality: SINGLE,
      entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
    }
  ) {
    urn
  }
}
```
Create a yaml file representing the properties you’d like to load.
For example, the file below represents a property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml).
```yaml
- id: io.acryl.privacy.retentionTime
  # - urn: urn:li:structuredProperty:io.acryl.privacy.retentionTime # optional if id is provided
  qualified_name: io.acryl.privacy.retentionTime # required if urn is provided
  type: number
  cardinality: MULTIPLE
  display_name: Retention Time
  entity_types:
    - dataset # or urn:li:entityType:datahub.dataset
    - dataFlow
  description: "Retention Time is used to figure out how long to retain records in a dataset"
  allowed_values:
    - value: 30
      description: 30 days, usually reserved for datasets that are ephemeral and contain pii
    - value: 90
      description: Use this for datasets that drive monthly reporting but contain pii
    - value: 365
      description: Use this for non-sensitive data that can be retained for longer
```
Use the CLI to create your properties:
```commandline
datahub properties upsert -f {properties_yaml}
```
If successful, you should see `Created structured property urn:li:structuredProperty:...`
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"displayName": "Retention Time",
"cardinality": "MULTIPLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"allowedValues": [
{
"value": {"double": 30},
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": {"double": 60},
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": {"double": 365},
"description": "Use this for non-sensitive data that can be retained for longer"
}
]
}' | jq
```
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"displayName": "Retention Time",
"cardinality": "MULTIPLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"allowedValues": [
{
"value": {
"double": 30
},
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": {
"double": 60
},
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": {
"double": 365
},
"description": "Use this for non-sensitive data that can be retained for longer"
}
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"propertyDefinition": {
"value": {
"allowedValues": [
{
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii",
"value": {
"double": 30
}
},
{
"description": "Use this for datasets that drive monthly reporting but contain pii",
"value": {
"double": 60
}
},
{
"description": "Use this for non-sensitive data that can be retained for longer",
"value": {
"double": 365
}
}
],
"displayName": "Retention Time",
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE"
}
}
}
```
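The v2 and v3 requests above differ in two places only: the version segment of the URL, and whether the aspect is wrapped in a top-level `value` key. The following is a minimal Python sketch of that difference using only the standard library. The helper names `aspect_url` and `aspect_body` are hypothetical and not part of any DataHub client, and the host is the quickstart address used in this guide.

```python
import json
from urllib.parse import quote

GMS = "http://localhost:8080"  # quickstart address used throughout this guide


def aspect_url(entity_type: str, urn: str, aspect: str, version: str = "v3") -> str:
    """Build the OpenAPI aspect endpoint, percent-encoding the whole URN."""
    # safe="" also encodes ':', '(', ')' and ',', matching the curl examples
    return f"{GMS}/openapi/{version}/entity/{entity_type}/{quote(urn, safe='')}/{aspect}"


def aspect_body(aspect_value: dict, version: str = "v3") -> str:
    """Serialize the body: v3 wraps the aspect in {"value": ...}, v2 sends it bare."""
    payload = {"value": aspect_value} if version == "v3" else aspect_value
    return json.dumps(payload)


# Shortened definition, for illustration only
definition = {
    "qualifiedName": "io.acryl.privacy.retentionTime",
    "valueType": "urn:li:dataType:datahub.number",
    "cardinality": "MULTIPLE",
    "entityTypes": ["urn:li:entityType:datahub.dataset"],
}
urn = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"

print(aspect_url("structuredProperty", urn, "propertyDefinition"))
print(aspect_body(definition, "v2"))
```

The resulting URL and body can be passed to `curl` (or any HTTP client) exactly as in the requests above.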
## Read Structured Properties
You can see the properties you created by running the following command:
```commandline
datahub properties get --urn {urn}
```
For example, you can run `datahub properties get --urn urn:li:structuredProperty:io.acryl.privacy.retentionTime`.
If successful, you should see metadata about your properties returned.
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"qualified_name": "io.acryl.privacy.retentionTime",
"type": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"display_name": "Retention Time",
"entity_types": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE",
"allowed_values": [
{
"value": "30",
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": "90",
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": "365",
"description": "Use this for non-sensitive data that can be retained for longer"
}
]
}
```
Example Request:
```shell
curl -X 'GET' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json' | jq
```
Example Response:
```json
{
"value": {
"allowedValues": [
{
"value": {
"double": 30.0
},
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": {
"double": 60.0
},
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": {
"double": 365.0
},
"description": "Use this for non-sensitive data that can be retained for longer"
}
],
"qualifiedName": "io.acryl.privacy.retentionTime",
"displayName": "Retention Time",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE"
}
}
```
Example Request:
```shell
curl -X 'GET' -v \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
-H 'accept: application/json' | jq
```
Example Response:
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"propertyDefinition": {
"value": {
"allowedValues": [
{
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii",
"value": {
"double": 30
}
},
{
"description": "Use this for datasets that drive monthly reporting but contain pii",
"value": {
"double": 60
}
},
{
"description": "Use this for non-sensitive data that can be retained for longer",
"value": {
"double": 365
}
}
],
"displayName": "Retention Time",
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE"
}
}
}
```
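The two example responses nest the definition differently: v2 returns it under `value`, while v3 returns it under `propertyDefinition.value` next to the `urn`. A small sketch of reading either shape follows; `property_definition` and `allowed_numbers` are hypothetical helper names, and the sample dictionaries are shortened copies of the responses above.

```python
def property_definition(response: dict) -> dict:
    """Return the bare propertyDefinition aspect from a v2 or v3 GET response."""
    if "propertyDefinition" in response:
        # v3 shape: {"urn": ..., "propertyDefinition": {"value": {...}}}
        return response["propertyDefinition"]["value"]
    # v2 shape: {"value": {...}}
    return response["value"]


def allowed_numbers(definition: dict) -> list:
    """Collect numeric allowed values, which are encoded as {"double": n}."""
    return [item["value"]["double"] for item in definition.get("allowedValues", [])]


v2_response = {
    "value": {
        "cardinality": "MULTIPLE",
        "allowedValues": [{"value": {"double": 30.0}}, {"value": {"double": 60.0}}],
    }
}
v3_response = {
    "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
    "propertyDefinition": v2_response,
}

print(allowed_numbers(property_definition(v2_response)))
print(allowed_numbers(property_definition(v3_response)))
```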
## Set Structured Property To a Dataset
This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property.
```graphql
mutation upsertStructuredProperties {
upsertStructuredProperties(
input: {
assetUrn: "urn:li:mydataset1",
structuredPropertyInputParams: [
{
structuredPropertyUrn: "urn:li:structuredProperty:mystructuredproperty",
values: [
{
stringValue: "123"
}
]
}
]
}
) {
properties {
structuredProperty {
urn
}
}
}
}
```
You can set structured properties on a dataset by creating a dataset yaml file that includes them. For example, below is a dataset yaml file with structured properties at both the field and dataset level.
Please refer to the [full example here.](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/datasets.yaml)
```yaml
- id: user_clicks_snowflake
  platform: snowflake
  schema:
    fields:
      - id: user_id
        structured_properties:
          io.acryl.dataManagement.deprecationDate: "2023-01-01"
  structured_properties:
    io.acryl.dataManagement.replicationSLA: 90
```
Use the CLI to upsert your dataset yaml file:
```commandline
datahub dataset upsert -f {dataset_yaml}
```
If successful, you should see `Update succeeded for urn:li:dataset:...`
The following command uses the OpenAPI v2 endpoint to set the structured property `retentionTime` to `60.0` on the dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Note that both the structured property and the dataset must exist before you run this command. (You can create sample datasets with `datahub docker ingest-sample-data`.)
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{"double": 60.0}
]
}
]
}' | jq
```
The following command uses the OpenAPI v3 endpoint to set the structured property `retentionTime` to `60.0` on the dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Note that both the structured property and the dataset must exist before you run this command. (You can create sample datasets with `datahub docker ingest-sample-data`.)
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{"double": 60.0}
]
}
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"double": 60
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
]
}
}
}
```
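In the OpenAPI bodies above, each value is tagged with its type (`{"double": 60.0}` for numbers, `{"string": "bar2"}` for strings). Because this POST replaces every structured property on the entity, the body should list all properties you want to keep. The sketch below assembles such a body from plain Python values; `typed_value` and `structured_properties_aspect` are hypothetical helpers, and only the two value types that appear in this guide are handled.

```python
def typed_value(value) -> dict:
    """Wrap a Python value in the typed form used by the request bodies above."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return {"double": float(value)}
    if isinstance(value, str):
        return {"string": value}
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def structured_properties_aspect(assignments: dict) -> dict:
    """Build a structuredProperties aspect from {property_urn: [values]}."""
    return {
        "properties": [
            {"propertyUrn": urn, "values": [typed_value(v) for v in values]}
            for urn, values in assignments.items()
        ]
    }


aspect = structured_properties_aspect(
    {"urn:li:structuredProperty:io.acryl.privacy.retentionTime": [60]}
)
print(aspect)
```

For the v3 endpoint, the resulting dictionary would be wrapped as `{"value": aspect}` before sending, as in the v3 request above.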
#### Expected Outcomes
Once your datasets are uploaded, you can view them in the UI and view the properties associated with them under the Properties tab.
Or you can run the following command to view the properties associated with the dataset:
```commandline
datahub dataset get --urn {urn}
```
## Read Structured Properties From a Dataset
For reading all structured properties from a dataset:
```graphql
query getDataset {
dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.ecommerce.customer,PROD)") {
structuredProperties {
properties {
structuredProperty {
urn
type
definition {
displayName
description
allowedValues {
description
}
}
}
values {
... on StringValue {
stringValue
}
... on NumberValue {
numberValue
}
}
valueEntities {
urn
type
}
}
}
}
}
```
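A GraphQL response mirrors the shape of the query, nested under a top-level `data` key. The sketch below flattens such a response into a mapping from property URN to plain values. The sample response is hypothetical (it is not captured from a real instance), and `flatten_properties` is an illustrative helper name.

```python
def flatten_properties(graphql_response: dict) -> dict:
    """Map each structured property URN to a plain list of its values."""
    dataset = graphql_response["data"]["dataset"] or {}
    props = (dataset.get("structuredProperties") or {}).get("properties") or []
    result = {}
    for prop in props:
        urn = prop["structuredProperty"]["urn"]
        # Each value carries either stringValue or numberValue, depending on its type
        result[urn] = [v.get("stringValue", v.get("numberValue")) for v in prop["values"]]
    return result


# Hypothetical response for the query above, trimmed to the fields that are read
sample = {
    "data": {
        "dataset": {
            "structuredProperties": {
                "properties": [
                    {
                        "structuredProperty": {"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"},
                        "values": [{"numberValue": 60.0}],
                    },
                    {
                        "structuredProperty": {"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"},
                        "values": [{"stringValue": "bar2"}],
                    },
                ]
            }
        }
    }
}

print(flatten_properties(sample))
```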
## Remove Structured Properties From a Dataset
For removing a structured property or list of structured properties from a dataset:
```graphql
mutation removeStructuredProperties {
removeStructuredProperties(
input: {
assetUrn: "urn:li:mydataset1",
structuredPropertyUrns: ["urn:li:structuredProperty:mystructuredproperty"]
}
) {
properties {
structuredProperty {urn}
}
}
}
```
## Patch Structured Property Value
This section will show you how to patch a structured property value - either by removing, adding, or upserting a single property.
### Add Structured Property Value
For this example, we'll create a second structured property and apply both properties to the same dataset used previously.
After this, your system should include both `io.acryl.privacy.retentionTime` and `io.acryl.privacy.retentionTime02`.
Let's start by creating the second structured property.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime02/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"qualifiedName": "io.acryl.privacy.retentionTime02",
"displayName": "Retention Time 02",
"valueType": "urn:li:dataType:datahub.string",
"allowedValues": [
{
"value": {"string": "foo2"},
"description": "test foo2 value"
},
{
"value": {"string": "bar2"},
"description": "test bar2 value"
}
],
"cardinality": "SINGLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
]
}' | jq
```
This command will attach one value for each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Specifically, this will set `io.acryl.privacy.retentionTime` as `60.0` and `io.acryl.privacy.retentionTime02` as `bar2`.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{"double": 60.0}
]
},
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"values": [
{"string": "bar2"}
]
}
]
}' | jq
```
Let's start by creating the second structured property.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime02/propertyDefinition' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"qualifiedName": "io.acryl.privacy.retentionTime02",
"displayName": "Retention Time 02",
"valueType": "urn:li:dataType:datahub.string",
"allowedValues": [
{
"value": {"string": "foo2"},
"description": "test foo2 value"
},
{
"value": {"string": "bar2"},
"description": "test bar2 value"
}
],
"cardinality": "SINGLE",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"propertyDefinition": {
"value": {
"allowedValues": [
{
"value": {
"string": "foo2"
},
"description": "test foo2 value"
},
{
"value": {
"string": "bar2"
},
"description": "test bar2 value"
}
],
"entityTypes": [
"urn:li:entityType:datahub.dataset"
],
"qualifiedName": "io.acryl.privacy.retentionTime02",
"displayName": "Retention Time 02",
"cardinality": "SINGLE",
"valueType": "urn:li:dataType:datahub.string"
}
}
}
```
This command will attach one value for each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Specifically, this will set `io.acryl.privacy.retentionTime` as `60.0` and `io.acryl.privacy.retentionTime02` as `bar2`.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties?createIfNotExists=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{"double": 60.0}
]
},
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"values": [
{"string": "bar2"}
]
}
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"double": 60
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
},
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
}
]
}
}
}
```
#### Expected Outcomes
You can see that the dataset now has two structured properties attached to it.
### Remove Structured Property Value
The expected state of our test dataset includes two structured properties.
We'd like to remove the first one (`io.acryl.privacy.retentionTime`) and preserve the second one (`io.acryl.privacy.retentionTime02`).
```shell
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "remove",
"path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
The response will show that the expected property has been removed.
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"aspects": {
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
}
]
}
}
}
}
```
```shell
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "remove",
"path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
The response will show that the expected property has been removed.
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
}
]
}
}
}
```
#### Expected Outcomes
You can see that the first property has been removed and the second property is still present.
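The PATCH bodies in this section follow one pattern: a list of operations whose `path` ends in the property URN, plus an `arrayPrimaryKeys` entry naming `propertyUrn` as the key of the `properties` array. A minimal sketch of a builder for that body is shown here; `property_patch` is a hypothetical helper, not a DataHub API, and it only covers the `add` and `remove` operations used in this guide.

```python
import json


def property_patch(op: str, property_urn: str, values=None) -> dict:
    """Build the PATCH body for a single structured property ("add" or "remove")."""
    if op not in ("add", "remove"):
        raise ValueError("this sketch only covers 'add' and 'remove'")
    operation = {"op": op, "path": f"/properties/{property_urn}"}
    if op == "add":
        operation["value"] = {"propertyUrn": property_urn, "values": values or []}
    return {
        "patch": [operation],
        # Same key declaration as in the curl requests of this section
        "arrayPrimaryKeys": {"properties": ["propertyUrn"]},
    }


remove_body = property_patch(
    "remove", "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
)
print(json.dumps(remove_body, indent=2))
```

The printed JSON can be sent with `Content-Type: application/json-patch+json`, as in the requests above.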
### Upsert Structured Property Value
In this example, we'll add the property back with a different value, preserving the existing property.
```graphql
mutation updateStructuredProperty {
updateStructuredProperty(
input: {
urn: "urn:li:structuredProperty:retentionTime",
displayName: "Retention Time",
description: "Retention Time is used to figure out how long to retain records in a dataset",
newAllowedValues: [
{
numberValue: 30,
description: "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
numberValue: 90,
description: "Use this for datasets that drive monthly reporting but contain pii"
},
{
numberValue: 365,
description: "Use this for non-sensitive data that can be retained for longer"
}
]
}
) {
urn
}
}
```
```shell
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "add",
"path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"value": {
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{
"double": 365.0
}
]
}
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"aspects": {
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
},
{
"values": [
{
"double": 365.0
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
]
}
}
}
}
```
The response shows that the property was re-added with the new value 365.0 instead of the previous value 60.0.
```shell
curl -X 'PATCH' -v \
'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
-H 'accept: application/json' \
-H 'Content-Type: application/json-patch+json' \
-d '{
"patch": [
{
"op": "add",
"path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"value": {
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{
"double": 365.0
}
]
}
}
],
"arrayPrimaryKeys": {
"properties": [
"propertyUrn"
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"structuredProperties": {
"value": {
"properties": [
{
"values": [
{
"string": "bar2"
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
},
{
"values": [
{
"double": 365
}
],
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
}
]
}
}
}
```
The response shows that the property was re-added with the new value 365 instead of the previous value 60.
#### Expected Outcomes
You can see that the first property has been added back with a new value and the second property is still present.
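To make the combined effect of the remove and add steps concrete, the sketch below replays both patches on an in-memory copy of the aspect. It is only a model of the outcome shown in the example responses, not DataHub's patch implementation, and `apply_property_patch` is a hypothetical helper.

```python
import copy

RETENTION = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
RETENTION_02 = "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
PRIMARY_KEYS = {"properties": ["propertyUrn"]}


def apply_property_patch(aspect: dict, patch_body: dict) -> dict:
    """Replay add/remove operations on a copy of a structuredProperties aspect."""
    result = copy.deepcopy(aspect)
    key = patch_body["arrayPrimaryKeys"]["properties"][0]  # "propertyUrn"
    for operation in patch_body["patch"]:
        target = operation["path"].split("/properties/", 1)[1]
        # Drop any existing entry for this property, then re-append it on "add"
        result["properties"] = [p for p in result["properties"] if p[key] != target]
        if operation["op"] == "add":
            result["properties"].append(operation["value"])
    return result


aspect = {"properties": [
    {"propertyUrn": RETENTION, "values": [{"double": 60.0}]},
    {"propertyUrn": RETENTION_02, "values": [{"string": "bar2"}]},
]}
remove = {"patch": [{"op": "remove", "path": f"/properties/{RETENTION}"}],
          "arrayPrimaryKeys": PRIMARY_KEYS}
add = {"patch": [{"op": "add", "path": f"/properties/{RETENTION}",
                  "value": {"propertyUrn": RETENTION, "values": [{"double": 365.0}]}}],
       "arrayPrimaryKeys": PRIMARY_KEYS}

after_remove = apply_property_patch(aspect, remove)
after_add = apply_property_patch(after_remove, add)
print([p["propertyUrn"] for p in after_add["properties"]])
```

In this model, the untouched property keeps its value throughout, and the re-added property ends up with `365.0` in place of `60.0`.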
## Delete Structured Properties
There are two types of deletion present in DataHub: hard and soft delete.
:::note SOFT DELETE
A soft deleted Structured Property does not remove any underlying data on the Structured Property entity or the Structured Property's values written to other entities.
The soft delete is 100% reversible with zero data loss. When a Structured Property is soft deleted, a few operations are not available.
Structured Property Soft Delete Effects:
- Entities with a soft deleted Structured Property value will not return the soft deleted properties
- Updates to a soft deleted Structured Property's definition are denied
- Adding a soft deleted Structured Property's value to an entity is denied
- Search filters using a soft deleted Structured Property will be denied
:::
:::note HARD DELETE
A hard deleted Structured Property REMOVES all underlying data for the Structured Property entity and the Structured Property's values written to other entities.
The hard delete is NOT reversible.
Structured Property Hard Delete Effects:
- Structured Property entity is removed
- Structured Property values are removed via PATCH MCPs on their respective entities
- Rollback is not possible
- Elasticsearch index mappings will continue to contain references to the hard deleted property until reindex
:::
### Soft Delete
The following command will soft delete the test property.
```commandline
datahub delete --urn {urn}
```
The following command will soft delete the test property by writing to the status aspect.
```shell
curl -X 'POST' \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"removed": true
}' | jq
```
If you want to **remove the soft delete**, you can do so by either hard deleting the status aspect or changing the removed boolean to `false` like below.
```shell
curl -X 'POST' \
'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"removed": false
}' | jq
```
The following command will soft delete the test property by writing to the status aspect.
```shell
curl -X 'POST' \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"removed": true
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"status": {
"value": {
"removed": true
}
}
}
```
If you want to **remove the soft delete**, you can do so by either hard deleting the status aspect or changing the removed boolean to `false` like below.
```shell
curl -X 'POST' \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false&createIfNotExists=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"removed": false
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"status": {
"value": {
"removed": false
}
}
}
```
### Hard Delete
The following command will hard delete the test property.
```commandline
datahub delete --urn {urn} --hard
```
The following command will hard delete the test property.
```shell
curl -v -X 'DELETE' \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime'
```
Example Response:
```text
> DELETE /openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 14 Jun 2024 17:30:27 GMT
< Content-Length: 0
< Server: Jetty(11.0.19)
```
#### Index Mappings Cleanup
After the asynchronous deletes of all Structured Property values, triggered by the above
hard delete, have been processed, it is possible to remove the remaining index mappings. Note that if even one
Structured Property value remains, the mapping will not be removed for a given entity index.
Run the DataHub system-update job (automatically run with every helm upgrade or install and quickstart) with
the following environment variables enabled.
This will trigger an Elasticsearch reindex, which will take time to complete. During the process the entire index is recreated.
```shell
ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX=true
ENABLE_STRUCTURED_PROPERTIES_SYSTEM_UPDATE=true
```
## Update Structured Property With Breaking Schema Changes
This section will demonstrate how to make backwards incompatible schema changes. Making backwards incompatible
schema changes will remove previously written data.
Breaking schema changes are implemented by setting a version string within the Structured Property definition. This
version must be in the following format: `yyyyMMddhhmmss`, e.g. `20240614080000`.
:::note IMPORTANT NOTES
Old values will not be retrievable after the new Structured Property definition is applied.
The old values will be subject to deletion asynchronously (future work).
:::
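The version string described above is a 14-digit timestamp. A small sketch of producing one in Python follows. It assumes a 24-hour clock for the hour field, and `definition_version` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def definition_version(moment: datetime) -> str:
    """Format a timestamp as a 14-digit version string (year through seconds)."""
    return moment.strftime("%Y%m%d%H%M%S")


# The example version used in this guide: 14 June 2024, 08:00:00
print(definition_version(datetime(2024, 6, 14, 8, 0, 0)))  # 20240614080000

# A version derived from the current UTC time
print(definition_version(datetime.now(timezone.utc)))
```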
In the following example, we'll revisit the `retentionTime` structured property and apply a breaking change
by changing the cardinality from `MULTIPLE` to `SINGLE`. Normally this change would be rejected as
backwards incompatible, since entities written under the old definition may hold multiple values,
which would no longer be valid.
Edit the previously created definition yaml: Change the cardinality to `SINGLE` and add a `version`.
```yaml
- id: io.acryl.privacy.retentionTime
  # - urn: urn:li:structuredProperty:io.acryl.privacy.retentionTime # optional if id is provided
  qualified_name: io.acryl.privacy.retentionTime # required if urn is provided
  type: number
  cardinality: SINGLE
  version: '20240614080000'
  display_name: Retention Time
  entity_types:
    - dataset # or urn:li:entityType:datahub.dataset
    - dataFlow
  description: "Retention Time is used to figure out how long to retain records in a dataset"
  allowed_values:
    - value: 30
      description: 30 days, usually reserved for datasets that are ephemeral and contain pii
    - value: 90
      description: Use this for datasets that drive monthly reporting but contain pii
    - value: 365
      description: Use this for non-sensitive data that can be retained for longer
```
Use the CLI to create your properties:
```commandline
datahub properties upsert -f {properties_yaml}
```
If successful, you should see `Created structured property urn:li:structuredProperty:...`
Change the cardinality to `SINGLE` and add a `version`.
```shell
curl -X 'POST' -v \
'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition?createIfNotExists=false' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"value": {
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"displayName": "Retention Time",
"cardinality": "SINGLE",
"version": "20240614080000",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"allowedValues": [
{
"value": {
"double": 30
},
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
},
{
"value": {
"double": 60
},
"description": "Use this for datasets that drive monthly reporting but contain pii"
},
{
"value": {
"double": 365
},
"description": "Use this for non-sensitive data that can be retained for longer"
}
]
}
}' | jq
```
Example Response:
```json
{
"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"propertyDefinition": {
"value": {
"allowedValues": [
{
"description": "30 days, usually reserved for datasets that are ephemeral and contain pii",
"value": {
"double": 30
}
},
{
"description": "Use this for datasets that drive monthly reporting but contain pii",
"value": {
"double": 60
}
},
{
"description": "Use this for non-sensitive data that can be retained for longer",
"value": {
"double": 365
}
}
],
"displayName": "Retention Time",
"qualifiedName": "io.acryl.privacy.retentionTime",
"valueType": "urn:li:dataType:datahub.number",
"description": "Retention Time is used to figure out how long to retain records in a dataset",
"entityTypes": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"version": "20240614080000",
"cardinality": "SINGLE"
}
}
}
```
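The `version` value in the request above looks like a `yyyyMMddHHmmss` timestamp. Assuming that layout is what the server expects, a version string can be generated rather than typed by hand. This is a minimal sketch; `make_version` is a hypothetical helper and not part of any DataHub client.

```python
from datetime import datetime, timezone


def make_version(moment: datetime) -> str:
    """Format a datetime as a 14-digit yyyyMMddHHmmss string (assumed layout)."""
    return moment.strftime("%Y%m%d%H%M%S")


# Reproduces the version used in the request above.
print(make_version(datetime(2024, 6, 14, 8, 0, 0)))

# A version for "now", in UTC so that it does not depend on the local time zone.
print(make_version(datetime.now(timezone.utc)))
```

Generating the value from a clock keeps successive versions ordered, which is convenient if a definition has to be changed more than once.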
## Structured Properties - Search & Aggregation
Structured Properties can currently be used to filter search results. They are not yet included in full-text search.
The following examples re-use the two previously defined Structured Properties.
`io.acryl.privacy.retentionTime` - An example numeric property.
`io.acryl.privacy.retentionTime02` - An example string property.
Range Query:
The document should be returned because its previously assigned value of 60 is greater than 45.
```graphql
query {
scrollAcrossEntities(
input: {
types: DATASET,
count: 10,
query: "*",
orFilters: {
and: [
{
field: "structuredProperties.io.acryl.privacy.retentionTime",
condition: GREATER_THAN,
values: [
"45.0"
]
}
]
}
}
) {
searchResults {
entity {
urn,
type
}
}
}
}
```
Exists Query:
The document should be returned because a value was previously assigned to it.
```graphql
query {
scrollAcrossEntities(
input: {
types: DATASET,
count: 10,
query: "*",
orFilters: {
and: [
{
field: "structuredProperties.io.acryl.privacy.retentionTime",
condition: EXISTS
}
]
}
}
) {
searchResults {
entity {
urn,
type
}
}
}
}
```
Equality Query:
The document should be returned because its previously assigned value is 'bar2'.
```graphql
query {
scrollAcrossEntities(
input: {
types: DATASET,
count: 10,
query: "*",
orFilters: {
and: [
{
field: "structuredProperties.io.acryl.privacy.retentionTime02",
condition: EQUAL,
values: [
"bar2"
]
}
]
}
}
) {
searchResults {
entity {
urn,
type
}
}
}
}
```
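The GraphQL queries above can also be sent from a script. The following is a minimal sketch using only the Python standard library. The endpoint URL and the bearer-token header are assumptions about a typical deployment, and `build_graphql_request` is a hypothetical helper, so adjust both to your instance.

```python
import json
import urllib.request
from typing import Optional

# The range query from above, kept as a plain string.
RANGE_QUERY = """
query {
  scrollAcrossEntities(
    input: {
      types: DATASET, count: 10, query: "*",
      orFilters: {
        and: [{
          field: "structuredProperties.io.acryl.privacy.retentionTime",
          condition: GREATER_THAN,
          values: ["45.0"]
        }]
      }
    }
  ) {
    searchResults { entity { urn, type } }
  }
}
"""


def build_graphql_request(
    url: str, query: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Wrap a GraphQL query string in a JSON POST request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed auth scheme: a personal access token sent as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Assumed endpoint for a local quickstart; change it to match your deployment.
request = build_graphql_request("http://localhost:8080/api/graphql", RANGE_QUERY)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the payload easy to inspect before it reaches the server.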
Unlike GraphQL, which has a parsed input object for filtering, OpenAPI only accepts a structured query that
relies on the `query_string` syntax. See the Elasticsearch [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-query-string-query.html) for detailed syntax.
To use the `query_string` syntax, we need to know a bit about the Structured Property's definition, such
as its type and whether it is versioned or un-versioned. This information determines the field name used in the `query` URL parameter.
Un-versioned Example:
Structured Property URN - `urn:li:structuredProperty:io.acryl.privacy.retentionTime`
Elasticsearch Field Name - `structuredProperties.io_acryl_privacy_retentionTime`
Versioned Example:
Structured Property Version - `20240614080000`
Structured Property Type - `string`
Structured Property URN - `urn:li:structuredProperty:io.acryl.privacy.retentionTime02`
Elasticsearch Field Name - `structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string`
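The mapping from URN to Elasticsearch field name shown in the two examples above can be sketched as a small function. This is inferred from those two examples only (dots in the qualified name become underscores, and versioned properties add the version and type as suffixes); `es_field_name` is a hypothetical helper, not a DataHub API.

```python
from typing import Optional

URN_PREFIX = "urn:li:structuredProperty:"


def es_field_name(
    urn: str, version: Optional[str] = None, value_type: Optional[str] = None
) -> str:
    """Derive the Elasticsearch field name for a structured property URN."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a structured property URN: {urn}")
    # Dots in the qualified name are replaced with underscores.
    name = urn[len(URN_PREFIX):].replace(".", "_")
    if version is None:
        return f"structuredProperties.{name}"
    if value_type is None:
        raise ValueError("a versioned property also needs its type")
    return f"structuredProperties._versioned.{name}.{version}.{value_type}"


print(es_field_name("urn:li:structuredProperty:io.acryl.privacy.retentionTime"))
print(
    es_field_name(
        "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
        version="20240614080000",
        value_type="string",
    )
)
```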
Range Query:
query - `structuredProperties.io_acryl_privacy_retentionTime:>45`
```shell
curl -X 'GET' \
'http://localhost:9002/openapi/v3/entity/dataset?systemMetadata=false&aspects=datasetKey&aspects=structuredProperties&count=10&sort=urn&sortOrder=ASCENDING&query=structuredProperties.io_acryl_privacy_retentionTime%3A%3E45' \
-H 'accept: application/json'
```
Example Response:
```json
{
"entities": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"datasetKey": {
"value": {
"name": "SampleHiveDataset",
"platform": "urn:li:dataPlatform:hive",
"origin": "PROD"
}
},
"structuredProperties": {
"value": {
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{
"double": 60
}
]
},
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"values": [
{
"string": "bar2"
}
]
}
]
}
}
}
]
}
```
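The `query` parameter in the request above is the percent-encoded form of the `query_string` expression (`:` becomes `%3A` and `>` becomes `%3E`). As a minimal sketch, the standard library can do the encoding; `build_search_url` is a hypothetical helper that mirrors the parameters of the curl command.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, count: int = 10) -> str:
    """Build the dataset search URL with a percent-encoded query_string."""
    # A list of pairs is used because `aspects` is repeated.
    params = [
        ("systemMetadata", "false"),
        ("aspects", "datasetKey"),
        ("aspects", "structuredProperties"),
        ("count", str(count)),
        ("sort", "urn"),
        ("sortOrder", "ASCENDING"),
        ("query", query),
    ]
    return f"{base_url}/openapi/v3/entity/dataset?{urlencode(params)}"


url = build_search_url(
    "http://localhost:9002",
    "structuredProperties.io_acryl_privacy_retentionTime:>45",
)
print(url)
```

Encoding the expression programmatically avoids hand-escaping mistakes when the query contains reserved characters.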
Exists Query:
query - `_exists_:structuredProperties.io_acryl_privacy_retentionTime`
```shell
curl -X 'GET' \
'http://localhost:9002/openapi/v3/entity/dataset?systemMetadata=false&aspects=datasetKey&aspects=structuredProperties&count=10&sort=urn&sortOrder=ASCENDING&query=_exists_%3AstructuredProperties.io_acryl_privacy_retentionTime' \
-H 'accept: application/json'
```
Example Response:
```json
{
"entities": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"datasetKey": {
"value": {
"name": "SampleHiveDataset",
"platform": "urn:li:dataPlatform:hive",
"origin": "PROD"
}
},
"structuredProperties": {
"value": {
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{
"double": 60
}
]
},
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"values": [
{
"string": "bar2"
}
]
}
]
}
}
}
]
}
```
Equality Query:
query - `structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string:bar2`
```shell
curl -X 'GET' \
'http://localhost:9002/openapi/v3/entity/dataset?systemMetadata=false&aspects=datasetKey&aspects=structuredProperties&count=10&sort=urn&sortOrder=ASCENDING&query=structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string%3Abar2' \
-H 'accept: application/json'
```
Example Response:
```json
{
"entities": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
"datasetKey": {
"value": {
"name": "SampleHiveDataset",
"platform": "urn:li:dataPlatform:hive",
"origin": "PROD"
}
},
"structuredProperties": {
"value": {
"properties": [
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
"values": [
{
"double": 60
}
]
},
{
"propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
"values": [
{
"string": "bar2"
}
]
}
]
}
}
}
]
}
```
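The three kinds of OpenAPI query above (range, exists, equality) follow the standard `query_string` forms `field:>value`, `_exists_:field`, and `field:value`. A minimal sketch of helpers that produce these expressions is shown below. The function names are hypothetical, and values containing spaces or reserved characters would need quoting or escaping, which this sketch does not attempt.

```python
def range_clause(field: str, lower_bound: float) -> str:
    """Match documents whose field value is greater than lower_bound."""
    # The `g` format drops a trailing ".0", so 45.0 is rendered as 45.
    return f"{field}:>{lower_bound:g}"


def exists_clause(field: str) -> str:
    """Match documents that have any value for the field."""
    return f"_exists_:{field}"


def equals_clause(field: str, value: str) -> str:
    """Match documents whose field equals a simple, unquoted value."""
    return f"{field}:{value}"


numeric_field = "structuredProperties.io_acryl_privacy_retentionTime"
string_field = (
    "structuredProperties._versioned."
    "io_acryl_privacy_retentionTime02.20240614080000.string"
)

print(range_clause(numeric_field, 45))
print(exists_clause(numeric_field))
print(equals_clause(string_field, "bar2"))
```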
### Structured Property Aggregations
Structured properties can also be used in GraphQL aggregation queries. Facet names follow the same naming convention as the
GraphQL search filter field names shown above. There are currently no aggregation endpoints for OpenAPI.
Aggregation Query:
```graphql
query {
aggregateAcrossEntities(
input: {
types: [],
facets: [
"structuredProperties.io.acryl.privacy.retentionTime02",
"structuredProperties.io.acryl.privacy.retentionTime"],
query: "*",
orFilters: [],
searchFlags: {maxAggValues: 100}
}) {
facets {
field
aggregations {
value
count
}
}
}
}
```
Example Response:
```json
{
"data": {
"aggregateAcrossEntities": {
"facets": [
{
"field": "structuredProperties.io.acryl.privacy.retentionTime02",
"aggregations": [
{
"value": "bar2",
"count": 1
}
]
},
{
"field": "structuredProperties.io.acryl.privacy.retentionTime",
"aggregations": [
{
"value": "60.0",
"count": 1
}
]
}
]
}
},
"extensions": {}
}
```
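An aggregation response shaped like the example above can be flattened into a lookup table of counts per value. This is a minimal sketch that assumes the response has exactly the structure shown; `facet_counts` is a hypothetical helper.

```python
from typing import Dict


def facet_counts(response: dict) -> Dict[str, Dict[str, int]]:
    """Map each facet field to a {value: count} dictionary."""
    facets = response["data"]["aggregateAcrossEntities"]["facets"]
    return {
        facet["field"]: {
            bucket["value"]: bucket["count"] for bucket in facet["aggregations"]
        }
        for facet in facets
    }


# The example response from above, as a Python dictionary.
example_response = {
    "data": {
        "aggregateAcrossEntities": {
            "facets": [
                {
                    "field": "structuredProperties.io.acryl.privacy.retentionTime02",
                    "aggregations": [{"value": "bar2", "count": 1}],
                },
                {
                    "field": "structuredProperties.io.acryl.privacy.retentionTime",
                    "aggregations": [{"value": "60.0", "count": 1}],
                },
            ]
        }
    },
    "extensions": {},
}

counts = facet_counts(example_response)
print(counts["structuredProperties.io.acryl.privacy.retentionTime"])
```

Note that aggregation values are returned as strings, so the numeric property appears as `"60.0"` rather than `60`.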