mirror of
https://github.com/datahub-project/datahub.git
synced 2025-07-04 07:34:44 +00:00
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Structured Properties

## Why Would You Use Structured Properties?

Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc.
Structured properties have typed values. Conceptually, they are like “field definitions”.

Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties.md).

### Goal Of This Guide

This guide will show you how to execute the following actions with structured properties.
- Create structured properties
- Read structured properties
- Delete structured properties
- Add structured properties to a dataset
- Patch structured properties (add / remove / update a single property)
- Update structured property with breaking schema changes
- Search & aggregations using structured properties

## Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data.
For detailed information, please refer to [DataHub Quickstart Guide](/docs/quickstart.md).

Additionally, you need to have the following tools installed according to the method you choose to interact with DataHub:

<Tabs>
<TabItem value="CLI" label="CLI" default>

Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding DataHub Cloud release version is `v0.2.16.5`.
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):

- Run `datahub init` to update the instance you want to load into.
- Set the server to your sandbox instance, `https://{your-instance-address}/gms`.
- Set the token to your access token.

</TabItem>
<TabItem value="OpenAPI" label="OpenAPI">

Requirements for OpenAPI are:
* curl
* jq

</TabItem>
</Tabs>

## Create Structured Properties

The following code will create a structured property `io.acryl.privacy.retentionTime`.

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
mutation createStructuredProperty {
  createStructuredProperty(
    input: {
      id: "retentionTime",
      qualifiedName: "retentionTime",
      displayName: "Retention Time",
      description: "Retention Time is used to figure out how long to retain records in a dataset",
      valueType: "urn:li:dataType:datahub.number",
      allowedValues: [
        {numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
        {numberValue: 90, description: "Use this for datasets that drive monthly reporting but contain pii"},
        {numberValue: 365, description: "Use this for non-sensitive data that can be retained for longer"}
      ],
      cardinality: SINGLE,
      entityTypes: ["urn:li:entityType:dataset", "urn:li:entityType:dataFlow"],
    }
  ) {
    urn
  }
}
```

</TabItem>
<TabItem value="CLI" label="CLI">

Create a YAML file representing the properties you’d like to load.
For example, the file below defines the property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml).

```yaml
- id: io.acryl.privacy.retentionTime
  # - urn: urn:li:structuredProperty:io.acryl.privacy.retentionTime # optional if id is provided
  qualified_name: io.acryl.privacy.retentionTime # required if urn is provided
  type: number
  cardinality: MULTIPLE
  display_name: Retention Time
  entity_types:
    - dataset # or urn:li:entityType:datahub.dataset
    - dataFlow
  description: "Retention Time is used to figure out how long to retain records in a dataset"
  allowed_values:
    - value: 30
      description: 30 days, usually reserved for datasets that are ephemeral and contain pii
    - value: 90
      description: Use this for datasets that drive monthly reporting but contain pii
    - value: 365
      description: Use this for non-sensitive data that can be retained for longer
```

Use the CLI to create your properties:

```commandline
datahub properties upsert -f {properties_yaml}
```

If successful, you should see `Created structured property urn:li:structuredProperty:...`

</TabItem>
<TabItem value="OpenAPI v2" label="OpenAPI v2">

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "qualifiedName": "io.acryl.privacy.retentionTime",
    "valueType": "urn:li:dataType:datahub.number",
    "description": "Retention Time is used to figure out how long to retain records in a dataset",
    "displayName": "Retention Time",
    "cardinality": "MULTIPLE",
    "entityTypes": [
      "urn:li:entityType:datahub.dataset",
      "urn:li:entityType:datahub.dataFlow"
    ],
    "allowedValues": [
      {
        "value": {"double": 30},
        "description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
      },
      {
        "value": {"double": 60},
        "description": "Use this for datasets that drive monthly reporting but contain pii"
      },
      {
        "value": {"double": 365},
        "description": "Use this for non-sensitive data that can be retained for longer"
      }
    ]
}' | jq
```

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "value": {
      "qualifiedName": "io.acryl.privacy.retentionTime",
      "valueType": "urn:li:dataType:datahub.number",
      "description": "Retention Time is used to figure out how long to retain records in a dataset",
      "displayName": "Retention Time",
      "cardinality": "MULTIPLE",
      "entityTypes": [
        "urn:li:entityType:datahub.dataset",
        "urn:li:entityType:datahub.dataFlow"
      ],
      "allowedValues": [
        {
          "value": {
            "double": 30
          },
          "description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
        },
        {
          "value": {
            "double": 60
          },
          "description": "Use this for datasets that drive monthly reporting but contain pii"
        },
        {
          "value": {
            "double": 365
          },
          "description": "Use this for non-sensitive data that can be retained for longer"
        }
      ]
    }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "propertyDefinition": {
    "value": {
      "allowedValues": [
        {
          "description": "30 days, usually reserved for datasets that are ephemeral and contain pii",
          "value": {
            "double": 30
          }
        },
        {
          "description": "Use this for datasets that drive monthly reporting but contain pii",
          "value": {
            "double": 60
          }
        },
        {
          "description": "Use this for non-sensitive data that can be retained for longer",
          "value": {
            "double": 365
          }
        }
      ],
      "displayName": "Retention Time",
      "qualifiedName": "io.acryl.privacy.retentionTime",
      "valueType": "urn:li:dataType:datahub.number",
      "description": "Retention Time is used to figure out how long to retain records in a dataset",
      "entityTypes": [
        "urn:li:entityType:datahub.dataset",
        "urn:li:entityType:datahub.dataFlow"
      ],
      "cardinality": "MULTIPLE"
    }
  }
}
```

</TabItem>
</Tabs>
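
The OpenAPI requests above differ in two mechanical ways: the URN is percent-encoded into the path, and the v3 POST body nests the aspect under a `value` key while v2 sends it bare. The sketch below reproduces both details with the Python standard library only. The helper name `aspect_request` and its parameters are illustrative assumptions, not part of any DataHub client, and the block only builds the request; it does not send it.

```python
from urllib.parse import quote


def aspect_request(base_url, api_version, entity_type, urn, aspect_name, aspect):
    """Return (url, body) for POSTing one aspect to the OpenAPI entity endpoint.

    Hypothetical helper: it mirrors the curl examples in this guide and is
    not a DataHub SDK function.
    """
    if api_version not in ("v2", "v3"):
        raise ValueError(f"unsupported API version: {api_version}")
    # safe="" also percent-encodes ':', '(', ',' and ')', as in the curl URLs.
    encoded_urn = quote(urn, safe="")
    url = f"{base_url}/openapi/{api_version}/entity/{entity_type}/{encoded_urn}/{aspect_name}"
    # v3 POST bodies in this guide nest the aspect under "value"; v2 sends it bare.
    body = {"value": aspect} if api_version == "v3" else aspect
    return url, body


PROPERTY_URN = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
definition = {
    "qualifiedName": "io.acryl.privacy.retentionTime",
    "valueType": "urn:li:dataType:datahub.number",
    "displayName": "Retention Time",
    "cardinality": "MULTIPLE",
    "entityTypes": ["urn:li:entityType:datahub.dataset"],
}

url_v2, body_v2 = aspect_request(
    "http://localhost:8080", "v2", "structuredProperty",
    PROPERTY_URN, "propertyDefinition", definition,
)
url_v3, body_v3 = aspect_request(
    "http://localhost:8080", "v3", "structuredProperty",
    PROPERTY_URN, "propertyDefinition", definition,
)
print(url_v3)
```

The returned body can be serialized with `json.dumps` and passed to curl's `-d` option or to any HTTP client.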

## Read Structured Properties

You can see the properties you created by running the following command:

<Tabs>
<TabItem value="CLI" label="CLI" default>

```commandline
datahub properties get --urn {urn}
```

For example, you can run `datahub properties get --urn urn:li:structuredProperty:io.acryl.privacy.retentionTime`.
If successful, you should see metadata about your properties returned.

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "qualified_name": "io.acryl.privacy.retentionTime",
  "type": "urn:li:dataType:datahub.number",
  "description": "Retention Time is used to figure out how long to retain records in a dataset",
  "display_name": "Retention Time",
  "entity_types": [
    "urn:li:entityType:datahub.dataset",
    "urn:li:entityType:datahub.dataFlow"
  ],
  "cardinality": "MULTIPLE",
  "allowed_values": [
    {
      "value": "30",
      "description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
    },
    {
      "value": "90",
      "description": "Use this for datasets that drive monthly reporting but contain pii"
    },
    {
      "value": "365",
      "description": "Use this for non-sensitive data that can be retained for longer"
    }
  ]
}
```

</TabItem>
<TabItem value="OpenAPI v2" label="OpenAPI v2">

Example Request:

```shell
curl -X 'GET' -v \
  'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
  -H 'accept: application/json' | jq
```

Example Response:

```json
{
  "value": {
    "allowedValues": [
      {
        "value": {
          "double": 30.0
        },
        "description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
      },
      {
        "value": {
          "double": 60.0
        },
        "description": "Use this for datasets that drive monthly reporting but contain pii"
      },
      {
        "value": {
          "double": 365.0
        },
        "description": "Use this for non-sensitive data that can be retained for longer"
      }
    ],
    "qualifiedName": "io.acryl.privacy.retentionTime",
    "displayName": "Retention Time",
    "valueType": "urn:li:dataType:datahub.number",
    "description": "Retention Time is used to figure out how long to retain records in a dataset",
    "entityTypes": [
      "urn:li:entityType:datahub.dataset",
      "urn:li:entityType:datahub.dataFlow"
    ],
    "cardinality": "MULTIPLE"
  }
}
```

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

Example Request:

```shell
curl -X 'GET' -v \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
  -H 'accept: application/json' | jq
```

Example Response:

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "propertyDefinition": {
    "value": {
      "allowedValues": [
        {
          "description": "30 days, usually reserved for datasets that are ephemeral and contain pii",
          "value": {
            "double": 30
          }
        },
        {
          "description": "Use this for datasets that drive monthly reporting but contain pii",
          "value": {
            "double": 60
          }
        },
        {
          "description": "Use this for non-sensitive data that can be retained for longer",
          "value": {
            "double": 365
          }
        }
      ],
      "displayName": "Retention Time",
      "qualifiedName": "io.acryl.privacy.retentionTime",
      "valueType": "urn:li:dataType:datahub.number",
      "description": "Retention Time is used to figure out how long to retain records in a dataset",
      "entityTypes": [
        "urn:li:entityType:datahub.dataset",
        "urn:li:entityType:datahub.dataFlow"
      ],
      "cardinality": "MULTIPLE"
    }
  }
}
```

</TabItem>
</Tabs>

## Set Structured Property To a Dataset

This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property.

<Tabs>
<TabItem value="graphQL" label="GraphQL" default>

```graphql
mutation upsertStructuredProperties {
  upsertStructuredProperties(
    input: {
      assetUrn: "urn:li:mydataset1",
      structuredPropertyInputParams: [
        {
          structuredPropertyUrn: "urn:li:structuredProperty:mystructuredproperty",
          values: [
            {
              stringValue: "123"
            }
          ]
        }
      ]
    }
  ) {
    properties {
      structuredProperty {
        urn
      }
    }
  }
}
```

</TabItem>
<TabItem value="CLI" label="CLI">

You can set structured properties on a dataset by creating a dataset YAML file that lists them. For example, the dataset YAML file below sets structured properties at both the field level and the dataset level.

Please refer to the [full example here.](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/datasets.yaml)

```yaml
- id: user_clicks_snowflake
  platform: snowflake
  schema:
    fields:
      - id: user_id
        structured_properties:
          io.acryl.dataManagement.deprecationDate: "2023-01-01"
  structured_properties:
    io.acryl.dataManagement.replicationSLA: 90
```

Use the CLI to upsert your dataset YAML file:

```commandline
datahub dataset upsert -f {dataset_yaml}
```

If successful, you should see `Update succeeded for urn:li:dataset:...`

</TabItem>
<TabItem value="OpenAPI v2" label="OpenAPI v2">

The following command sets the structured property `retentionTime` to `60.0` on the dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Please note that the structured property and the dataset must exist before executing this command. (You can create sample datasets using `datahub docker ingest-sample-data`.)

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "properties": [
      {
        "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
        "values": [
          {"double": 60.0}
        ]
      }
    ]
}' | jq
```

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

The following command sets the structured property `retentionTime` to `60.0` on the dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Please note that the structured property and the dataset must exist before executing this command. (You can create sample datasets using `datahub docker ingest-sample-data`.)

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "value": {
      "properties": [
        {
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
          "values": [
            {"double": 60.0}
          ]
        }
      ]
    }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "structuredProperties": {
    "value": {
      "properties": [
        {
          "values": [
            {
              "double": 60
            }
          ],
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
        }
      ]
    }
  }
}
```

</TabItem>
</Tabs>

#### Expected Outcomes

Once your datasets are uploaded, you can view them in the UI and view the properties associated with them under the Properties tab.

<p align="center">
  <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-set.png"/>
</p>

Or you can run the following command to view the properties associated with the dataset:

```commandline
datahub dataset get --urn {urn}
```
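
Because a property definition carries a `cardinality` and an optional `allowedValues` list, a request can be checked locally before it is sent. The sketch below is a client-side illustration only: it uses the definition and value shapes from the OpenAPI examples in this guide, the function name `check_assignment` is hypothetical, and it says nothing about how the server itself validates requests.

```python
def check_assignment(definition, values):
    """Return a list of problems; an empty list means the values look acceptable.

    Client-side sketch only, independent of any validation done by the server.
    """
    problems = []
    if definition.get("cardinality") == "SINGLE" and len(values) > 1:
        problems.append("cardinality is SINGLE but several values were given")
    # Each allowed entry looks like {"value": {"double": 30}, "description": "..."}.
    allowed = [entry["value"] for entry in definition.get("allowedValues", [])]
    if allowed:
        for value in values:
            if value not in allowed:
                problems.append(f"{value} is not one of the allowed values")
    return problems


# Definitions trimmed to the fields the check needs.
retention_time = {
    "cardinality": "MULTIPLE",
    "allowedValues": [
        {"value": {"double": 30}},
        {"value": {"double": 60}},
        {"value": {"double": 365}},
    ],
}
retention_time_02 = {
    "cardinality": "SINGLE",
    "allowedValues": [
        {"value": {"string": "foo2"}},
        {"value": {"string": "bar2"}},
    ],
}

print(check_assignment(retention_time, [{"double": 60.0}]))
print(check_assignment(retention_time, [{"double": 90.0}]))
print(check_assignment(retention_time_02, [{"string": "foo2"}, {"string": "bar2"}]))
```

Python compares `60.0` and `60` as equal, so the check accepts either spelling of a numeric value.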

## Read Structured Properties From a Dataset

For reading all structured properties from a dataset:

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
query getDataset {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.ecommerce.customer,PROD)") {
    structuredProperties {
      properties {
        structuredProperty {
          urn
          type
          definition {
            displayName
            description
            allowedValues {
              description
            }
          }
        }
        values {
          ... on StringValue {
            stringValue
          }
          ... on NumberValue {
            numberValue
          }
        }
        valueEntities {
          urn
          type
        }
      }
    }
  }
}
```

</TabItem>
</Tabs>
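
The result of the query above mirrors its nested selection, which is awkward to consume directly. The sketch below flattens such a result into a mapping from property URN to a list of plain values. It assumes the standard GraphQL `data` envelope; the sample response is made up for illustration and was not captured from a real instance, and the function name is hypothetical.

```python
def flatten_structured_properties(response):
    """Map each structured property URN to its list of plain values."""
    dataset = response["data"]["dataset"]
    # Assumption: the field may be null when nothing is attached to the dataset.
    container = dataset.get("structuredProperties") or {}
    flattened = {}
    for entry in container.get("properties", []):
        urn = entry["structuredProperty"]["urn"]
        values = []
        for value in entry["values"]:
            # The inline fragments return either stringValue or numberValue.
            if "stringValue" in value:
                values.append(value["stringValue"])
            elif "numberValue" in value:
                values.append(value["numberValue"])
        flattened[urn] = values
    return flattened


# Hypothetical response, shaped after the query's selection set.
sample_response = {
    "data": {
        "dataset": {
            "structuredProperties": {
                "properties": [
                    {
                        "structuredProperty": {"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"},
                        "values": [{"numberValue": 60}],
                    },
                    {
                        "structuredProperty": {"urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"},
                        "values": [{"stringValue": "bar2"}],
                    },
                ]
            }
        }
    }
}

flat = flatten_structured_properties(sample_response)
print(flat)
```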

## Remove Structured Properties From a Dataset

For removing a structured property or list of structured properties from a dataset:

<Tabs>
<TabItem value="graphql" label="GraphQL" default>

```graphql
mutation removeStructuredProperties {
  removeStructuredProperties(
    input: {
      assetUrn: "urn:li:mydataset1",
      structuredPropertyUrns: ["urn:li:structuredProperty:mystructuredproperty"]
    }
  ) {
    properties {
      structuredProperty {urn}
    }
  }
}
```

</TabItem>
</Tabs>

## Patch Structured Property Value

This section will show you how to patch a structured property value - either by removing, adding, or upserting a single property.

### Add Structured Property Value

For this example, we'll create a second structured property and apply both properties to the same dataset used previously.
After this your system should include both `io.acryl.privacy.retentionTime` and `io.acryl.privacy.retentionTime02`.

<Tabs>
<TabItem value="OpenAPI v2" label="OpenAPI v2">

Let's start by creating the second structured property.

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime02/propertyDefinition' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "qualifiedName": "io.acryl.privacy.retentionTime02",
    "displayName": "Retention Time 02",
    "valueType": "urn:li:dataType:datahub.string",
    "allowedValues": [
      {
        "value": {"string": "foo2"},
        "description": "test foo2 value"
      },
      {
        "value": {"string": "bar2"},
        "description": "test bar2 value"
      }
    ],
    "cardinality": "SINGLE",
    "entityTypes": [
      "urn:li:entityType:datahub.dataset"
    ]
}' | jq
```

This command will attach one of each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Specifically, this will set `io.acryl.privacy.retentionTime` as `60.0` and `io.acryl.privacy.retentionTime02` as `bar2`.

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "properties": [
      {
        "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
        "values": [
          {"double": 60.0}
        ]
      },
      {
        "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
        "values": [
          {"string": "bar2"}
        ]
      }
    ]
}' | jq
```

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

Let's start by creating the second structured property.

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime02/propertyDefinition' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "value": {
      "qualifiedName": "io.acryl.privacy.retentionTime02",
      "displayName": "Retention Time 02",
      "valueType": "urn:li:dataType:datahub.string",
      "allowedValues": [
        {
          "value": {"string": "foo2"},
          "description": "test foo2 value"
        },
        {
          "value": {"string": "bar2"},
          "description": "test bar2 value"
        }
      ],
      "cardinality": "SINGLE",
      "entityTypes": [
        "urn:li:entityType:datahub.dataset"
      ]
    }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
  "propertyDefinition": {
    "value": {
      "allowedValues": [
        {
          "value": {
            "string": "foo2"
          },
          "description": "test foo2 value"
        },
        {
          "value": {
            "string": "bar2"
          },
          "description": "test bar2 value"
        }
      ],
      "entityTypes": [
        "urn:li:entityType:datahub.dataset"
      ],
      "qualifiedName": "io.acryl.privacy.retentionTime02",
      "displayName": "Retention Time 02",
      "cardinality": "SINGLE",
      "valueType": "urn:li:dataType:datahub.string"
    }
  }
}
```

This command will attach one of each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`.
Specifically, this will set `io.acryl.privacy.retentionTime` as `60.0` and `io.acryl.privacy.retentionTime02` as `bar2`.

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties?createIfNotExists=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "value": {
      "properties": [
        {
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
          "values": [
            {"double": 60.0}
          ]
        },
        {
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
          "values": [
            {"string": "bar2"}
          ]
        }
      ]
    }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "structuredProperties": {
    "value": {
      "properties": [
        {
          "values": [
            {
              "double": 60
            }
          ],
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
        },
        {
          "values": [
            {
              "string": "bar2"
            }
          ],
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
        }
      ]
    }
  }
}
```

</TabItem>
</Tabs>

#### Expected Outcomes
You can see that the dataset now has two structured properties attached to it.

<p align="center">
  <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-add.png"/>
</p>

### Remove Structured Property Value

At this point, our test dataset should have two structured properties.
We'd like to remove the first one (`io.acryl.privacy.retentionTime`) and preserve the second one (`io.acryl.privacy.retentionTime02`).

<Tabs>
<TabItem value="OpenAPI v2" label="OpenAPI v2">

```shell
curl -X 'PATCH' -v \
  'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json-patch+json' \
  -d '{
    "patch": [
      {
        "op": "remove",
        "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime"
      }
    ],
    "arrayPrimaryKeys": {
      "properties": [
        "propertyUrn"
      ]
    }
}' | jq
```

The response will show that the expected property has been removed.

```json
{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "aspects": {
    "structuredProperties": {
      "value": {
        "properties": [
          {
            "values": [
              {
                "string": "bar2"
              }
            ],
            "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
          }
        ]
      }
    }
  }
}
```

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

```shell
curl -X 'PATCH' -v \
  'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json-patch+json' \
  -d '{
    "patch": [
      {
        "op": "remove",
        "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime"
      }
    ],
    "arrayPrimaryKeys": {
      "properties": [
        "propertyUrn"
      ]
    }
}' | jq
```

The response will show that the expected property has been removed.

```json
{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "structuredProperties": {
    "value": {
      "properties": [
        {
          "values": [
            {
              "string": "bar2"
            }
          ],
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
        }
      ]
    }
  }
}
```

</TabItem>
</Tabs>

#### Expected Outcomes
You can see that the first property has been removed and the second property is still present.

<p align="center">
  <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-remove.png"/>
</p>
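
The PATCH bodies in this guide share one shape: a `patch` list of operations whose `path` ends in the property URN, plus an `arrayPrimaryKeys` entry naming `propertyUrn` as the key of the `properties` array. The sketch below builds that body for a single property. The helper name `property_patch` is hypothetical, and the `add` branch follows the upsert requests shown elsewhere in this guide.

```python
import json


def property_patch(op, property_urn, values=None):
    """Build the PATCH body for one structured property on an entity."""
    if op not in ("add", "remove"):
        raise ValueError(f"unsupported operation: {op}")
    operation = {"op": op, "path": f"/properties/{property_urn}"}
    if op == "add":
        if not values:
            raise ValueError("an 'add' operation needs at least one value")
        # The added element repeats the URN so it can be matched by its key.
        operation["value"] = {"propertyUrn": property_urn, "values": values}
    return {
        "patch": [operation],
        # Elements of the "properties" array are addressed by their propertyUrn.
        "arrayPrimaryKeys": {"properties": ["propertyUrn"]},
    }


RETENTION_URN = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
remove_body = property_patch("remove", RETENTION_URN)
add_body = property_patch("add", RETENTION_URN, [{"double": 365.0}])
print(json.dumps(remove_body, indent=2))
```

The printed JSON can be passed to curl's `-d` option together with the `application/json-patch+json` content type used in the requests above.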

### Upsert Structured Property Value

In this example, we'll add the removed property back with a different value while preserving the property that is still attached.

<Tabs>
<TabItem value="graphql" label="GraphQL">

```graphql
mutation updateStructuredProperty {
  updateStructuredProperty(
    input: {
      urn: "urn:li:structuredProperty:retentionTime",
      displayName: "Retention Time",
      description: "Retention Time is used to figure out how long to retain records in a dataset",
      newAllowedValues: [
        {
          numberValue: 30,
          description: "30 days, usually reserved for datasets that are ephemeral and contain pii"
        },
        {
          numberValue: 90,
          description: "Use this for datasets that drive monthly reporting but contain pii"
        },
        {
          numberValue: 365,
          description: "Use this for non-sensitive data that can be retained for longer"
        }
      ]
    }
  ) {
    urn
  }
}
```

</TabItem>
<TabItem value="OpenAPI v2" label="OpenAPI v2">

```shell
curl -X 'PATCH' -v \
  'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json-patch+json' \
  -d '{
    "patch": [
      {
        "op": "add",
        "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime",
        "value": {
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
          "values": [
            {
              "double": 365.0
            }
          ]
        }
      }
    ],
    "arrayPrimaryKeys": {
      "properties": [
        "propertyUrn"
      ]
    }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "aspects": {
    "structuredProperties": {
      "value": {
        "properties": [
          {
            "values": [
              {
                "string": "bar2"
              }
            ],
            "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
          },
          {
            "values": [
              {
                "double": 365.0
              }
            ],
            "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
          }
        ]
      }
    }
  }
}
```

The response shows that the property was re-added with the new value 365.0 instead of the previous value 60.0.

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

```shell
curl -X 'PATCH' -v \
  'http://localhost:8080/openapi/v3/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json-patch+json' \
  -d '{
    "patch": [
      {
        "op": "add",
        "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime",
        "value": {
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
          "values": [
            {
              "double": 365.0
            }
          ]
        }
      }
    ],
    "arrayPrimaryKeys": {
      "properties": [
        "propertyUrn"
      ]
    }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
  "structuredProperties": {
    "value": {
      "properties": [
        {
          "values": [
            {
              "string": "bar2"
            }
          ],
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"
        },
        {
          "values": [
            {
              "double": 365
            }
          ],
          "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
        }
      ]
    }
  }
}
```

The response shows that the property was re-added with the new value 365 instead of the previous value 60.

</TabItem>
</Tabs>

#### Expected Outcomes
You can see that the first property has been added back with a new value and the second property is still present.

<p align="center">
  <img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/apis/tutorials/sp-upsert.png"/>
</p>
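
The remove and add requests in this guide address elements of the `properties` array by their `propertyUrn` rather than by position. The sketch below replays that behavior on a plain Python list to show why the second property survives both operations. It is a local illustration of the semantics visible in the example responses, not DataHub's implementation, and appending a re-added property at the end is an assumption taken from those responses.

```python
def apply_property_patch(properties, operations, key="propertyUrn"):
    """Apply keyed add/remove operations to a list of property entries."""
    result = [dict(entry) for entry in properties]  # leave the input untouched
    for operation in operations:
        # "/properties/<urn>" -> "<urn>"; the URN itself contains no slash here.
        target = operation["path"].split("/", 2)[2]
        # Both operations first drop any entry with the same key ...
        result = [entry for entry in result if entry[key] != target]
        if operation["op"] == "add":
            # ... and "add" then appends the new entry, which makes it an upsert.
            result.append(operation["value"])
        elif operation["op"] != "remove":
            raise ValueError(f"unsupported operation: {operation['op']}")
    return result


RT = "urn:li:structuredProperty:io.acryl.privacy.retentionTime"
RT02 = "urn:li:structuredProperty:io.acryl.privacy.retentionTime02"

initial = [
    {"propertyUrn": RT, "values": [{"double": 60.0}]},
    {"propertyUrn": RT02, "values": [{"string": "bar2"}]},
]
after_remove = apply_property_patch(initial, [{"op": "remove", "path": f"/properties/{RT}"}])
after_upsert = apply_property_patch(
    after_remove,
    [{"op": "add", "path": f"/properties/{RT}",
      "value": {"propertyUrn": RT, "values": [{"double": 365.0}]}}],
)
print(after_upsert)
```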
|
||
|
||
|
||
## Delete Structured Properties

There are two types of deletion in DataHub: soft delete and hard delete.

:::note SOFT DELETE
Soft deleting a Structured Property does not remove any underlying data: neither the Structured Property entity nor the values of that property written to other entities.
The soft delete is fully reversible with no data loss. While a Structured Property is soft deleted, some operations are unavailable.

Structured Property Soft Delete Effects:

- Entities with a soft deleted Structured Property value will not return the soft deleted properties
- Updates to a soft deleted Structured Property's definition are denied
- Adding a soft deleted Structured Property's value to an entity is denied
- Search filters using a soft deleted Structured Property will be denied
:::

:::note HARD DELETE
Hard deleting a Structured Property REMOVES all underlying data: both the Structured Property entity and the values of that property written to other entities.
The hard delete is NOT reversible.

Structured Property Hard Delete Effects:

- Structured Property entity is removed
- Structured Property values are removed via PATCH MCPs on their respective entities
- Rollback is not possible
- Elasticsearch index mappings will continue to contain references to the hard deleted property until reindex
:::

### Soft Delete

<Tabs>
<TabItem value="CLI" label="CLI (Soft Delete)" default>

The following command will soft delete the test property.

```commandline
datahub delete --urn {urn}
```

</TabItem>
<TabItem value="OpenAPI v2" label="OpenAPI v2 (Soft Delete)">

The following command will soft delete the test property by writing to the status aspect.

```shell
curl -X 'POST' \
  'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "removed": true
}' | jq
```

If you want to **remove the soft delete**, you can do so by either hard deleting the status aspect or changing the removed boolean to `false` like below.

```shell
curl -X 'POST' \
  'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "removed": false
}' | jq
```

</TabItem>

<TabItem value="OpenAPI v3" label="OpenAPI v3 (Soft Delete)">

The following command will soft delete the test property by writing to the status aspect.

```shell
curl -X 'POST' \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "value": {
    "removed": true
  }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "status": {
    "value": {
      "removed": true
    }
  }
}
```

If you want to **remove the soft delete**, you can do so by either hard deleting the status aspect or changing the removed boolean to `false` like below.

```shell
curl -X 'POST' \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false&createIfNotExists=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "value": {
    "removed": false
  }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "status": {
    "value": {
      "removed": false
    }
  }
}
```

</TabItem>

</Tabs>
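
The OpenAPI v3 status requests above differ only in the `removed` flag, so the URL and body can be generated from the property URN. The following is a minimal Python sketch that only builds the request pieces and sends nothing. The function name `status_request` and the default `base_url` are illustrative assumptions taken from the quickstart examples, not a DataHub API.

```python
from typing import Tuple
from urllib.parse import quote


def status_request(urn: str, removed: bool,
                   base_url: str = "http://localhost:8080") -> Tuple[str, dict]:
    """Build the URL and JSON body for writing the status aspect (hypothetical helper)."""
    # safe="" percent-encodes ':' as %3A, matching the curl examples above.
    encoded_urn = quote(urn, safe="")
    url = (
        f"{base_url}/openapi/v3/entity/structuredProperty/"
        f"{encoded_urn}/status?systemMetadata=false"
    )
    # removed=True soft deletes the property, removed=False restores it.
    body = {"value": {"removed": removed}}
    return url, body


url, body = status_request(
    "urn:li:structuredProperty:io.acryl.privacy.retentionTime", removed=True
)
print(url)
print(body)
```

Calling the same helper with `removed=False` yields the request that reverses the soft delete.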

### Hard Delete

<Tabs>
<TabItem value="CLI" label="CLI (Hard Delete)">

The following command will hard delete the test property.

```commandline
datahub delete --urn {urn} --hard
```

</TabItem>

<TabItem value="OpenAPI v3" label="OpenAPI v3 (Hard Delete)">

The following command will hard delete the test property.

```shell
curl -v -X 'DELETE' \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime'
```

Example Response:

```text
> DELETE /openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 14 Jun 2024 17:30:27 GMT
< Content-Length: 0
< Server: Jetty(11.0.19)
```
</TabItem>

</Tabs>

#### Index Mappings Cleanup

After the asynchronous deletion of all Structured Property values, triggered by the above
hard delete, has been processed, it is possible to remove the remaining index mappings. Note that if even one Structured Property value remains,
the mapping will not be removed for a given entity index.

Run the DataHub system-update job (which runs automatically on every helm install or upgrade and on quickstart) with
the following environment variables enabled.

This will trigger an Elasticsearch reindex, which will take time to complete. During the process the entire index is recreated.

```shell
ELASTICSEARCH_INDEX_BUILDER_MAPPINGS_REINDEX=true
ENABLE_STRUCTURED_PROPERTIES_SYSTEM_UPDATE=true
```

## Update Structured Property With Breaking Schema Changes

This section demonstrates how to make backwards incompatible schema changes. Making backwards incompatible
schema changes will remove previously written data.

Breaking schema changes are implemented by setting a version string within the Structured Property definition. This
version must be in the following format: `yyyyMMddhhmmss`, e.g. `20240614080000`
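
A version string in this format can be generated and sanity-checked with a few lines of Python. This is a minimal sketch, assuming the hour is written as a zero-padded 24-hour value (the guide does not state this explicitly). The helper names `make_version` and `is_valid_version` are hypothetical.

```python
from datetime import datetime

# strftime equivalent of the 14-digit yyyyMMddhhmmss pattern (24-hour clock assumed).
VERSION_FORMAT = "%Y%m%d%H%M%S"


def make_version(moment: datetime) -> str:
    """Render a timestamp as a 14-digit version string."""
    return moment.strftime(VERSION_FORMAT)


def is_valid_version(version: str) -> bool:
    """Check that the string is 14 digits and parses as a real timestamp."""
    if len(version) != 14 or not version.isdigit():
        return False
    try:
        datetime.strptime(version, VERSION_FORMAT)
    except ValueError:
        return False
    return True


print(make_version(datetime(2024, 6, 14, 8, 0, 0)))
print(is_valid_version("20240614080000"))
```

Because the format is fixed-width and ordered from year to second, version strings sort chronologically when compared as plain strings.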

:::note IMPORTANT NOTES
Old values will not be retrievable after the new Structured Property definition is applied.

The old values will be subject to deletion asynchronously (future work).
:::

In the following example, we'll revisit the `retentionTime` structured property and apply a breaking change
by changing the cardinality from `MULTIPLE` to `SINGLE`. Normally this change would be rejected as
backwards incompatible, since entities may already hold multiple values for the property,
which would no longer be valid.

<Tabs>
<TabItem value="CLI" label="CLI" default>

Edit the previously created definition yaml: Change the cardinality to `SINGLE` and add a `version`.

```yaml
- id: io.acryl.privacy.retentionTime
  # - urn: urn:li:structuredProperty:io.acryl.privacy.retentionTime # optional if id is provided
  qualified_name: io.acryl.privacy.retentionTime # required if urn is provided
  type: number
  cardinality: SINGLE
  version: '20240614080000'
  display_name: Retention Time
  entity_types:
    - dataset # or urn:li:entityType:datahub.dataset
    - dataFlow
  description: "Retention Time is used to figure out how long to retain records in a dataset"
  allowed_values:
    - value: 30
      description: 30 days, usually reserved for datasets that are ephemeral and contain pii
    - value: 90
      description: Use this for datasets that drive monthly reporting but contain pii
    - value: 365
      description: Use this for non-sensitive data that can be retained for longer
```

Use the CLI to create your properties:

```commandline
datahub properties upsert -f {properties_yaml}
```

If successful, you should see `Created structured property urn:li:structuredProperty:...`

</TabItem>
<TabItem value="OpenAPI v3" label="OpenAPI v3">

Change the cardinality to `SINGLE` and add a `version`.

```shell
curl -X 'POST' -v \
  'http://localhost:8080/openapi/v3/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition?createIfNotExists=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "value": {
    "qualifiedName": "io.acryl.privacy.retentionTime",
    "valueType": "urn:li:dataType:datahub.number",
    "description": "Retention Time is used to figure out how long to retain records in a dataset",
    "displayName": "Retention Time",
    "cardinality": "SINGLE",
    "version": "20240614080000",
    "entityTypes": [
      "urn:li:entityType:datahub.dataset",
      "urn:li:entityType:datahub.dataFlow"
    ],
    "allowedValues": [
      {
        "value": {
          "double": 30
        },
        "description": "30 days, usually reserved for datasets that are ephemeral and contain pii"
      },
      {
        "value": {
          "double": 60
        },
        "description": "Use this for datasets that drive monthly reporting but contain pii"
      },
      {
        "value": {
          "double": 365
        },
        "description": "Use this for non-sensitive data that can be retained for longer"
      }
    ]
  }
}' | jq
```

Example Response:

```json
{
  "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
  "propertyDefinition": {
    "value": {
      "allowedValues": [
        {
          "description": "30 days, usually reserved for datasets that are ephemeral and contain pii",
          "value": {
            "double": 30
          }
        },
        {
          "description": "Use this for datasets that drive monthly reporting but contain pii",
          "value": {
            "double": 60
          }
        },
        {
          "description": "Use this for non-sensitive data that can be retained for longer",
          "value": {
            "double": 365
          }
        }
      ],
      "displayName": "Retention Time",
      "qualifiedName": "io.acryl.privacy.retentionTime",
      "valueType": "urn:li:dataType:datahub.number",
      "description": "Retention Time is used to figure out how long to retain records in a dataset",
      "entityTypes": [
        "urn:li:entityType:datahub.dataset",
        "urn:li:entityType:datahub.dataFlow"
      ],
      "version": "20240614080000",
      "cardinality": "SINGLE"
    }
  }
}
```

</TabItem>
</Tabs>

## Structured Properties - Search & Aggregation

Structured Properties can currently be used to filter search results; they are not yet included in full-text search.

The following examples re-use the two previously defined Structured Properties.

`io.acryl.privacy.retentionTime` - An example numeric property.

`io.acryl.privacy.retentionTime02` - An example string property.

<Tabs>
<TabItem value="GraphQL" label="GraphQL" default>

Range Query:

Document should be returned based on the previously assigned value of 60.

```graphql
query {
  scrollAcrossEntities(
    input: {
      types: DATASET,
      count: 10,
      query: "*",
      orFilters: {
        and: [
          {
            field: "structuredProperties.io.acryl.privacy.retentionTime",
            condition: GREATER_THAN,
            values: [
              "45.0"
            ]
          }
        ]
      }
    }
  ) {
    searchResults {
      entity {
        urn,
        type
      }
    }
  }
}
```

Exists Query:

Document should be returned based on the previously assigned value.

```graphql
query {
  scrollAcrossEntities(
    input: {
      types: DATASET,
      count: 10,
      query: "*",
      orFilters: {
        and: [
          {
            field: "structuredProperties.io.acryl.privacy.retentionTime",
            condition: EXISTS
          }
        ]
      }
    }
  ) {
    searchResults {
      entity {
        urn,
        type
      }
    }
  }
}
```

Equality Query:

Document should be returned based on the previously assigned value of 'bar2'.

```graphql
query {
  scrollAcrossEntities(
    input: {
      types: DATASET,
      count: 10,
      query: "*",
      orFilters: {
        and: [
          {
            field: "structuredProperties.io.acryl.privacy.retentionTime02",
            condition: EQUAL,
            values: [
              "bar2"
            ]
          }
        ]
      }
    }
  ) {
    searchResults {
      entity {
        urn,
        type
      }
    }
  }
}
```
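
The three GraphQL queries above differ only in the filter criterion, so the criterion can be generated rather than hand-written, for example when the filter is supplied as a GraphQL variable. The following Python sketch builds plain dictionaries that mirror the `orFilters` shape used above. The helper names `property_filter` and `or_filters` are hypothetical, and passing values as strings is an assumption based on the `"45.0"` and `"bar2"` examples.

```python
from typing import List, Optional


def property_filter(qualified_name: str, condition: str,
                    values: Optional[List[object]] = None) -> dict:
    """Build one filter criterion for a structured property (hypothetical helper)."""
    criterion = {
        # GraphQL filters use the dotted qualified name, prefixed with structuredProperties.
        "field": f"structuredProperties.{qualified_name}",
        "condition": condition,
    }
    if values is not None:
        # Values are passed as strings, as in the examples above.
        criterion["values"] = [str(value) for value in values]
    return criterion


def or_filters(*criteria: dict) -> list:
    """Wrap criteria in a single AND group, mirroring the orFilters shape above."""
    return [{"and": list(criteria)}]


range_filter = or_filters(
    property_filter("io.acryl.privacy.retentionTime", "GREATER_THAN", [45.0])
)
exists_filter = or_filters(property_filter("io.acryl.privacy.retentionTime", "EXISTS"))
print(range_filter)
print(exists_filter)
```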

</TabItem>

<TabItem value="OpenAPI v3" label="OpenAPI v3">

Unlike GraphQL, which has a parsed input object for filtering, OpenAPI only accepts a query string that
relies on the `query_string` syntax. See the Elasticsearch [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-query-string-query.html) for detailed syntax.

In order to use the `query_string` syntax we'll need to know a bit about the Structured Property's definition, such
as whether it is versioned or un-versioned and its type. This information determines the field name used in the `query` URL parameter.

Un-versioned Example:

Structured Property URN - `urn:li:structuredProperty:io.acryl.privacy.retentionTime`

Elasticsearch Field Name - `structuredProperties.io_acryl_privacy_retentionTime`

Versioned Example:

Structured Property Version - `20240614080000`

Structured Property Type - `string`

Structured Property URN - `urn:li:structuredProperty:io.acryl.privacy.retentionTime02`

Elasticsearch Field Name - `structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string`
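
The mapping from URN to Elasticsearch field name in the two examples above can be expressed as a small function. This Python sketch is inferred purely from those two examples (dots in the qualified name become underscores, and versioned properties gain a `_versioned` prefix plus version and type suffixes). It is not taken from DataHub's source, and the function name `es_field_name` is hypothetical.

```python
from typing import Optional

URN_PREFIX = "urn:li:structuredProperty:"


def es_field_name(urn: str, version: Optional[str] = None,
                  value_type: Optional[str] = None) -> str:
    """Derive the search field name for a structured property (inferred convention)."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a structured property URN: {urn}")
    # Dots in the qualified name are replaced with underscores.
    sanitized = urn[len(URN_PREFIX):].replace(".", "_")
    if version is None:
        return f"structuredProperties.{sanitized}"
    if value_type is None:
        raise ValueError("value_type is required for versioned properties")
    return f"structuredProperties._versioned.{sanitized}.{version}.{value_type}"


print(es_field_name("urn:li:structuredProperty:io.acryl.privacy.retentionTime"))
print(es_field_name(
    "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
    version="20240614080000",
    value_type="string",
))
```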

Range Query:

query - `structuredProperties.io_acryl_privacy_retentionTime:>45`

```shell
curl -X 'GET' \
  'http://localhost:9002/openapi/v3/entity/dataset?systemMetadata=false&aspects=datasetKey&aspects=structuredProperties&count=10&sort=urn&sortOrder=ASCENDING&query=structuredProperties.io_acryl_privacy_retentionTime%3A%3E45' \
  -H 'accept: application/json'
```

Example Response:

```json
{
  "entities": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
      "datasetKey": {
        "value": {
          "name": "SampleHiveDataset",
          "platform": "urn:li:dataPlatform:hive",
          "origin": "PROD"
        }
      },
      "structuredProperties": {
        "value": {
          "properties": [
            {
              "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
              "values": [
                {
                  "double": 60
                }
              ]
            },
            {
              "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
              "values": [
                {
                  "string": "bar2"
                }
              ]
            }
          ]
        }
      }
    }
  ]
}
```
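
The percent-encoded `query` parameter in the range query request above can be produced from the readable `query_string` expression instead of being encoded by hand. The following Python sketch reproduces that request URL. The function name `dataset_search_url` is hypothetical, and the base URL and fixed parameters are simply copied from the curl example.

```python
from urllib.parse import quote


def dataset_search_url(query: str, base_url: str = "http://localhost:9002",
                       count: int = 10) -> str:
    """Build the dataset search URL used in the curl example (hypothetical helper)."""
    params = [
        ("systemMetadata", "false"),
        ("aspects", "datasetKey"),
        ("aspects", "structuredProperties"),
        ("count", str(count)),
        ("sort", "urn"),
        ("sortOrder", "ASCENDING"),
        ("query", query),
    ]
    # safe="" percent-encodes ':' and '>' as %3A and %3E.
    encoded = "&".join(f"{key}={quote(value, safe='')}" for key, value in params)
    return f"{base_url}/openapi/v3/entity/dataset?{encoded}"


print(dataset_search_url("structuredProperties.io_acryl_privacy_retentionTime:>45"))
```

Keeping the readable expression in code and encoding it at the last step makes it easier to review what the query actually matches.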

Exists Query:

query - `_exists_:structuredProperties.io_acryl_privacy_retentionTime`

```shell
curl -X 'GET' \
  'http://localhost:9002/openapi/v3/entity/dataset?systemMetadata=false&aspects=datasetKey&aspects=structuredProperties&count=10&sort=urn&sortOrder=ASCENDING&query=_exists_%3AstructuredProperties.io_acryl_privacy_retentionTime' \
  -H 'accept: application/json'
```

Example Response:

```json
{
  "entities": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
      "datasetKey": {
        "value": {
          "name": "SampleHiveDataset",
          "platform": "urn:li:dataPlatform:hive",
          "origin": "PROD"
        }
      },
      "structuredProperties": {
        "value": {
          "properties": [
            {
              "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
              "values": [
                {
                  "double": 60
                }
              ]
            },
            {
              "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
              "values": [
                {
                  "string": "bar2"
                }
              ]
            }
          ]
        }
      }
    }
  ]
}
```

Equality Query:

query - `structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string:bar2`

```shell
curl -X 'GET' \
  'http://localhost:9002/openapi/v3/entity/dataset?systemMetadata=false&aspects=datasetKey&aspects=structuredProperties&count=10&sort=urn&sortOrder=ASCENDING&query=structuredProperties._versioned.io_acryl_privacy_retentionTime02.20240614080000.string%3Abar2' \
  -H 'accept: application/json'
```

Example Response:

```json
{
  "entities": [
    {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
      "datasetKey": {
        "value": {
          "name": "SampleHiveDataset",
          "platform": "urn:li:dataPlatform:hive",
          "origin": "PROD"
        }
      },
      "structuredProperties": {
        "value": {
          "properties": [
            {
              "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime",
              "values": [
                {
                  "double": 60
                }
              ]
            },
            {
              "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02",
              "values": [
                {
                  "string": "bar2"
                }
              ]
            }
          ]
        }
      }
    }
  ]
}
```

</TabItem>
</Tabs>

### Structured Property Aggregations

Structured properties can also be used in GraphQL's aggregation queries using the same naming convention outlined above
for search filter field names. There are currently no aggregation endpoints for OpenAPI.

<Tabs>
<TabItem value="GraphQL" label="GraphQL" default>

Aggregation Query:

```graphql
query {
  aggregateAcrossEntities(
    input: {
      types: [],
      facets: [
        "structuredProperties.io.acryl.privacy.retentionTime02",
        "structuredProperties.io.acryl.privacy.retentionTime"],
      query: "*",
      orFilters: [],
      searchFlags: {maxAggValues: 100}
    }) {
    facets {
      field
      aggregations {
        value
        count
      }
    }
  }
}
```

Example Response:

```json
{
  "data": {
    "aggregateAcrossEntities": {
      "facets": [
        {
          "field": "structuredProperties.io.acryl.privacy.retentionTime02",
          "aggregations": [
            {
              "value": "bar2",
              "count": 1
            }
          ]
        },
        {
          "field": "structuredProperties.io.acryl.privacy.retentionTime",
          "aggregations": [
            {
              "value": "60.0",
              "count": 1
            }
          ]
        }
      ]
    }
  },
  "extensions": {}
}
```
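
An aggregation response like the one above is often easier to consume as a nested mapping of facet field to value counts. The following Python sketch reshapes the parsed response. The function name `facet_counts` is hypothetical, and the sample data is a copy of the example response.

```python
def facet_counts(response: dict) -> dict:
    """Reshape an aggregateAcrossEntities response into {field: {value: count}}."""
    facets = response["data"]["aggregateAcrossEntities"]["facets"]
    return {
        facet["field"]: {agg["value"]: agg["count"] for agg in facet["aggregations"]}
        for facet in facets
    }


# The example response above, as parsed JSON.
aggregation_response = {
    "data": {
        "aggregateAcrossEntities": {
            "facets": [
                {
                    "field": "structuredProperties.io.acryl.privacy.retentionTime02",
                    "aggregations": [{"value": "bar2", "count": 1}],
                },
                {
                    "field": "structuredProperties.io.acryl.privacy.retentionTime",
                    "aggregations": [{"value": "60.0", "count": 1}],
                },
            ]
        }
    },
    "extensions": {},
}

print(facet_counts(aggregation_response))
```

Note that numeric property values come back as strings (`"60.0"`), so callers that need numbers must convert them.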

</TabItem>
</Tabs>