# DataHub OpenAPI Guide
## Why OpenAPI

The OpenAPI standard is a widely used approach for designing and documenting RESTful APIs.
To make it easier to integrate with DataHub, we publish a set of OpenAPI-based endpoints.

Read [the DataHub API overview](../datahub-apis.md) to understand the rationale behind the different APIs and when to use each one.

## Locating the OpenAPI endpoints

Currently, the OpenAPI endpoints are isolated to a servlet on GMS and are deployed automatically with every GMS server.

The servlet auto-generates an OpenAPI UI, also known as Swagger UI, which is available at **GMS_SERVER_HOST:GMS_PORT/openapi/swagger-ui/index.html**. For example, the Quickstart running locally exposes it at http://localhost:8080/openapi/swagger-ui/index.html.
The DataHub frontend also proxies the same endpoint, with the GMS host and port replaced by the frontend's URL ([local Quickstart link](http://localhost:9002/openapi/swagger-ui/index.html)). A link to it is available in the top-right dropdown under the user profile picture.

Note that you can get the raw JSON or YAML form of the OpenAPI spec by navigating to [**BASE_URL/openapi/v3/api-docs**](http://localhost:9002/openapi/v3/api-docs) or [**BASE_URL/openapi/v3/api-docs.yaml**](http://localhost:9002/openapi/v3/api-docs.yaml).
The raw forms can be fed into OpenAPI codegen tools to generate client-side code in any language those tools support. We have noticed varying degrees of maturity across languages in these codegen tools, so some generated clients may require customization to be fully compatible.
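
As a minimal sketch (assuming Java 11+ and the local Quickstart frontend at `http://localhost:9002`; the class name and output file name are illustrative), the raw YAML spec can be downloaded with the JDK's built-in `java.net.http` client and then handed to a codegen tool. If authentication is enabled on your deployment, you may also need to send credentials, which this sketch does not do.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

final class FetchOpenApiSpec {
    // Builds a GET request for the raw spec: JSON at /openapi/v3/api-docs, YAML at /openapi/v3/api-docs.yaml.
    static HttpRequest specRequest(String baseUrl, boolean yaml) {
        String path = yaml ? "/openapi/v3/api-docs.yaml" : "/openapi/v3/api-docs";
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: local Quickstart frontend unless another base URL is passed as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9002";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(specRequest(baseUrl, true), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        // Save the spec so it can be passed to an OpenAPI codegen tool.
        Files.writeString(Path.of("datahub-openapi.yaml"), response.body());
    }
}
```
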

The OpenAPI UI includes explorable, fully documented schemas for request and response objects. The models used
in the OpenAPI UI are all generated at build time from the PDL models as JSON Schema-compatible Java models.

## Understanding the OpenAPI endpoints

While the full OpenAPI spec is always available at [**GMS_SERVER_HOST:GMS_PORT/openapi/swagger-ui/index.html**](http://localhost:8080/openapi/swagger-ui/index.html), here's a quick overview of the main OpenAPI endpoints and their purpose.

### Entities (/entities)

The entities endpoints are intended for reads and writes to the metadata graph. The entire DataHub metadata model is available for you to write to (as entity, aspect pairs) or to read an individual entity's metadata from. See [examples](#entities-entities-endpoint) below.

### Relationships (/relationships)

The relationships endpoints are intended for querying the graph, to navigate relationships from one entity to others. See [examples](#relationships-relationships-endpoint) below.

### Timeline (/timeline)

The timeline endpoints are intended for querying the versioned history of a given entity over time. For example, you can query a dataset for all the schema changes, or all the documentation changes, that have happened to it over time. See the [Timeline guide](../../dev-guides/timeline.md) for more details.

### Platform (/platform)
These are even lower-level APIs that allow you to write metadata events into the DataHub platform using a standard format.
### Example Requests
#### Entities (/entities) endpoint
##### POST (UPSERT)
A POST without any additional URL parameters performs an UPSERT of the entity's aspects: the entity is
created if it doesn't exist and updated if it does.

```shell
curl --location --request POST 'localhost:8080/openapi/entities/v1/' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>' \
--data-raw '[
{
"aspect": {
"__type": "SchemaMetadata",
"schemaName": "SampleHdfsSchema",
"platform": "urn:li:dataPlatform:platform",
"platformSchema": {
"__type": "MySqlDDL",
"tableSchema": "schema"
},
"version": 0,
"created": {
"time": 1621882982738,
"actor": "urn:li:corpuser:etl",
"impersonator": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 1621882982738,
"actor": "urn:li:corpuser:etl",
"impersonator": "urn:li:corpuser:jdoe"
},
"hash": "",
"fields": [
{
"fieldPath": "county_fips_codefg",
"jsonPath": "null",
"nullable": true,
"description": "null",
"type": {
"type": {
"__type": "StringType"
}
},
"nativeDataType": "String()",
"recursive": false
},
{
"fieldPath": "county_name",
"jsonPath": "null",
"nullable": true,
"description": "null",
"type": {
"type": {
"__type": "StringType"
}
},
"nativeDataType": "String()",
"recursive": false
}
]
},
"entityType": "dataset",
"entityUrn": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)"
}
]'
```
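
The same upsert can be sent from Java without extra dependencies. The sketch below assumes Java 11+, GMS at `http://localhost:8080`, a personal access token in a `DATAHUB_TOKEN` environment variable (a name chosen only for this example), and the JSON array from the curl example saved to a file whose path is passed as the first argument. The class name is illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

final class UpsertEntities {
    // Mirrors the curl call: POST /openapi/entities/v1/ with JSON headers and a bearer token.
    static HttpRequest upsertRequest(String gmsUrl, String token, String jsonArrayBody) {
        return HttpRequest.newBuilder(URI.create(gmsUrl + "/openapi/entities/v1/"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .header("Authorization", "Bearer " + token)
                .POST(HttpRequest.BodyPublishers.ofString(jsonArrayBody))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: UpsertEntities <path-to-json-array>");
            System.exit(2);
        }
        // Hypothetical env var name; any way of supplying the token works.
        String token = System.getenv().getOrDefault("DATAHUB_TOKEN", "");
        String body = Files.readString(Path.of(args[0])); // the JSON array from the curl example
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(upsertRequest("http://localhost:8080", token, body), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```
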
##### POST (CREATE)
This second POST example writes the update ONLY if the entity doesn't exist. If the entity already exists, the
request returns an error instead of overwriting the entity.

In this example, we've added the URL parameter `createEntityIfNotExists=true`.

```shell
curl --location --request POST 'localhost:8080/openapi/entities/v1/?createEntityIfNotExists=true' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>' \
--data-raw '<see previous example>'
```
If the entity doesn't exist, the response is identical to the previous example. If the entity already exists,
the request fails with the following error:

> 422 ValidationExceptionCollection{EntityAspect:(urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD),schemaMetadata) Exceptions: [com.linkedin.metadata.aspect.plugins.validation.AspectValidationException: Cannot perform CREATE if not exists since the entity key already exists.]}
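
When calling the create-only variant from code, you will likely want to tell this "already exists" failure apart from other errors. The sketch below is a hypothetical helper (Java 11+). Matching on the error text is an assumption based on the message shown above and may be brittle across DataHub versions; any other 422 validation failure is reported as a generic error.

```java
import java.net.URI;

final class CreateIfNotExists {
    enum Outcome { CREATED, ALREADY_EXISTS, OTHER_ERROR }

    // Same endpoint as the upsert, plus the createEntityIfNotExists=true query parameter.
    static URI createOnlyUri(String gmsUrl) {
        return URI.create(gmsUrl + "/openapi/entities/v1/?createEntityIfNotExists=true");
    }

    // 2xx: the entity was written. 422 mentioning an existing entity key: it was already present.
    static Outcome classify(int status, String body) {
        if (status >= 200 && status < 300) {
            return Outcome.CREATED;
        }
        if (status == 422 && body != null && body.contains("entity key already exists")) {
            return Outcome.ALREADY_EXISTS;
        }
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(createOnlyUri("http://localhost:8080"));
        System.out.println(classify(422, "Cannot perform CREATE if not exists since the entity key already exists."));
    }
}
```
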
##### GET
```shell
curl --location --request GET 'localhost:8080/openapi/entities/v1/latest?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&aspectNames=schemaMetadata' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>'
```
##### DELETE
```shell
curl --location --request DELETE 'localhost:8080/openapi/entities/v1/?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&soft=true' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer <token>'
```
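
The curl examples above pass URNs unencoded inside single quotes. When building these URLs programmatically, it is safer to percent-encode each query value, since URNs contain characters such as `:`, `(`, and `,`. A minimal sketch (Java 10+ for `URLEncoder.encode(String, Charset)`; class and method names are illustrative) that repeats the `urns` key once per URN, which is an assumption about how the list parameter is sent:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

final class EntityQueryUrls {
    // Encodes one key=value pair; ':', '(' and ',' become %3A, %28 and %2C.
    static String param(String key, String value) {
        return key + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // GET /openapi/entities/v1/latest with one urns= and one aspectNames= pair per value.
    static String latestUrl(String gmsUrl, List<String> urns, List<String> aspectNames) {
        StringJoiner query = new StringJoiner("&");
        urns.forEach(urn -> query.add(param("urns", urn)));
        aspectNames.forEach(aspect -> query.add(param("aspectNames", aspect)));
        return gmsUrl + "/openapi/entities/v1/latest?" + query;
    }

    // DELETE /openapi/entities/v1/ with soft=true for a soft delete or soft=false for a hard delete.
    static String deleteUrl(String gmsUrl, List<String> urns, boolean soft) {
        StringJoiner query = new StringJoiner("&");
        urns.forEach(urn -> query.add(param("urns", urn)));
        query.add(param("soft", Boolean.toString(soft)));
        return gmsUrl + "/openapi/entities/v1/?" + query;
    }

    public static void main(String[] args) {
        List<String> urns = List.of("urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)");
        System.out.println(latestUrl("http://localhost:8080", urns, List.of("schemaMetadata")));
        System.out.println(deleteUrl("http://localhost:8080", urns, true));
    }
}
```
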
#### Postman Collection
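The collection below includes a POST, a GET, and a DELETE request for a single entity with a `SchemaMetadata` aspect. It reads the server address and bearer token from the `baseUrl` and `token` collection variables.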
```json
{
"info": {
"_postman_id": "87b7401c-a5dc-47e4-90b4-90fe876d6c28",
"name": "DataHub OpenAPI",
"description": "A description",
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
},
"item": [
{
"name": "entities/v1",
"item": [
{
"name": "post Entities 1",
"request": {
"method": "POST",
"header": [
{
"key": "Content-Type",
"value": "application/json"
},
{
"key": "Accept",
"value": "application/json"
}
],
"body": {
"mode": "raw",
"raw": "[\n {\n \"aspect\": {\n \"__type\": \"SchemaMetadata\",\n \"schemaName\": \"SampleHdfsSchema\",\n \"platform\": \"urn:li:dataPlatform:platform\",\n \"platformSchema\": {\n \"__type\": \"MySqlDDL\",\n \"tableSchema\": \"schema\"\n },\n \"version\": 0,\n \"created\": {\n \"time\": 1621882982738,\n \"actor\": \"urn:li:corpuser:etl\",\n \"impersonator\": \"urn:li:corpuser:jdoe\"\n },\n \"lastModified\": {\n \"time\": 1621882982738,\n \"actor\": \"urn:li:corpuser:etl\",\n \"impersonator\": \"urn:li:corpuser:jdoe\"\n },\n \"hash\": \"\",\n \"fields\": [\n {\n \"fieldPath\": \"county_fips_codefg\",\n \"jsonPath\": \"null\",\n \"nullable\": true,\n \"description\": \"null\",\n \"type\": {\n \"type\": {\n \"__type\": \"StringType\"\n }\n },\n \"nativeDataType\": \"String()\",\n \"recursive\": false\n },\n {\n \"fieldPath\": \"county_name\",\n \"jsonPath\": \"null\",\n \"nullable\": true,\n \"description\": \"null\",\n \"type\": {\n \"type\": {\n \"__type\": \"StringType\"\n }\n },\n \"nativeDataType\": \"String()\",\n \"recursive\": false\n }\n ]\n },\n \"aspectName\": \"schemaMetadata\",\n \"entityType\": \"dataset\",\n \"entityUrn\": \"urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)\"\n }\n]",
"options": {
"raw": {
"language": "json"
}
}
},
"url": {
"raw": "{{baseUrl}}/openapi/entities/v1/",
"host": ["{{baseUrl}}"],
"path": ["openapi", "entities", "v1", ""]
}
},
"response": [
{
"name": "OK",
"originalRequest": {
"method": "POST",
"header": [],
"body": {
"mode": "raw",
"raw": "[\n {\n \"aspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n },\n \"aspectName\": \"aliquip ipsum tempor\",\n \"entityType\": \"ut est\",\n \"entityUrn\": \"enim in nulla\",\n \"entityKeyAspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n }\n },\n {\n \"aspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n },\n \"aspectName\": \"ipsum id\",\n \"entityType\": \"deser\",\n \"entityUrn\": \"aliqua sit\",\n \"entityKeyAspect\": {\n \"value\": \"<Error: Too many levels of nesting to fake this schema>\"\n }\n }\n]",
"options": {
"raw": {
"language": "json"
}
}
},
"url": {
"raw": "{{baseUrl}}/entities/v1/",
"host": ["{{baseUrl}}"],
"path": ["entities", "v1", ""]
}
},
"status": "OK",
"code": 200,
"_postman_previewlanguage": "json",
"header": [
{
"key": "Content-Type",
"value": "application/json"
}
],
"cookie": [],
"body": "[\n \"c\",\n \"labore dolor exercitation in\"\n]"
}
]
},
{
"name": "delete Entities",
"request": {
"method": "DELETE",
"header": [
{
"key": "Accept",
"value": "application/json"
}
],
"url": {
"raw": "{{baseUrl}}/openapi/entities/v1/?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&soft=true",
"host": ["{{baseUrl}}"],
"path": ["openapi", "entities", "v1", ""],
"query": [
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request."
},
{
"key": "urns",
"value": "labore dolor exercitation in",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request.",
"disabled": true
},
{
"key": "soft",
"value": "true",
"description": "Determines whether the delete will be soft or hard, defaults to true for soft delete"
}
]
}
},
"response": [
{
"name": "OK",
"originalRequest": {
"method": "DELETE",
"header": [],
"url": {
"raw": "{{baseUrl}}/entities/v1/?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&soft=true",
"host": ["{{baseUrl}}"],
"path": ["entities", "v1", ""],
"query": [
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)"
},
{
"key": "urns",
"value": "officia occaecat elit dolor",
"disabled": true
},
{
"key": "soft",
"value": "true"
}
]
}
},
"status": "OK",
"code": 200,
"_postman_previewlanguage": "json",
"header": [
{
"key": "Content-Type",
"value": "application/json"
}
],
"cookie": [],
"body": "[\n {\n \"rowsRolledBack\": [\n {\n \"urn\": \"urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)\"\n }\n ],\n \"rowsDeletedFromEntityDeletion\": 1\n }\n]"
}
]
},
{
"name": "get Entities",
"protocolProfileBehavior": {
"disableUrlEncoding": false
},
"request": {
"method": "GET",
"header": [
{
"key": "Accept",
"value": "application/json"
}
],
"url": {
"raw": "{{baseUrl}}/openapi/entities/v1/latest?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&aspectNames=schemaMetadata",
"host": ["{{baseUrl}}"],
"path": ["openapi", "entities", "v1", "latest"],
"query": [
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request."
},
{
"key": "urns",
"value": "labore dolor exercitation in",
"description": "(Required) A list of raw urn strings, only supports a single entity type per request.",
"disabled": true
},
{
"key": "aspectNames",
"value": "schemaMetadata",
"description": "The list of aspect names to retrieve"
},
{
"key": "aspectNames",
"value": "labore dolor exercitation in",
"description": "The list of aspect names to retrieve",
"disabled": true
}
]
}
},
"response": [
{
"name": "OK",
"originalRequest": {
"method": "GET",
"header": [],
"url": {
"raw": "{{baseUrl}}/entities/v1/latest?urns=urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)&aspectNames=schemaMetadata",
"host": ["{{baseUrl}}"],
"path": ["entities", "v1", "latest"],
"query": [
{
"key": "urns",
"value": "non exercitation occaecat",
"disabled": true
},
{
"key": "urns",
"value": "urn:li:dataset:(urn:li:dataPlatform:platform,testSchemaIngest,PROD)"
},
{
"key": "aspectNames",
"value": "non exercitation occaecat",
"disabled": true
},
{
"key": "aspectNames",
"value": "schemaMetadata"
}
]
}
},
"status": "OK",
"code": 200,
"_postman_previewlanguage": "json",
"header": [
{
"key": "Content-Type",
"value": "application/json"
}
],
"cookie": [],
"body": "{\n \"responses\": {\n \"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)\": {\n \"entityName\": \"dataset\",\n \"urn\": \"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)\",\n \"aspects\": {\n \"datasetKey\": {\n \"name\": \"datasetKey\",\n \"type\": \"VERSIONED\",\n \"version\": 0,\n \"value\": {\n \"__type\": \"DatasetKey\",\n \"platform\": \"urn:li:dataPlatform:hive\",\n \"name\": \"SampleHiveDataset\",\n \"origin\": \"PROD\"\n },\n \"created\": {\n \"time\": 1650657843351,\n \"actor\": \"urn:li:corpuser:__datahub_system\"\n }\n },\n \"schemaMetadata\": {\n \"name\": \"schemaMetadata\",\n \"type\": \"VERSIONED\",\n \"version\": 0,\n \"value\": {\n \"__type\": \"SchemaMetadata\",\n \"schemaName\": \"SampleHiveSchema\",\n \"platform\": \"urn:li:dataPlatform:hive\",\n \"version\": 0,\n \"created\": {\n \"time\": 1581407189000,\n \"actor\": \"urn:li:corpuser:jdoe\"\n },\n \"lastModified\": {\n \"time\": 1581407189000,\n \"actor\": \"urn:li:corpuser:jdoe\"\n },\n \"hash\": \"\",\n \"platformSchema\": {\n \"__type\": \"KafkaSchema\",\n \"documentSchema\": \"{\\\"type\\\":\\\"record\\\",\\\"name\\\":\\\"SampleHiveSchema\\\",\\\"namespace\\\":\\\"com.linkedin.dataset\\\",\\\"doc\\\":\\\"Sample Hive dataset\\\",\\\"fields\\\":[{\\\"name\\\":\\\"field_foo\\\",\\\"type\\\":[\\\"string\\\"]},{\\\"name\\\":\\\"field_bar\\\",\\\"type\\\":[\\\"boolean\\\"]}]}\"\n },\n \"fields\": [\n {\n \"fieldPath\": \"field_foo\",\n \"nullable\": false,\n \"description\": \"Foo field description\",\n \"type\": {\n \"type\": {\n \"__type\": \"BooleanType\"\n }\n },\n \"nativeDataType\": \"varchar(100)\",\n \"recursive\": false,\n \"isPartOfKey\": true\n },\n {\n \"fieldPath\": \"field_bar\",\n \"nullable\": false,\n \"description\": \"Bar field description\",\n \"type\": {\n \"type\": {\n \"__type\": \"BooleanType\"\n }\n },\n \"nativeDataType\": \"boolean\",\n \"recursive\": false,\n \"isPartOfKey\": false\n }\n ]\n },\n \"created\": {\n \"time\": 1650610810000,\n \"actor\": \"urn:li:corpuser:UNKNOWN\"\n }\n }\n }\n }\n }\n}"
}
]
}
],
"auth": {
"type": "bearer",
"bearer": [
{
"key": "token",
"value": "{{token}}",
"type": "string"
}
]
},
"event": [
{
"listen": "prerequest",
"script": {
"type": "text/javascript",
"exec": [""]
}
},
{
"listen": "test",
"script": {
"type": "text/javascript",
"exec": [""]
}
}
]
}
],
"event": [
{
"listen": "prerequest",
"script": {
"type": "text/javascript",
"exec": [""]
}
},
{
"listen": "test",
"script": {
"type": "text/javascript",
"exec": [""]
}
}
],
"variable": [
{
"key": "baseUrl",
"value": "localhost:8080",
"type": "string"
},
{
"key": "token",
"value": "eyJhbGciOiJIUzI1NiJ9.eyJhY3RvclR5cGUiOiJVU0VSIiwiYWN0b3JJZCI6ImRhdGFodWIiLCJ0eXBlIjoiUEVSU09OQUwiLCJ2ZXJzaW9uIjoiMSIsImV4cCI6MTY1MDY2MDY1NSwianRpIjoiM2E4ZDY3ZTItOTM5Yi00NTY3LWE0MjYtZDdlMDA1ZGU3NjJjIiwic3ViIjoiZGF0YWh1YiIsImlzcyI6ImRhdGFodWItbWV0YWRhdGEtc2VydmljZSJ9.pp_vW2u1tiiTT7U0nDF2EQdcayOMB8jatiOA8Je4JJA",
"type": "default"
}
]
}
```
#### Relationships (/relationships) endpoint
##### GET
**Sample Request**
```shell
curl -X 'GET' \
'http://localhost:8080/openapi/relationships/v1/?urn=urn%3Ali%3Acorpuser%3Adatahub&relationshipTypes=IsPartOf&direction=INCOMING&start=0&count=200' \
-H 'accept: application/json'
```
**Sample Response**
```json
{
"start": 0,
"count": 2,
"total": 2,
"entities": [
{
"relationshipType": "IsPartOf",
"urn": "urn:li:corpGroup:bfoo"
},
{
"relationshipType": "IsPartOf",
"urn": "urn:li:corpGroup:jdoe"
}
]
}
```
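
The `start`, `count`, and `total` fields suggest offset-based paging. As a sketch under that assumption (treating `count` in the response as the number of results returned, as in the sample above; class and method names are illustrative), a client could build each page URL and compute the next offset like this:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

final class RelationshipsPaging {
    // Builds the same query as the curl example, percent-encoding the urn value.
    static String pageUrl(String gmsUrl, String urn, String relationshipType, String direction, int start, int count) {
        return gmsUrl + "/openapi/relationships/v1/"
                + "?urn=" + URLEncoder.encode(urn, StandardCharsets.UTF_8)
                + "&relationshipTypes=" + URLEncoder.encode(relationshipType, StandardCharsets.UTF_8)
                + "&direction=" + direction
                + "&start=" + start
                + "&count=" + count;
    }

    // Next offset to request, or empty when this page was the last one (or returned nothing).
    static OptionalInt nextStart(int start, int returnedCount, int total) {
        int next = start + returnedCount;
        return returnedCount > 0 && next < total ? OptionalInt.of(next) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(pageUrl("http://localhost:8080", "urn:li:corpuser:datahub", "IsPartOf", "INCOMING", 0, 200));
        // The sample response has start=0, count=2, total=2, so there is no further page.
        System.out.println(nextStart(0, 2, 2).isPresent());
    }
}
```
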
## Programmatic Usage
You can use the models programmatically through the Java REST emitter, which includes the generated models. A minimal
Java project that emits to the OpenAPI endpoints needs the following dependencies (Gradle format):
```groovy
dependencies {
    implementation 'io.acryl:datahub-client:<DATAHUB_CLIENT_VERSION>'
    implementation 'org.apache.httpcomponents:httpclient:<APACHE_HTTP_CLIENT_VERSION>'
    implementation 'org.apache.httpcomponents:httpasyncclient:<APACHE_ASYNC_CLIENT_VERSION>'
}
```
### Writing metadata events to the /platform endpoints
The following code emits metadata events through OpenAPI by constructing a list of `UpsertAspectRequest`s. Behind the scenes, it uses the **/platform/entities/v1** endpoint to send metadata to GMS.
```java
import io.datahubproject.openapi.generated.DatasetProperties;
import datahub.client.rest.RestEmitter;
import datahub.event.UpsertAspectRequest;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class Main {
    public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
        // Emitter with the client library's default configuration.
        RestEmitter emitter = RestEmitter.createWithDefaults();

        List<UpsertAspectRequest> requests = new ArrayList<>();
        // Each request upserts one aspect (DatasetProperties) on one dataset entity.
        UpsertAspectRequest upsertAspectRequest = UpsertAspectRequest.builder()
                .entityType("dataset")
                .entityUrn("urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.my-other-dataset.user-table,PROD)")
                .aspect(new DatasetProperties().description("This is the canonical User profile dataset"))
                .build();
        UpsertAspectRequest upsertAspectRequest2 = UpsertAspectRequest.builder()
                .entityType("dataset")
                .entityUrn("urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.another-dataset.user-table,PROD)")
                .aspect(new DatasetProperties().description("This is the canonical User profile dataset 2"))
                .build();
        requests.add(upsertAspectRequest);
        requests.add(upsertAspectRequest2);

        // Send both upserts in one call without a callback, then block on the returned Future for the response.
        System.out.println(emitter.emit(requests, null).get());
        emitter.close();
        System.exit(0);
    }
}
```
## OpenAPI v3 Features
### Conditional Writes
All the create (POST) endpoints for aspects accept `headers` in the POST body, so that headers can be supplied per aspect in batch requests. See the
[MetadataChangeProposal](../../advanced/mcp-mcl.md) section for how these headers provide conditional write semantics.
### Batch Get
Batch get endpoints of the form `/v3/entity/{entityName}/batchGet` exist for all entities. These endpoints fetch
entities and their aspects in bulk. Combined with the `If-Version-Match` header, they can also retrieve
a specific version of an aspect; otherwise they default to the latest aspect version. Currently, this interface is limited
to returning a single version for each entity/aspect pair, but different versions can be requested for different entities.
A few example queries follow.
Example Request:

Fetch the latest aspects for the given URNs with the URL parameter `systemMetadata=true` in order to view the current
versions of the aspects.
```json
[
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
"globalTags": {},
"datasetProperties": {}
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"globalTags": {},
"datasetProperties": {}
}
]
```
Example Response:
Notice that `systemMetadata` contains `"version": "1"` for each of the aspects that exist in the system.
```json
[
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
"datasetProperties": {
"value": {
"description": "table containing all the users deleted on a single day",
"customProperties": {
"encoding": "utf-8"
},
"tags": []
},
"systemMetadata": {
"properties": {
"clientVersion": "1!0.0.0.dev0",
"clientId": "acryl-datahub"
},
"version": "1",
"lastObserved": 1720781548776,
"lastRunId": "file-2024_07_12-05_52_28",
"runId": "file-2024_07_12-05_52_28"
}
}
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"datasetProperties": {
"value": {
"description": "table containing all the users created on a single day",
"customProperties": {
"encoding": "utf-8"
},
"tags": []
},
"systemMetadata": {
"properties": {
"clientVersion": "1!0.0.0.dev0",
"clientId": "acryl-datahub"
},
"version": "1",
"lastObserved": 1720781548773,
"lastRunId": "file-2024_07_12-05_52_28",
"runId": "file-2024_07_12-05_52_28"
}
},
"globalTags": {
"value": {
"tags": [
{
"tag": "urn:li:tag:NeedsDocumentation"
}
]
},
"systemMetadata": {
"properties": {
"appSource": "ui"
},
"version": "1",
"lastObserved": 0,
"lastRunId": "no-run-id-provided",
"runId": "no-run-id-provided"
}
}
}
]
```
Next, let's mutate `globalTags` for the second URN by adding a new tag. This increments the version of
the `globalTags` aspect. The batch get response will then look like the following. Notice the incremented
`"version": "2"` in `systemMetadata` for the `globalTags` aspect. Also notice that there are now 2 tags present, whereas
previously only `urn:li:tag:NeedsDocumentation` was present.
```json
[
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
"datasetProperties": {
"value": {
"description": "table containing all the users deleted on a single day",
"customProperties": {
"encoding": "utf-8"
},
"tags": []
},
"systemMetadata": {
"properties": {
"clientVersion": "1!0.0.0.dev0",
"clientId": "acryl-datahub"
},
"version": "1",
"lastObserved": 1720781548776,
"lastRunId": "file-2024_07_12-05_52_28",
"runId": "file-2024_07_12-05_52_28"
}
}
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"datasetProperties": {
"value": {
"description": "table containing all the users created on a single day",
"customProperties": {
"encoding": "utf-8"
},
"tags": []
},
"systemMetadata": {
"properties": {
"clientVersion": "1!0.0.0.dev0",
"clientId": "acryl-datahub"
},
"version": "1",
"lastObserved": 1720781548773,
"lastRunId": "file-2024_07_12-05_52_28",
"runId": "file-2024_07_12-05_52_28"
}
},
"globalTags": {
"value": {
"tags": [
{
"tag": "urn:li:tag:NeedsDocumentation"
},
{
"tag": "urn:li:tag:Legacy"
}
]
},
"systemMetadata": {
"properties": {
"appSource": "ui"
},
"version": "2",
"lastObserved": 0,
"lastRunId": "no-run-id-provided",
"runId": "no-run-id-provided"
}
}
}
]
```
Next, we'll retrieve the previous version of `globalTags` for the entity whose `globalTags` aspect is now at version 2.
We do this by populating the aspect's `headers` map with `If-Version-Match` set to `1`, the previous version.

Example Request:

```json
[
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"globalTags": {
"headers": {
"If-Version-Match": "1"
}
}
}
]
```
Example Response:
The previous version `1` of the `globalTags` aspect is returned as expected with only the single tag.
```json
[
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"globalTags": {
"value": {
"tags": [
{
"tag": "urn:li:tag:NeedsDocumentation"
}
]
},
"systemMetadata": {
"properties": {
"appSource": "ui"
},
"version": "1",
"lastObserved": 0,
"lastRunId": "no-run-id-provided",
"runId": "no-run-id-provided"
}
}
}
]
```
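
A batch get request body in the shapes shown above can also be assembled in code. The sketch below (Java 11+, no JSON library; class and method names are hypothetical) emits `{}` to request the latest version of an aspect, or a `headers` map with `If-Version-Match` to pin a version. The full URL it builds, including the `/openapi` prefix in front of `/v3/entity/{entityName}/batchGet`, is an assumption based on the other endpoints in this guide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class BatchGetBody {
    // One entity in the request: its URN plus aspect name -> pinned version (null means latest).
    static final class Entity {
        final String urn;
        final Map<String, String> aspects = new LinkedHashMap<>();

        Entity(String urn) { this.urn = urn; }

        Entity latest(String aspectName) { aspects.put(aspectName, null); return this; }

        Entity version(String aspectName, String version) { aspects.put(aspectName, version); return this; }
    }

    static String url(String gmsUrl, String entityName, boolean systemMetadata) {
        return gmsUrl + "/openapi/v3/entity/" + entityName + "/batchGet?systemMetadata=" + systemMetadata;
    }

    // Minimal JSON string quoting (backslashes and double quotes only; enough for URNs and aspect names).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(List<Entity> entities) {
        StringJoiner array = new StringJoiner(",", "[", "]");
        for (Entity entity : entities) {
            StringJoiner object = new StringJoiner(",", "{", "}");
            object.add(quote("urn") + ":" + quote(entity.urn));
            for (Map.Entry<String, String> aspect : entity.aspects.entrySet()) {
                String request = aspect.getValue() == null
                        ? "{}"
                        : "{\"headers\":{\"If-Version-Match\":" + quote(aspect.getValue()) + "}}";
                object.add(quote(aspect.getKey()) + ":" + request);
            }
            array.add(object.toString());
        }
        return array.toString();
    }

    public static void main(String[] args) {
        String urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)";
        System.out.println(url("http://localhost:8080", "dataset", true));
        System.out.println(toJson(List.of(new Entity(urn).version("globalTags", "1"))));
    }
}
```
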
### Generic Patching
The OpenAPI v3 PATCH endpoints improve on the previous patch support by removing the need for aspect-specific backend
code to handle patching (see [Template Classes](/docs/advanced/patch.md#implementation-details)). Instead, they leverage
the natural JSON structure of aspects and extend a generic patching mechanism based on the JSON Patch standard (RFC 6902)
with significant enhancements for array operations.
Note that the traditional patching templates are used by default in order to maintain backwards compatibility. Generic
patching is activated if `arrayPrimaryKeys` is non-empty or `forceGenericPatch` is set to `true`.
#### Advanced JSON Patch for Arrays

Standard JSON Patch Limitations:

The JSON Patch standard allows for array modifications primarily through index-based operations:

- `add` at `/array/[index]`: Insert at a specific position
- `remove` at `/array/[index]`: Remove the element at a position
- `replace` at `/array/[index]`: Replace the element at a position

This approach becomes problematic when:
- Array ordering is unpredictable or may change
- Multiple clients are modifying the same resource concurrently
- The client doesn't have knowledge of current array indexes
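
To illustrate the problem, the sketch below (plain Java, with tag URNs standing in for array elements; names are hypothetical) applies an index-based `remove` computed against a stale copy of the array. The client intends to remove `tag2`, but because another writer prepended an element first, index 1 now points at `tag1`.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexPatchPitfall {
    // Applies a JSON-Patch-style "remove /tags/{index}" to a copy of the array.
    static List<String> removeAt(List<String> tags, int index) {
        List<String> copy = new ArrayList<>(tags);
        copy.remove(index); // int overload: removes by position, not by value
        return copy;
    }

    public static void main(String[] args) {
        // The client computed index 1 while the array was [tag1, tag2], intending to remove tag2.
        List<String> seenByClient = List.of("urn:li:tag:tag1", "urn:li:tag:tag2");
        // Meanwhile another writer prepended tag0, shifting every index by one.
        List<String> current = List.of("urn:li:tag:tag0", "urn:li:tag:tag1", "urn:li:tag:tag2");
        System.out.println("intended: " + removeAt(seenByClient, 1)); // prints "intended: [urn:li:tag:tag1]"
        System.out.println("actual:   " + removeAt(current, 1));      // prints "actual:   [urn:li:tag:tag0, urn:li:tag:tag2]"
    }
}
```
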

Key Concept: Array Primary Keys:

DataHub extends the JSON Patch standard with the `arrayPrimaryKeys` field, which transforms array operations into map-like
operations:

- Arrays are conceptually treated as maps where each element can be addressed by its primary key
- Primary keys can be composite (multiple fields combined)
- Path expressions use these keys instead of numeric indexes
- Backend handles the conversion between the map-like operations and actual array modifications
This approach allows for:
- Idempotent operations regardless of array order
- Targeted modifications without needing to know current array state
- Concurrent updates without conflicts (when modifying different array elements)
- More intuitive and maintainable API usage
#### Primary Key Definition and Path Construction
In this section, let's look at a common patch use case: adding and removing global tags while taking into account an
attribution source for each tag. The following examples specifically modify the `globalTags` aspect.

Defining Primary Keys:

The `arrayPrimaryKeys` property specifies which fields uniquely identify each array element:
```json
{
"arrayPrimaryKeys": {
"tags": ["attribution␟source", "tag"]
},
"patch": [
{
"op": "add",
"path": "/tags/urn:li:platformResource:source1/urn:li:tag:tag1",
"value": {
"tag": "urn:li:tag:tag1",
"attribution": {
"source": "urn:li:platformResource:source1",
"actor": "urn:li:corpuser:user",
"time": 0
}
}
}
]
}
```
In this example:
- `tags` is the array field being patched
- The primary key is a composite of `attribution.source` and `tag`
- The `␟` delimiter (U+241F, SYMBOL FOR UNIT SEPARATOR) separates nested field names within a key component, so `attribution␟source` refers to the `source` field inside `attribution`
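
The following sketch shows one way to build such a keyed view in Java: each element is indexed by the list of its primary-key values, and a `␟`-separated key component is resolved field by field through nested objects. It models JSON objects as `Map<String, Object>` and illustrates the idea only; it is not DataHub's backend implementation, and the real handling of duplicate keys is not covered here (this sketch keeps the last element).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ArrayPrimaryKeys {
    static final String NESTED_SEPARATOR = "\u241F"; // the "␟" character used in arrayPrimaryKeys

    // Resolves a key component such as "attribution␟source" against one array element.
    static Object resolve(Map<String, Object> element, String keyComponent) {
        Object current = element;
        for (String field : keyComponent.split(NESTED_SEPARATOR)) {
            if (!(current instanceof Map)) {
                return null; // missing intermediate object
            }
            current = ((Map<?, ?>) current).get(field);
        }
        return current;
    }

    // Array -> ordered map from composite key to element (for duplicate keys, the last element wins).
    static Map<List<Object>, Map<String, Object>> toKeyedMap(List<Map<String, Object>> array, List<String> primaryKeys) {
        Map<List<Object>, Map<String, Object>> keyed = new LinkedHashMap<>();
        for (Map<String, Object> element : array) {
            List<Object> key = new ArrayList<>();
            for (String component : primaryKeys) {
                key.add(resolve(element, component));
            }
            keyed.put(key, element);
        }
        return keyed;
    }

    // Keyed map -> array, preserving the original element order.
    static List<Map<String, Object>> toArray(Map<List<Object>, Map<String, Object>> keyed) {
        return new ArrayList<>(keyed.values());
    }

    public static void main(String[] args) {
        Map<String, Object> tag = Map.of(
                "tag", "urn:li:tag:tag1",
                "attribution", Map.of("source", "urn:li:platformResource:source1"));
        System.out.println(toKeyedMap(List.of(tag), List.of("attribution\u241Fsource", "tag")).keySet());
    }
}
```
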

Path Construction:

When the backend processes a patch operation, it:
- Converts the array to a map using the specified primary key fields
- Processes the operation against this map representation
- Converts the map back to an array for storage
For example, with the path:
```text
/tags/urn:li:platformResource:source1/urn:li:tag:tag1
```
The system:
- Identifies `tags` as the target array
- Uses `urn:li:platformResource:source1` as the value for `attribution.source`
- Uses `urn:li:tag:tag1` as the value for `tag`
- Finds the matching array element(s) with these key values
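
A sketch of that path decomposition in Java, assuming the keyed array sits at the top level of the aspect and that key values use the standard JSON Pointer escaping from RFC 6901 (`~1` for `/`, `~0` for `~`). Whether DataHub applies exactly this escaping is an assumption, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyedPath {
    final String arrayField;
    final List<String> keyValues;

    KeyedPath(String arrayField, List<String> keyValues) {
        this.arrayField = arrayField;
        this.keyValues = keyValues;
    }

    // Splits "/tags/<attribution.source>/<tag>" into the array field and one value per key component.
    static KeyedPath parse(String path, int keyComponents) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must start with '/': " + path);
        }
        String[] segments = path.substring(1).split("/", -1);
        if (segments.length != keyComponents + 1) {
            throw new IllegalArgumentException("Expected " + keyComponents + " key values in " + path);
        }
        List<String> values = new ArrayList<>();
        for (int i = 1; i < segments.length; i++) {
            values.add(unescape(segments[i]));
        }
        return new KeyedPath(unescape(segments[0]), values);
    }

    // RFC 6901 order: decode "~1" to "/" first, then "~0" to "~".
    static String unescape(String segment) {
        return segment.replace("~1", "/").replace("~0", "~");
    }

    public static void main(String[] args) {
        KeyedPath p = parse("/tags/urn:li:platformResource:source1/urn:li:tag:tag1", 2);
        System.out.println(p.arrayField + " -> " + p.keyValues);
    }
}
```
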

Supported Operations:

The implementation supports the following standard JSON Patch operations:

| Operation | Description                            |
| --------- | -------------------------------------- |
| add       | Add a new element or replace if exists |
| remove    | Remove an element matching keys        |
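
Once the array is keyed, both operations reduce to simple map updates. In the sketch below (illustrative only, not DataHub's implementation), an ordered `LinkedHashMap` preserves element order, so an `add` with an existing key replaces that element in place, a new key is appended at the end, and a `remove` deletes only an exact composite-key match. The `keyOf` function stands in for primary-key extraction and, like the flattened `source` field in the demo, is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class KeyedArrayPatch {
    // Applies one "add" or "remove" to an array treated as an ordered map keyed by composite primary key.
    static List<Map<String, Object>> apply(List<Map<String, Object>> array,
                                           Function<Map<String, Object>, List<Object>> keyOf,
                                           String op, List<Object> key, Map<String, Object> value) {
        Map<List<Object>, Map<String, Object>> keyed = new LinkedHashMap<>();
        for (Map<String, Object> element : array) {
            keyed.put(keyOf.apply(element), element);
        }
        switch (op) {
            case "add":
                keyed.put(key, value); // new key: appended; existing key: replaced in place
                break;
            case "remove":
                keyed.remove(key); // only an exact match on every key component is removed
                break;
            default:
                throw new IllegalArgumentException("Unsupported op: " + op);
        }
        return new ArrayList<>(keyed.values());
    }

    public static void main(String[] args) {
        // Attribution flattened to a top-level "source" field for brevity.
        Function<Map<String, Object>, List<Object>> keyOf = e -> Arrays.asList(e.get("source"), e.get("tag"));
        Map<String, Object> fromSource1 = Map.of("tag", "urn:li:tag:tag1", "source", "urn:li:platformResource:source1");
        Map<String, Object> fromSource2 = Map.of("tag", "urn:li:tag:tag1", "source", "urn:li:platformResource:source2");
        // Same tag, different source: removing the source1 entry leaves the source2 entry untouched.
        System.out.println(apply(List.of(fromSource1, fromSource2), keyOf, "remove",
                List.<Object>of("urn:li:platformResource:source1", "urn:li:tag:tag1"), null));
    }
}
```
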
#### Patch Operation Examples

Adding Tagged Elements with Attribution:

```json
{
"op": "add",
"path": "/tags/urn:li:platformResource:source1/urn:li:tag:tag1",
"value": {
"tag": "urn:li:tag:tag1",
"attribution": {
"source": "urn:li:platformResource:source1",
"actor": "urn:li:corpuser:user",
"time": 0
}
}
}
```
This operation:
- Checks if an element with matching keys exists in the array
- If not found, adds the new element
- If found, replaces the existing element

Selective Removal:

```json
{
"op": "remove",
"path": "/tags/urn:li:platformResource:source1/urn:li:tag:tag1"
}
```
This operation:
- Finds elements matching the composite key
- Removes only those elements
- Preserves other elements, even with partially matching keys