
Patch Operations Guide

SDK V2 uses patch-based updates for efficient, surgical modifications to metadata. This guide explains how patches work and when to use them.

What Are Patches?

Patches are incremental updates that modify specific fields without replacing entire aspects. Instead of sending the full datasetProperties aspect, a patch sends only the changes.

Patch vs Full Update

Full Update (V1 Style):

// Fetch entire aspect
DatasetProperties props = getDatasetProperties(urn);

// Modify one field
props.setDescription("New description");

// Send entire aspect back (overwrites everything)
sendAspect(urn, props);

Patch Update (V2 Style):

// Send only the change
dataset.setDescription("New description");
client.entities().update(dataset);
// Sends JSON Patch: { "op": "add", "path": "/description", "value": "New description" }

Benefits of Patches

  1. Efficiency - Only changed fields sent over network
  2. Concurrency Safety - Less risk of overwriting concurrent changes
  3. Atomicity - Each patch is applied to its aspect as a single atomic change
  4. Bandwidth - Reduced payload size
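
The concurrency benefit can be illustrated with a toy model. The sketch below is not SDK code: it stands in for one stored aspect with a plain map, and the class and method names (LostUpdateSketch, fullUpdate, patchField) are hypothetical. It shows two writers changing different fields, first by writing back full copies and then by patching single fields.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not SDK code): one stored aspect as field name -> value. */
final class LostUpdateSketch {

    /** Full update: the caller's copy replaces the stored aspect wholesale. */
    static void fullUpdate(Map<String, String> stored, Map<String, String> callerCopy) {
        stored.clear();
        stored.putAll(callerCopy);
    }

    /** Patch update: only the named field is touched. */
    static void patchField(Map<String, String> stored, String field, String value) {
        stored.put(field, value);
    }

    private static Map<String, String> initialAspect() {
        Map<String, String> stored = new HashMap<>();
        stored.put("description", "old");
        stored.put("team", "unassigned");
        return stored;
    }

    /** Two writers each read the aspect, edit one field, and write it back in full. */
    static Map<String, String> runFullUpdates() {
        Map<String, String> stored = initialAspect();
        Map<String, String> copyA = new HashMap<>(stored); // writer A reads
        Map<String, String> copyB = new HashMap<>(stored); // writer B reads
        copyA.put("description", "New description");
        copyB.put("team", "data-eng");

        fullUpdate(stored, copyA);
        fullUpdate(stored, copyB); // B's stale copy overwrites A's description
        return stored;
    }

    /** The same two changes expressed as field-level patches. */
    static Map<String, String> runPatches() {
        Map<String, String> stored = initialAspect();
        patchField(stored, "description", "New description");
        patchField(stored, "team", "data-eng");
        return stored;
    }

    public static void main(String[] args) {
        System.out.println("After full updates: " + runFullUpdates());
        System.out.println("After patches:      " + runPatches());
    }
}
```

With full updates, writer A's description is lost because writer B sends back a copy read before A's change. With patches, both changes survive because neither writer touches the other's field.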

How Patches Work in SDK V2

Patch Accumulation Pattern

Entities accumulate patches in a pending list until the entity is emitted with update():

Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

// Each method creates a patch MCP and adds it to the pendingPatches list
dataset.addTag("pii");              // Patch 1
dataset.addTag("sensitive");        // Patch 2
dataset.addOwner("user", OwnershipType.TECHNICAL_OWNER);  // Patch 3

// Check pending patches
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 3

// Emit all pending patches with a single update() call
client.entities().update(dataset);

// Patches cleared after emission
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 0

Under the Hood

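// From Dataset.java (abridged)
public Dataset addTag(@Nonnull String tagUrn) {
    // `tag` is the TagUrn resolved from the tagUrn string
    // (the conversion step is omitted in this excerpt)

    // Create patch using existing patch builder
    GlobalTagsPatchBuilder patch = new GlobalTagsPatchBuilder()
        .urn(getUrn())
        .addTag(tag, null);

    // Add to pending patches list
    addPatchMcp(patch.build());

    // Return this so calls can be chained
    return this;
}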

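When the entity is emitted, EntityClient sends the pending patches if there are any and falls back to full aspects otherwise: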

// From EntityClient.java
public void upsert(Entity entity) {
    if (entity.hasPendingPatches()) {
        // Emit patches
        for (MetadataChangeProposal patchMcp : entity.getPendingPatches()) {
            emitter.emit(patchMcp, null);
        }
        entity.clearPendingPatches();
    } else {
        // No patches, emit full aspects
        for (MetadataChangeProposalWrapper mcp : entity.toMCPs()) {
            emitter.emit(mcp);
        }
    }
}
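
The accumulate-then-emit flow above can be tried without a DataHub server. The following is a minimal sketch under stated assumptions: SketchEntity and SketchClient are hypothetical stand-ins for the SDK's entity and client classes, and each patch is reduced to a plain string rather than a MetadataChangeProposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stand-alone sketch of the pending-patch pattern; none of this is SDK code. */
final class AccumulationSketch {

    /** Hypothetical stand-in for an entity that queues patches until emitted. */
    static final class SketchEntity {
        private final List<String> pendingPatches = new ArrayList<>();

        SketchEntity addTag(String tag) {
            pendingPatches.add("globalTags:add:" + tag);
            return this; // fluent, so calls can be chained
        }

        SketchEntity addOwner(String owner) {
            pendingPatches.add("ownership:add:" + owner);
            return this;
        }

        boolean hasPendingPatches() {
            return !pendingPatches.isEmpty();
        }

        List<String> getPendingPatches() {
            return Collections.unmodifiableList(pendingPatches);
        }

        void clearPendingPatches() {
            pendingPatches.clear();
        }
    }

    /** Hypothetical stand-in for the client; it records what it would send. */
    static final class SketchClient {
        final List<String> emitted = new ArrayList<>();

        void update(SketchEntity entity) {
            // Emit every queued patch in order, then reset the queue.
            for (String patch : entity.getPendingPatches()) {
                emitted.add(patch);
            }
            entity.clearPendingPatches();
        }
    }

    public static void main(String[] args) {
        SketchEntity dataset = new SketchEntity()
            .addTag("pii")
            .addTag("sensitive")
            .addOwner("user");
        System.out.println("Pending before update: " + dataset.getPendingPatches().size());

        SketchClient client = new SketchClient();
        client.update(dataset);
        System.out.println("Emitted: " + client.emitted.size());
        System.out.println("Pending after update: " + dataset.getPendingPatches().size());
    }
}
```

The queue is cleared only after every patch has been handed to the emitter, which is why a second update() on the same entity sends nothing until new changes are made.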

Reusing Existing Patch Builders

SDK V2 reuses the existing patch builders from the com.linkedin.metadata.aspect.patch.builder package:

Available Patch Builders

Builder                                Purpose                     Example
OwnershipPatchBuilder                  Add/remove owners           addOwner(), removeOwner()
GlobalTagsPatchBuilder                 Add/remove tags             addTag(), removeTag()
GlossaryTermsPatchBuilder              Add/remove terms            addTerm(), removeTerm()
DomainsPatchBuilder                    Set/remove domain           setDomain(), removeDomain()
DatasetPropertiesPatchBuilder          Update properties           setDescription(), addCustomProperty()
EditableDatasetPropertiesPatchBuilder  Update editable properties  setEditableDescription()

Why Reuse?

  • Battle-tested - Used by Python SDK V2 in production
  • Correctness - Complex JSON Patch logic already validated
  • Consistency - Same semantics across language SDKs
  • Maintainability - Single implementation to maintain

When to Use Patches

Use Patches For:

Incremental changes to existing entities

Dataset dataset = client.entities().get(urn);
dataset.addTag("new-tag");
client.entities().update(dataset);  // Patch

Adding metadata to entities

dataset.addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);
dataset.addCustomProperty("updated_at", String.valueOf(System.currentTimeMillis()));
client.entities().update(dataset);  // Multiple patches

Surgical updates without full entity knowledge

// Don't need to fetch entire entity
dataset.addTag("gdpr");
client.entities().update(dataset);  // Just adds tag

Use Full Upsert For:

Creating new entities

Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .description("New dataset")
    .build();

client.entities().upsert(dataset);  // Full upsert

Replacing entire aspects

// Set complete schema
SchemaMetadata schema = buildCompleteSchema();
dataset.setSchema(schema);
client.entities().upsert(dataset);  // Sends full schema aspect

Builder-provided metadata

Dataset dataset = Dataset.builder()
    .platform("postgres")
    .name("my_table")
    .description("Description from builder")
    .build();

// Builder populates aspectCache with full aspects
client.entities().upsert(dataset);  // Sends cached aspects

Patch Operations by Entity

Dataset Patches

Ownership:

dataset.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);
dataset.removeOwner("urn:li:corpuser:jane");

Tags:

dataset.addTag("pii");
dataset.removeTag("deprecated");

Glossary Terms:

dataset.addTerm("urn:li:glossaryTerm:CustomerData");
dataset.removeTerm("urn:li:glossaryTerm:OldTerm");

Domain:

dataset.setDomain("urn:li:domain:Marketing");
dataset.removeDomain();

Properties:

dataset.addCustomProperty("team", "data-eng");
dataset.removeCustomProperty("old_property");
dataset.setDescription("New description");

Chart Patches

Chart supports the same patch operations as Dataset:

chart.addOwner("urn:li:corpuser:analyst", OwnershipType.TECHNICAL_OWNER);
chart.addTag("visualization");
chart.addTerm("urn:li:glossaryTerm:SalesMetrics");
chart.setDomain("urn:li:domain:BusinessIntelligence");

See Chart Entity Guide for complete details.

Advanced: Manual Patch Construction

For advanced use cases, construct patches directly:

import com.linkedin.metadata.aspect.patch.builder.OwnershipPatchBuilder;
import com.linkedin.common.urn.Urn;

// Manual patch construction
OwnershipPatchBuilder patchBuilder = new OwnershipPatchBuilder()
    .urn(dataset.getUrn())
    .addOwner(
        Urn.createFromString("urn:li:corpuser:alice"),
        OwnershipType.DATA_STEWARD
    );

MetadataChangeProposal patch = patchBuilder.build();

// Add to entity's pending patches
dataset.addPatchMcp(patch);

// Or emit directly
emitter.emit(patch, null);

Patch vs Upsert Decision Tree

New entity from builder?
├─ Yes → Use upsert() (sends cached aspects)
└─ No → Loaded from server or reference?
    ├─ Yes → Making incremental changes?
    │   ├─ Yes → Use update() (sends patches)
    │   └─ No → Replacing entire aspect?
    │       └─ Yes → Use upsert() (sends full aspect)
    └─ No → Just adding tags/owners/etc?
        └─ Yes → Use update() (sends patches)
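
The decision tree above reduces to two questions, which can be written as a small helper. This is an illustrative sketch only: EmitChoiceSketch, Call and choose are hypothetical names, not part of the SDK.

```java
/** Sketch of the patch-vs-upsert decision tree; names are hypothetical. */
final class EmitChoiceSketch {

    enum Call { UPDATE, UPSERT }

    /**
     * @param builtFromBuilder     true if the entity is new and was created with a builder
     * @param replacingWholeAspect true if a complete aspect (e.g. a full schema) is being set
     */
    static Call choose(boolean builtFromBuilder, boolean replacingWholeAspect) {
        if (builtFromBuilder) {
            return Call.UPSERT; // builder-populated aspects are sent in full
        }
        if (replacingWholeAspect) {
            return Call.UPSERT; // a full aspect replaces what is stored
        }
        return Call.UPDATE;     // incremental changes go out as patches
    }

    public static void main(String[] args) {
        System.out.println("New entity from builder: " + choose(true, false));
        System.out.println("Loaded, replacing schema: " + choose(false, true));
        System.out.println("Loaded, adding a tag:     " + choose(false, false));
    }
}
```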

Pending Patches Management

Check for Pending Patches

if (dataset.hasPendingPatches()) {
    System.out.println("Entity has pending patches");
}

Get Pending Patches

List<MetadataChangeProposal> patches = dataset.getPendingPatches();
for (MetadataChangeProposal patch : patches) {
    System.out.println("Patch for aspect: " + patch.getAspectName());
}

Clear Pending Patches

// Manually clear without emitting
dataset.clearPendingPatches();

Batch Multiple Changes

// Accumulate many patches
dataset.addTag("tag1")
       .addTag("tag2")
       .addTag("tag3")
       .addOwner("user1", OwnershipType.TECHNICAL_OWNER)
       .addOwner("user2", OwnershipType.DATA_STEWARD)
       .addCustomProperty("key1", "value1")
       .addCustomProperty("key2", "value2");

// All 7 patches emitted in single update() call
client.entities().update(dataset);

Performance Considerations

Network Efficiency

// Inefficient: 3 separate update() calls, one per tag
dataset.addTag("tag1");
client.entities().update(dataset);
dataset.addTag("tag2");
client.entities().update(dataset);
dataset.addTag("tag3");
client.entities().update(dataset);

// Efficient: 1 update() call carrying all 3 patches
dataset.addTag("tag1")
       .addTag("tag2")
       .addTag("tag3");
client.entities().update(dataset);

Payload Size

Full upsert (datasetProperties):

  • ~2-5 KB for typical dataset aspect

Patch (add tag):

  • ~200-300 bytes for single tag patch

For 10 tags, the patches total roughly 2-3 KB (10 x 200-300 bytes), against up to ~5 KB for one full upsert.
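
The arithmetic behind these figures is easy to reproduce. The sketch below uses the rough sizes quoted above as inputs; they are estimates from this guide, not measurements, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope payload estimate; sizes are this guide's rough figures. */
final class PayloadEstimateSketch {

    /** Total bytes sent when emitting patchCount patches of about the same size. */
    static long totalPatchBytes(int patchCount, int bytesPerPatch) {
        return (long) patchCount * bytesPerPatch;
    }

    /** Smallest number of patches whose combined size exceeds one full aspect. */
    static long patchesToExceed(int fullAspectBytes, int bytesPerPatch) {
        return fullAspectBytes / bytesPerPatch + 1;
    }

    public static void main(String[] args) {
        int perPatch = 300;    // upper estimate for one tag patch
        int fullAspect = 5000; // upper estimate for one full aspect

        System.out.println("10 patches: " + totalPatchBytes(10, perPatch) + " bytes");
        System.out.println("Patches needed to exceed a full aspect: "
            + patchesToExceed(fullAspect, perPatch));
    }
}
```

With these inputs, 10 patches come to 3000 bytes, and it takes 17 patches before the combined patch payload is larger than a single 5000-byte aspect.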

JSON Patch Format

Patches use JSON Patch (RFC 6902) format:

Add operation:

{
  "op": "add",
  "path": "/tags/urn:li:tag:pii",
  "value": {
    "tag": "urn:li:tag:pii"
  }
}

Remove operation:

{
  "op": "remove",
  "path": "/tags/urn:li:tag:deprecated"
}

SDK V2 abstracts this complexity - you work with Java methods, not JSON.
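
For readers who want to see what such an operation looks like when assembled by hand, here is a minimal sketch. It assumes the "/tags/<tag urn>" path layout shown in the examples above and a URN with no characters that need JSON string escaping; the helper names are hypothetical and this is not how the SDK builds its patches. The one rule taken from the standards is the JSON Pointer escaping (RFC 6901): "~" becomes "~0" and "/" becomes "~1" inside a path segment.

```java
/** Hand-built JSON Patch operations for tags; a sketch, not the SDK's implementation. */
final class JsonPatchOpSketch {

    /** Escapes one JSON Pointer path segment per RFC 6901 ("~" first, then "/"). */
    static String escapeToken(String token) {
        return token.replace("~", "~0").replace("/", "~1");
    }

    /** Builds an "add" operation keyed by the tag URN. */
    static String addTagOp(String tagUrn) {
        // Assumes tagUrn contains no quotes or backslashes (no JSON string escaping done).
        return "{\"op\":\"add\",\"path\":\"/tags/" + escapeToken(tagUrn)
            + "\",\"value\":{\"tag\":\"" + tagUrn + "\"}}";
    }

    /** Builds a "remove" operation keyed by the tag URN. */
    static String removeTagOp(String tagUrn) {
        return "{\"op\":\"remove\",\"path\":\"/tags/" + escapeToken(tagUrn) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(addTagOp("urn:li:tag:pii"));
        System.out.println(removeTagOp("urn:li:tag:deprecated"));
    }
}
```

Colons need no escaping in a JSON Pointer, which is why a tag URN can appear in the path unchanged.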

Troubleshooting

Patches Not Applied

Issue: Changes not visible in DataHub

Solutions:

  • Verify update() was called (patches don't emit automatically)
  • Check for errors in emission response
  • Ensure entity is bound to client

Concurrent Updates

Issue: Patches conflict with concurrent changes

Solutions:

  • Patches are generally safe for concurrent updates
  • Each patch is atomic
  • For complex scenarios, load entity first to get latest state

Patch Cleared Unexpectedly

Issue: Pending patches disappear

Reason: upsert() or update() clears patches after emission

Solution: This is expected behavior - patches are one-time use

API Reference

Key classes:

  • Entity.java - Patch accumulation
  • EntityClient.java - Patch emission
  • com.linkedin.metadata.aspect.patch.builder.* - Patch builders