Patch Operations Guide
SDK V2 uses patch-based updates for efficient, surgical modifications to metadata. This guide explains how patches work and when to use them.
What Are Patches?
Patches are incremental updates that modify specific fields without replacing entire aspects. Instead of sending the full datasetProperties aspect, a patch sends only the changes.
Patch vs Full Update
Full Update (V1 Style):
// Fetch entire aspect
DatasetProperties props = getDatasetProperties(urn);
// Modify one field
props.setDescription("New description");
// Send entire aspect back (overwrites everything)
sendAspect(urn, props);
Patch Update (V2 Style):
// Send only the change
dataset.setDescription("New description");
client.entities().update(dataset);
// Sends JSON Patch: { "op": "add", "path": "/description", "value": "New description" }
Benefits of Patches
- Efficiency - Only changed fields sent over network
- Concurrency Safety - Less risk of overwriting concurrent changes
- Atomicity - The operations within a single patch are applied together or not at all
- Bandwidth - Reduced payload size
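The concurrency-safety benefit is easiest to see as a lost update. The minimal Java sketch below is self-contained and hypothetical: `TagStore`, `writeFull`, and `patchAdd` are stand-ins for a server-side aspect, not DataHub APIs. A full write replaces the stored value with the client's possibly stale copy. An add-style patch is applied to whatever is currently stored.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class LostUpdateDemo {
    public static void main(String[] args) {
        System.out.println("Full updates: " + fullUpdates());
        System.out.println("Patches:      " + patchUpdates());
    }

    // Two clients read the same snapshot, each adds a tag, then each writes back the whole set.
    static Set<String> fullUpdates() {
        TagStore store = new TagStore();
        store.writeFull(Set.of("pii"));
        Set<String> clientA = store.read();
        Set<String> clientB = store.read();
        clientA.add("gdpr");
        clientB.add("sensitive");
        store.writeFull(clientA);
        store.writeFull(clientB); // stale copy: silently drops clientA's "gdpr"
        return store.read();
    }

    // The same two changes sent as "add" patches against the current server state.
    static Set<String> patchUpdates() {
        TagStore store = new TagStore();
        store.writeFull(Set.of("pii"));
        store.patchAdd("gdpr");
        store.patchAdd("sensitive");
        return store.read();
    }
}

// Hypothetical in-memory "server" holding one dataset's tags (not DataHub code).
class TagStore {
    private Set<String> tags = new LinkedHashSet<>();

    // Returns a copy, like a client fetching the aspect.
    Set<String> read() {
        return new LinkedHashSet<>(tags);
    }

    // Full update: the client's copy replaces whatever is stored.
    void writeFull(Set<String> newTags) {
        tags = new LinkedHashSet<>(newTags);
    }

    // Patch: a single "add" applied to the current stored value.
    void patchAdd(String tag) {
        tags.add(tag);
    }
}
```

With full updates the final tags are `pii` and `sensitive` (the `gdpr` tag is lost); with patches all three tags survive.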
How Patches Work in SDK V2
Patch Accumulation Pattern
Entities accumulate patches in a pending list until the entity is saved:
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();
// Each method creates a patch MCP and adds it to the pendingPatches list
dataset.addTag("pii"); // Patch 1
dataset.addTag("sensitive"); // Patch 2
dataset.addOwner("urn:li:corpuser:user", OwnershipType.TECHNICAL_OWNER); // Patch 3
// Check pending patches
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 3
// Emit all pending patches in a single update() call
client.entities().update(dataset);
// Patches cleared after emission
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 0
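The accumulate-then-emit lifecycle can be modeled in plain Java. This is a sketch, not SDK code: `SketchEntity`, `RecordingEmitter`, and `update()` are hypothetical, and patches are plain strings rather than `MetadataChangeProposal` objects. It mirrors the behavior described above: each mutator queues one patch and returns `this` for chaining, and `update()` emits every queued patch and then clears the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PatchAccumulationDemo {
    public static void main(String[] args) {
        RecordingEmitter emitter = new RecordingEmitter();
        SketchEntity dataset = new SketchEntity("urn:li:dataset:example");

        // Each call queues one patch; nothing is sent yet.
        dataset.addTag("pii").addTag("sensitive").addOwner("urn:li:corpuser:jdoe");
        System.out.println("Pending patches: " + dataset.getPendingPatches().size());

        update(dataset, emitter);
        System.out.println("Pending patches: " + dataset.getPendingPatches().size());
        System.out.println("Emitted: " + emitter.sent);
    }

    // Hypothetical stand-in for client.entities().update(): emit each patch, then clear.
    static void update(SketchEntity entity, RecordingEmitter emitter) {
        for (String patch : entity.getPendingPatches()) {
            emitter.emit(patch);
        }
        entity.clearPendingPatches();
    }
}

// Hypothetical entity that only records pending patches as strings.
class SketchEntity {
    private final String urn;
    private final List<String> pendingPatches = new ArrayList<>();

    SketchEntity(String urn) {
        this.urn = urn;
    }

    SketchEntity addTag(String tag) {
        pendingPatches.add(urn + " globalTags add " + tag);
        return this;
    }

    SketchEntity addOwner(String owner) {
        pendingPatches.add(urn + " ownership add " + owner);
        return this;
    }

    // Read-only view so callers cannot modify the queue directly.
    List<String> getPendingPatches() {
        return Collections.unmodifiableList(pendingPatches);
    }

    void clearPendingPatches() {
        pendingPatches.clear();
    }
}

// Hypothetical emitter that records what it was asked to send.
class RecordingEmitter {
    final List<String> sent = new ArrayList<>();

    void emit(String patch) {
        sent.add(patch);
    }
}
```

Running `main` prints 3 pending patches before `update()` and 0 after, matching the output comments in the example above.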
Under the Hood
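// From Dataset.java (simplified)
public Dataset addTag(@Nonnull String tagUrn) {
    // Parse the argument into a TagUrn (illustrative: accepts "pii" or "urn:li:tag:pii")
    TagUrn tag = new TagUrn(tagUrn.replaceFirst("^urn:li:tag:", ""));
    // Create patch using existing patch builder
    GlobalTagsPatchBuilder patch = new GlobalTagsPatchBuilder()
        .urn(getUrn())
        .addTag(tag, null);
    // Add to pending patches list
    addPatchMcp(patch.build());
    return this;
}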
When the entity is saved, EntityClient emits its pending patches if it has any, and otherwise emits full aspects:
// From EntityClient.java
public void upsert(Entity entity) {
    if (entity.hasPendingPatches()) {
        // Emit each pending patch MCP
        for (MetadataChangeProposal patchMcp : entity.getPendingPatches()) {
            emitter.emit(patchMcp, null);
        }
        entity.clearPendingPatches();
    } else {
        // No patches: emit full aspects
        for (MetadataChangeProposalWrapper mcp : entity.toMCPs()) {
            emitter.emit(mcp);
        }
    }
}
Reusing Existing Patch Builders
SDK V2 reuses the existing patch builders from the datahub.client.patch package:
Available Patch Builders
| Builder | Purpose | Example |
|---|---|---|
| OwnershipPatchBuilder | Add/remove owners | addOwner(), removeOwner() |
| GlobalTagsPatchBuilder | Add/remove tags | addTag(), removeTag() |
| GlossaryTermsPatchBuilder | Add/remove terms | addTerm(), removeTerm() |
| DomainsPatchBuilder | Set/remove domain | setDomain(), removeDomain() |
| DatasetPropertiesPatchBuilder | Update properties | setDescription(), addCustomProperty() |
| EditableDatasetPropertiesPatchBuilder | Update editable properties | setEditableDescription() |
Why Reuse?
- Battle-tested - Used by Python SDK V2 in production
- Correctness - Complex JSON Patch logic already validated
- Consistency - Same semantics across language SDKs
- Maintainability - Single implementation to maintain
When to Use Patches
Use Patches For:
✅ Incremental changes to existing entities
Dataset dataset = client.entities().get(urn);
dataset.addTag("new-tag");
client.entities().update(dataset); // Patch
✅ Adding metadata to entities
dataset.addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);
dataset.addCustomProperty("updated_at", String.valueOf(System.currentTimeMillis()));
client.entities().update(dataset); // Multiple patches
✅ Surgical updates without full entity knowledge
// Don't need to fetch entire entity
dataset.addTag("gdpr");
client.entities().update(dataset); // Just adds tag
Use Full Upsert For:
✅ Creating new entities
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .description("New dataset")
    .build();
client.entities().upsert(dataset); // Full upsert
✅ Replacing entire aspects
// Set complete schema
SchemaMetadata schema = buildCompleteSchema();
dataset.setSchema(schema);
client.entities().upsert(dataset); // Sends full schema aspect
✅ Builder-provided metadata
Dataset dataset = Dataset.builder()
    .platform("postgres")
    .name("my_table")
    .description("Description from builder")
    .build();
// Builder populates aspectCache with full aspects
client.entities().upsert(dataset); // Sends cached aspects
Patch Operations by Entity
Dataset Patches
Ownership:
dataset.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);
dataset.removeOwner("urn:li:corpuser:jane");
Tags:
dataset.addTag("pii");
dataset.removeTag("deprecated");
Glossary Terms:
dataset.addTerm("urn:li:glossaryTerm:CustomerData");
dataset.removeTerm("urn:li:glossaryTerm:OldTerm");
Domain:
dataset.setDomain("urn:li:domain:Marketing");
dataset.removeDomain();
Properties:
dataset.addCustomProperty("team", "data-eng");
dataset.removeCustomProperty("old_property");
dataset.setDescription("New description");
Chart Patches
Chart supports the same patch operations as Dataset:
chart.addOwner("urn:li:corpuser:analyst", OwnershipType.TECHNICAL_OWNER);
chart.addTag("visualization");
chart.addTerm("urn:li:glossaryTerm:SalesMetrics");
chart.setDomain("urn:li:domain:BusinessIntelligence");
See Chart Entity Guide for complete details.
Advanced: Manual Patch Construction
For advanced use cases, construct patches directly:
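import com.linkedin.metadata.aspect.patch.builder.OwnershipPatchBuilder;
import com.linkedin.common.OwnershipType;
import com.linkedin.common.urn.Urn;
import com.linkedin.mxe.MetadataChangeProposal;
// Manual patch construction (Urn.createFromString throws URISyntaxException)
OwnershipPatchBuilder patchBuilder = new OwnershipPatchBuilder()
    .urn(dataset.getUrn())
    .addOwner(
        Urn.createFromString("urn:li:corpuser:alice"),
        OwnershipType.DATA_STEWARD
    );
MetadataChangeProposal patch = patchBuilder.build();
// Option 1: add to the entity's pending patches (emitted on the next update())
dataset.addPatchMcp(patch);
// Option 2: emit directly (use one option, not both, or the patch is sent twice)
emitter.emit(patch, null);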
Patch vs Upsert Decision Tree
New entity from builder?
├─ Yes → Use upsert() (sends cached aspects)
└─ No → Loaded from server or reference?
├─ Yes → Making incremental changes?
│ ├─ Yes → Use update() (sends patches)
│ └─ No → Replacing entire aspect?
│ └─ Yes → Use upsert() (sends full aspect)
└─ No → Just adding tags/owners/etc?
└─ Yes → Use update() (sends patches)
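The decision tree can also be read as a small function. The sketch below is hypothetical (`Origin`, `Change`, `SaveMethod`, and `choose()` are not SDK types) and walks the tree top to bottom. It returns `Optional.empty()` for the one combination the tree leaves open: an entity that is neither new from a builder nor loaded/referenced, where the change replaces a whole aspect.

```java
import java.util.Optional;

class SaveMethodDemo {
    public static void main(String[] args) {
        System.out.println(choose(Origin.NEW_FROM_BUILDER, Change.INCREMENTAL));
        System.out.println(choose(Origin.LOADED_OR_REFERENCE, Change.INCREMENTAL));
        System.out.println(choose(Origin.LOADED_OR_REFERENCE, Change.REPLACE_WHOLE_ASPECT));
        System.out.println(choose(Origin.OTHER, Change.REPLACE_WHOLE_ASPECT));
    }

    static Optional<SaveMethod> choose(Origin origin, Change change) {
        if (origin == Origin.NEW_FROM_BUILDER) {
            return Optional.of(SaveMethod.UPSERT); // sends cached aspects
        }
        if (change == Change.INCREMENTAL) {
            return Optional.of(SaveMethod.UPDATE); // sends patches
        }
        if (origin == Origin.LOADED_OR_REFERENCE) {
            return Optional.of(SaveMethod.UPSERT); // replacing an entire aspect
        }
        return Optional.empty(); // combination the tree does not cover
    }
}

enum SaveMethod { UPSERT, UPDATE }

enum Origin { NEW_FROM_BUILDER, LOADED_OR_REFERENCE, OTHER }

// INCREMENTAL includes "just adding tags/owners/etc."
enum Change { INCREMENTAL, REPLACE_WHOLE_ASPECT }
```

Using an `Optional` rather than a default answer keeps the uncovered branch visible instead of silently picking a method.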
Pending Patches Management
Check for Pending Patches
if (dataset.hasPendingPatches()) {
    System.out.println("Entity has pending patches");
}
Get Pending Patches
List<MetadataChangeProposal> patches = dataset.getPendingPatches();
for (MetadataChangeProposal patch : patches) {
    System.out.println("Patch for aspect: " + patch.getAspectName());
}
Clear Pending Patches
// Manually clear without emitting
dataset.clearPendingPatches();
Batch Multiple Changes
// Accumulate many patches
dataset.addTag("tag1")
    .addTag("tag2")
    .addTag("tag3")
    .addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:user2", OwnershipType.DATA_STEWARD)
    .addCustomProperty("key1", "value1")
    .addCustomProperty("key2", "value2");
// All 7 patches emitted in single update() call
client.entities().update(dataset);
Performance Considerations
Network Efficiency
// Inefficient: 3 separate update() calls
dataset.addTag("tag1");
client.entities().update(dataset);
dataset.addTag("tag2");
client.entities().update(dataset);
dataset.addTag("tag3");
client.entities().update(dataset);
// Efficient: 1 update() call that emits all 3 patches
dataset.addTag("tag1")
    .addTag("tag2")
    .addTag("tag3");
client.entities().update(dataset);
Payload Size
- Full upsert (datasetProperties): ~2-5 KB for a typical dataset aspect
- Patch (add tag): ~200-300 bytes for a single tag patch
- 10 tags: patches total ~3 KB (10 × ~300 bytes), versus ~5 KB for a full upsert
JSON Patch Format
Patches use JSON Patch (RFC 6902) format:
Add operation:
{
  "op": "add",
  "path": "/tags/urn:li:tag:pii",
  "value": {
    "tag": "urn:li:tag:pii"
  }
}
Remove operation:
{
  "op": "remove",
  "path": "/tags/urn:li:tag:deprecated"
}
SDK V2 abstracts this complexity - you work with Java methods, not JSON.
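For debugging it can help to build these operations by hand. The `path` is a JSON Pointer (RFC 6901), so a `~` or `/` inside a key must be escaped as `~0` and `~1`, in that order. The helper below is hypothetical and not part of SDK V2; whether DataHub's builders escape keys exactly this way is not covered by this guide. It simply reproduces the two operations shown above with plain string building.

```java
class JsonPatchDemo {
    public static void main(String[] args) {
        System.out.println(addTagOp("urn:li:tag:pii"));
        System.out.println(removeTagOp("urn:li:tag:deprecated"));
    }

    // RFC 6901: escape "~" first, then "/", within a single reference token.
    static String escapePointerToken(String token) {
        return token.replace("~", "~0").replace("/", "~1");
    }

    // Minimal JSON string quoting: handles only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds {"op": "add", "path": "/tags/<urn>", "value": {"tag": "<urn>"}}
    static String addTagOp(String tagUrn) {
        String path = "/tags/" + escapePointerToken(tagUrn);
        return "{\"op\": \"add\", \"path\": " + quote(path)
            + ", \"value\": {\"tag\": " + quote(tagUrn) + "}}";
    }

    // Builds {"op": "remove", "path": "/tags/<urn>"}
    static String removeTagOp(String tagUrn) {
        String path = "/tags/" + escapePointerToken(tagUrn);
        return "{\"op\": \"remove\", \"path\": " + quote(path) + "}";
    }
}
```

Escaping `~` before `/` matters: doing it in the other order would turn the `~1` produced for a slash into `~01`.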
Troubleshooting
Patches Not Applied
Issue: Changes not visible in DataHub
Solutions:
- Verify update() was called (patches don't emit automatically)
- Check for errors in the emission response
- Ensure the entity is bound to the client
Concurrent Updates
Issue: Patches conflict with concurrent changes
Solutions:
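- Patches modify only the fields they target, so concurrent patches to different fields (for example, two clients adding different tags) do not overwrite each other
- Each patch is applied atomically
- For complex scenarios, load the entity first to get its latest state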
Patch Cleared Unexpectedly
Issue: Pending patches disappear
Reason: upsert() or update() clears patches after emission
Solution: This is expected behavior - patches are one-time use
Next Steps
- Design Principles - Architecture behind patches
- Dataset Entity Guide - All patch operations for datasets
- Migration Guide - Moving from full updates to patches
API Reference
Key classes:
- Entity.java - Patch accumulation
- EntityClient.java - Patch emission
- datahub.client.patch.* - Patch builders