
# Patch Operations Guide
SDK V2 uses **patch-based updates** for efficient, surgical modifications to metadata. This guide explains how patches work and when to use them.
## What Are Patches?
Patches are **incremental updates** that modify specific fields without replacing entire aspects. Instead of sending the full `datasetProperties` aspect, a patch sends only the changes.
### Patch vs Full Update
**Full Update (V1 Style):**
```java
// Fetch entire aspect
DatasetProperties props = getDatasetProperties(urn);
// Modify one field
props.setDescription("New description");
// Send entire aspect back (overwrites everything)
sendAspect(urn, props);
```
**Patch Update (V2 Style):**
```java
// Send only the change
dataset.setDescription("New description");
client.entities().update(dataset);
// Sends JSON Patch: { "op": "add", "path": "/description", "value": "New description" }
```
### Benefits of Patches
1. **Efficiency** - Only changed fields sent over network
2. **Concurrency Safety** - Less risk of overwriting concurrent changes
3. **Atomicity** - Each patch is applied as a single, atomic change
4. **Bandwidth** - Reduced payload size
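
The concurrency benefit is easiest to see with two writers. The sketch below is a hypothetical, self-contained model (`AspectStore` and `LostUpdateDemo` are illustrative names, not DataHub classes): with full updates, a writer holding a stale copy of the aspect silently reverts another writer's field, while the same two edits expressed as field-level patches both survive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory aspect store (not a DataHub API): an aspect is a field -> value map.
class AspectStore {
    private final Map<String, String> aspect = new HashMap<>();

    // Full update: the caller's copy replaces the stored aspect wholesale.
    void replace(Map<String, String> fullAspect) {
        aspect.clear();
        aspect.putAll(fullAspect);
    }

    // Patch: only the named field is written.
    void patch(String field, String value) {
        aspect.put(field, value);
    }

    // Returns a copy so callers cannot mutate the stored aspect directly.
    Map<String, String> read() {
        return new HashMap<>(aspect);
    }
}

class LostUpdateDemo {
    // Two writers do read-modify-write with full updates; the last write wins entirely.
    static Map<String, String> withFullUpdates() {
        AspectStore store = new AspectStore();
        store.replace(Map.of("description", "old", "team", "none"));
        Map<String, String> writerA = store.read();
        Map<String, String> writerB = store.read(); // same snapshot as writerA
        writerA.put("description", "New description");
        writerB.put("team", "data-eng");
        store.replace(writerA);
        store.replace(writerB); // stale "description" in writerB reverts writerA's change
        return store.read();
    }

    // The same two edits expressed as patches both survive.
    static Map<String, String> withPatches() {
        AspectStore store = new AspectStore();
        store.replace(Map.of("description", "old", "team", "none"));
        store.patch("description", "New description");
        store.patch("team", "data-eng");
        return store.read();
    }

    public static void main(String[] args) {
        System.out.println("full updates: description=" + withFullUpdates().get("description"));
        System.out.println("patches:      description=" + withPatches().get("description"));
    }
}
```

Running `main` prints `description=old` for the full-update path and `description=New description` for the patch path.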
## How Patches Work in SDK V2
### Patch Accumulation Pattern
Entities accumulate patches in a pending list until save:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

// Each method creates a patch MCP and appends it to the pending patches list
dataset.addTag("pii");                                                    // Patch 1
dataset.addTag("sensitive");                                              // Patch 2
dataset.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);  // Patch 3

// Check pending patches
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 3

// Emit all pending patches in a single update() call
client.entities().update(dataset);

// Patches are cleared after emission
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 0
```
### Under the Hood
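```java
// Simplified from Dataset.java (the tag-name normalization is illustrative)
public Dataset addTag(@Nonnull String tag) {
    // Accept a bare tag name ("pii") or a full tag URN ("urn:li:tag:pii")
    String tagName = tag.startsWith("urn:li:tag:")
        ? tag.substring("urn:li:tag:".length())
        : tag;
    // Create the patch using the existing patch builder
    GlobalTagsPatchBuilder patch = new GlobalTagsPatchBuilder()
        .urn(getUrn())
        .addTag(new TagUrn(tagName), null);
    // Add it to the pending patches list
    addPatchMcp(patch.build());
    return this;
}
```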
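When the entity is saved, `EntityClient` emits its pending patches if it has any, and otherwise falls back to the full aspects: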
```java
// From EntityClient.java
public void upsert(Entity entity) {
    if (entity.hasPendingPatches()) {
        // Emit pending patches
        for (MetadataChangeProposal patchMcp : entity.getPendingPatches()) {
            emitter.emit(patchMcp, null);
        }
        entity.clearPendingPatches();
    } else {
        // No pending patches: emit full aspects
        for (MetadataChangeProposalWrapper mcp : entity.toMCPs()) {
            emitter.emit(mcp);
        }
    }
}
```
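
To see the two branches in isolation, here is a self-contained sketch that mirrors the excerpts above using hypothetical stand-ins (`Mcp`, `RecordingEmitter`, `SketchEntity`, `SketchClient`) instead of the real SDK types. It records what would be emitted rather than calling a server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a MetadataChangeProposal: aspect, change type, and a detail string.
record Mcp(String aspectName, String changeType, String detail) {}

// Records what would have been sent, instead of calling a real emitter.
class RecordingEmitter {
    final List<Mcp> sent = new ArrayList<>();

    void emit(Mcp mcp) {
        sent.add(mcp);
    }
}

class SketchEntity {
    private final List<Mcp> pendingPatches = new ArrayList<>();
    private final List<Mcp> cachedAspects = new ArrayList<>();

    // Each mutator queues one PATCH proposal, as in the addTag excerpt.
    SketchEntity addTag(String tag) {
        pendingPatches.add(new Mcp("globalTags", "PATCH", tag));
        return this;
    }

    // Models a builder-populated full aspect (sent only when no patches are pending).
    SketchEntity withFullAspect(String aspectName) {
        cachedAspects.add(new Mcp(aspectName, "UPSERT", null));
        return this;
    }

    boolean hasPendingPatches() { return !pendingPatches.isEmpty(); }
    List<Mcp> getPendingPatches() { return List.copyOf(pendingPatches); }
    void clearPendingPatches() { pendingPatches.clear(); }
    List<Mcp> toMcps() { return List.copyOf(cachedAspects); }
}

class SketchClient {
    private final RecordingEmitter emitter;

    SketchClient(RecordingEmitter emitter) {
        this.emitter = emitter;
    }

    // Same branching as the EntityClient excerpt: pending patches take precedence.
    void save(SketchEntity entity) {
        if (entity.hasPendingPatches()) {
            for (Mcp patch : entity.getPendingPatches()) {
                emitter.emit(patch);
            }
            entity.clearPendingPatches(); // patches are one-shot
        } else {
            for (Mcp mcp : entity.toMcps()) {
                emitter.emit(mcp);
            }
        }
    }
}
```

Saving an entity with two queued tag patches records two `PATCH` proposals and empties the queue; saving it again, with nothing pending, records the cached full aspect instead. This follows the excerpt's behavior, in which pending patches take precedence over cached aspects within a single call.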
## Reusing Existing Patch Builders
SDK V2 **reuses existing patch builders** from `datahub.client.patch` package:
### Available Patch Builders
| Builder                                 | Purpose                    | Key methods                               |
| --------------------------------------- | -------------------------- | ----------------------------------------- |
| `OwnershipPatchBuilder` | Add/remove owners | `addOwner()`, `removeOwner()` |
| `GlobalTagsPatchBuilder` | Add/remove tags | `addTag()`, `removeTag()` |
| `GlossaryTermsPatchBuilder` | Add/remove terms | `addTerm()`, `removeTerm()` |
| `DomainsPatchBuilder` | Set/remove domain | `setDomain()`, `removeDomain()` |
| `DatasetPropertiesPatchBuilder` | Update properties | `setDescription()`, `addCustomProperty()` |
| `EditableDatasetPropertiesPatchBuilder` | Update editable properties | `setEditableDescription()` |
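
As an illustration of what these builders do, the following hypothetical `MiniTagsPatchBuilder` mimics the shape of a multi-field patch builder: set the target URN, queue operations, then build. It is not `GlobalTagsPatchBuilder`. Where the real builders wrap their operations in a `MetadataChangeProposal`, this sketch simply returns the JSON Patch array, in the format shown in the JSON Patch Format section of this guide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a multi-field patch builder for the globalTags aspect.
// It is not GlobalTagsPatchBuilder; it only mimics the shape: urn(), add/remove, build().
// Assumes tag URNs contain no '"', '~' or '/' characters (those would need JSON / JSON Pointer escaping).
class MiniTagsPatchBuilder {
    private String entityUrn;
    private final List<String> ops = new ArrayList<>();

    MiniTagsPatchBuilder urn(String entityUrn) {
        this.entityUrn = entityUrn;
        return this;
    }

    // Queues an "add" operation keyed by the tag URN.
    MiniTagsPatchBuilder addTag(String tagUrn) {
        ops.add("{\"op\":\"add\",\"path\":\"/tags/" + tagUrn
                + "\",\"value\":{\"tag\":\"" + tagUrn + "\"}}");
        return this;
    }

    // Queues a "remove" operation keyed by the tag URN.
    MiniTagsPatchBuilder removeTag(String tagUrn) {
        ops.add("{\"op\":\"remove\",\"path\":\"/tags/" + tagUrn + "\"}");
        return this;
    }

    String targetUrn() {
        return entityUrn;
    }

    // Returns the queued operations as one JSON Patch array.
    String build() {
        if (entityUrn == null) {
            throw new IllegalStateException("urn(...) must be called before build()");
        }
        return "[" + String.join(",", ops) + "]";
    }
}
```

For example, `new MiniTagsPatchBuilder().urn(datasetUrn).addTag("urn:li:tag:pii").removeTag("urn:li:tag:deprecated").build()` yields a two-element array: an `add` at `/tags/urn:li:tag:pii` followed by a `remove` at `/tags/urn:li:tag:deprecated`.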
### Why Reuse?
- **Battle-tested** - Already used in production
- **Correctness** - Complex JSON Patch logic already validated
- **Consistency** - Same semantics across language SDKs
- **Maintainability** - Single implementation to maintain
## When to Use Patches
### Use Patches For:
**Incremental changes to existing entities**
```java
Dataset dataset = client.entities().get(urn);
dataset.addTag("new-tag");
client.entities().update(dataset); // Patch
```
**Adding metadata to entities**
```java
dataset.addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);
dataset.addCustomProperty("updated_at", String.valueOf(System.currentTimeMillis()));
client.entities().update(dataset); // Multiple patches
```
**Surgical updates without full entity knowledge**
```java
// Don't need to fetch entire entity
dataset.addTag("gdpr");
client.entities().update(dataset); // Just adds tag
```
### Use Full Upsert For:
**Creating new entities**
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .description("New dataset")
    .build();
client.entities().upsert(dataset); // Full upsert
```
**Replacing entire aspects**
```java
// Set complete schema
SchemaMetadata schema = buildCompleteSchema();
dataset.setSchema(schema);
client.entities().upsert(dataset); // Sends full schema aspect
```
**Builder-provided metadata**
```java
Dataset dataset = Dataset.builder()
    .platform("postgres")
    .name("my_table")
    .description("Description from builder")
    .build();
// Builder populates aspectCache with full aspects
client.entities().upsert(dataset); // Sends cached aspects
```
## Patch Operations by Entity
### Dataset Patches
**Ownership:**
```java
dataset.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);
dataset.removeOwner("urn:li:corpuser:jane");
```
**Tags:**
```java
dataset.addTag("pii");
dataset.removeTag("deprecated");
```
**Glossary Terms:**
```java
dataset.addTerm("urn:li:glossaryTerm:CustomerData");
dataset.removeTerm("urn:li:glossaryTerm:OldTerm");
```
**Domain:**
```java
dataset.setDomain("urn:li:domain:Marketing");
dataset.removeDomain();
```
**Properties:**
```java
dataset.addCustomProperty("team", "data-eng");
dataset.removeCustomProperty("old_property");
dataset.setDescription("New description");
```
### Chart Patches
Chart supports the same patch operations as Dataset:
```java
chart.addOwner("urn:li:corpuser:analyst", OwnershipType.TECHNICAL_OWNER);
chart.addTag("visualization");
chart.addTerm("urn:li:glossaryTerm:SalesMetrics");
chart.setDomain("urn:li:domain:BusinessIntelligence");
```
See [Chart Entity Guide](./chart-entity.md) for complete details.
## Advanced: Manual Patch Construction
For advanced use cases, construct patches directly:
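```java
import com.linkedin.common.OwnershipType;
import com.linkedin.common.urn.Urn;
import com.linkedin.metadata.aspect.patch.builder.OwnershipPatchBuilder;
import com.linkedin.mxe.MetadataChangeProposal;

// Manual patch construction (Urn.createFromString throws URISyntaxException)
OwnershipPatchBuilder patchBuilder = new OwnershipPatchBuilder()
    .urn(dataset.getUrn())
    .addOwner(
        Urn.createFromString("urn:li:corpuser:alice"),
        OwnershipType.DATA_STEWARD);
MetadataChangeProposal patch = patchBuilder.build();

// Option 1: add it to the entity's pending patches (emitted on the next update())
dataset.addPatchMcp(patch);

// Option 2: emit it directly instead. Don't do both, or the patch is sent twice.
// emitter.emit(patch, null);
```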
## Patch vs Upsert Decision Tree
```
New entity from builder?
├─ Yes → Use upsert() (sends cached aspects)
└─ No → Loaded from server or reference?
   ├─ Yes → Making incremental changes?
   │  ├─ Yes → Use update() (sends patches)
   │  └─ No → Replacing entire aspect?
   │     └─ Yes → Use upsert() (sends full aspect)
   └─ No → Just adding tags/owners/etc?
      └─ Yes → Use update() (sends patches)
```
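
The same tree can be written as a small function, which may be handy in ingestion code that has to pick a call programmatically. `SaveCall` and `SaveCallChooser` are hypothetical names, not part of SDK V2; `UNDECIDED` marks the combinations the tree does not cover.

```java
// Hypothetical names: not part of SDK V2.
enum SaveCall { UPSERT, UPDATE, UNDECIDED }

final class SaveCallChooser {
    private SaveCallChooser() {}

    // Walks the decision tree top to bottom; each flag answers one question in it.
    static SaveCall choose(boolean newFromBuilder,
                           boolean loadedOrReference,
                           boolean incrementalChanges,
                           boolean replacingEntireAspect,
                           boolean justAddingTagsOrOwners) {
        if (newFromBuilder) {
            return SaveCall.UPSERT;                 // sends cached aspects
        }
        if (loadedOrReference) {
            if (incrementalChanges) {
                return SaveCall.UPDATE;             // sends patches
            }
            return replacingEntireAspect ? SaveCall.UPSERT : SaveCall.UNDECIDED;
        }
        return justAddingTagsOrOwners ? SaveCall.UPDATE : SaveCall.UNDECIDED;
    }

    public static void main(String[] args) {
        // Loaded from the server and adding a tag: an incremental change
        System.out.println(choose(false, true, true, false, false)); // prints UPDATE
    }
}
```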
## Pending Patches Management
### Check for Pending Patches
```java
if (dataset.hasPendingPatches()) {
    System.out.println("Entity has pending patches");
}
```
### Get Pending Patches
```java
List<MetadataChangeProposal> patches = dataset.getPendingPatches();
for (MetadataChangeProposal patch : patches) {
    System.out.println("Patch for aspect: " + patch.getAspectName());
}
```
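
When an entity has accumulated many patches, a per-aspect count gives a quick summary of what the next `update()` will touch. This sketch uses a hypothetical `PendingPatch` record in place of `MetadataChangeProposal` so that it runs on its own; with the real type, the classifier would presumably be `MetadataChangeProposal::getAspectName`, the accessor used in the loop above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for a pending MetadataChangeProposal; only the aspect name is used.
record PendingPatch(String entityUrn, String aspectName) {}

final class PatchSummary {
    private PatchSummary() {}

    // Counts pending patches per aspect; TreeMap keeps the output sorted by aspect name.
    static Map<String, Long> countByAspect(List<PendingPatch> patches) {
        return patches.stream()
                .collect(Collectors.groupingBy(
                        PendingPatch::aspectName, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        String urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_table,PROD)";
        List<PendingPatch> pending = List.of(
                new PendingPatch(urn, "globalTags"),
                new PendingPatch(urn, "ownership"),
                new PendingPatch(urn, "globalTags"));
        System.out.println(countByAspect(pending)); // prints {globalTags=2, ownership=1}
    }
}
```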
### Clear Pending Patches
```java
// Manually clear without emitting
dataset.clearPendingPatches();
```
### Batch Multiple Changes
```java
// Accumulate many patches
dataset.addTag("tag1")
    .addTag("tag2")
    .addTag("tag3")
    .addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:user2", OwnershipType.DATA_STEWARD)
    .addCustomProperty("key1", "value1")
    .addCustomProperty("key2", "value2");
// All 7 patches are emitted by a single update() call
client.entities().update(dataset);
```
## Performance Considerations
### Network Efficiency
```java
// Less efficient: three separate update() calls
dataset.addTag("tag1");
client.entities().update(dataset);
dataset.addTag("tag2");
client.entities().update(dataset);
dataset.addTag("tag3");
client.entities().update(dataset);

// More efficient: accumulate the three patches, then call update() once
dataset.addTag("tag1")
    .addTag("tag2")
    .addTag("tag3");
client.entities().update(dataset);
```
### Payload Size
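These are rough estimates; actual sizes depend on the aspect contents.

**Full upsert (`datasetProperties`):**

- Roughly 2-5 KB for a typical dataset aspect

**Patch (add tag):**

- Roughly 200-300 bytes for a single tag patch

**10 tags:** about 2-3 KB of patches (10 × 200-300 bytes) versus about 5 KB for a full upsert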
## JSON Patch Format
Patches use [JSON Patch (RFC 6902)](https://datatracker.ietf.org/doc/html/rfc6902) format:
**Add operation:**
```json
{
  "op": "add",
  "path": "/tags/urn:li:tag:pii",
  "value": {
    "tag": "urn:li:tag:pii"
  }
}
```
**Remove operation:**
```json
{
  "op": "remove",
  "path": "/tags/urn:li:tag:deprecated"
}
```
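
Note that these paths address a tag by its URN, whereas standard JSON Pointer (RFC 6901, which RFC 6902 uses for paths) addresses JSON array elements by index. The following hypothetical sketch (`TagOp` and `TagPatchApplier` are illustrative names, not DataHub code) applies such operations to a set of tag URNs: `add` inserts the URN named in the path, and `remove` deletes it, failing when it is absent, as RFC 6902 requires for `remove`. It only models the semantics; it does not show how the DataHub server applies patches.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical operation type: the "value" member is omitted because, for tags,
// it repeats the URN that is already in the path.
record TagOp(String op, String path) {}

final class TagPatchApplier {
    private static final String PREFIX = "/tags/";

    private TagPatchApplier() {}

    // Applies add/remove operations to a set of tag URNs and returns a new set.
    static Set<String> apply(Set<String> tagUrns, List<TagOp> ops) {
        Set<String> result = new LinkedHashSet<>(tagUrns); // leave the input untouched
        for (TagOp o : ops) {
            if (!o.path().startsWith(PREFIX)) {
                throw new IllegalArgumentException("Unsupported path: " + o.path());
            }
            String tagUrn = o.path().substring(PREFIX.length());
            switch (o.op()) {
                case "add" -> result.add(tagUrn); // adding an existing tag changes nothing
                case "remove" -> {
                    // RFC 6902: the target location must exist for "remove" to succeed.
                    if (!result.remove(tagUrn)) {
                        throw new IllegalStateException("No such path: " + o.path());
                    }
                }
                default -> throw new IllegalArgumentException("Unsupported op: " + o.op());
            }
        }
        return result;
    }
}
```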
SDK V2 abstracts this complexity - you work with Java methods, not JSON.
## Troubleshooting
### Patches Not Applied
**Issue:** Changes not visible in DataHub

**Solutions:**
- Verify `update()` was called (patches don't emit automatically)
- Check for errors in emission response
- Ensure entity is bound to client
### Concurrent Updates
**Issue:** Patches conflict with concurrent changes

**Solutions:**
- Patches touch only the fields they name, so concurrent writers editing different fields do not overwrite each other
- Each patch is atomic
- For complex scenarios, load the entity first to get its latest state
### Patch Cleared Unexpectedly
**Issue:** Pending patches disappear

**Reason:** `upsert()` or `update()` clears patches after emission

**Solution:** This is expected behavior; patches are one-time use
## Next Steps
- **[Design Principles](./design-principles.md)** - Architecture behind patches
- **[Dataset Entity Guide](./dataset-entity.md)** - All patch operations for datasets
- **[Migration Guide](./migration-from-v1.md)** - Moving from full updates to patches
## API Reference
Key classes:
- Entity.java - Patch accumulation
- EntityClient.java - Patch emission
- datahub.client.patch.\* - Patch builders