# Patch Operations Guide

SDK V2 uses **patch-based updates** for efficient, surgical modifications to metadata. This guide explains how patches work and when to use them.
## What Are Patches?

Patches are **incremental updates** that modify specific fields without replacing entire aspects. Instead of sending the full `datasetProperties` aspect, a patch sends only the changes.

### Patch vs Full Update

**Full Update (V1 Style):**

```java
// getDatasetProperties() and sendAspect() stand in for V1-style read/write helpers

// Fetch the entire aspect
DatasetProperties props = getDatasetProperties(urn);

// Modify one field
props.setDescription("New description");

// Send the entire aspect back (overwrites every field in it)
sendAspect(urn, props);
```

**Patch Update (V2 Style):**

```java
// Record only the change
dataset.setDescription("New description");
client.entities().update(dataset);
// Sends a JSON Patch such as: { "op": "add", "path": "/description", "value": "New description" }
```
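The practical difference shows up when two writers change the same aspect. The toy model below uses only plain Java collections (no DataHub classes; `LostUpdateDemo` and its methods are hypothetical) to contrast a read-modify-write full update with add-style patches that are applied to whatever is currently stored.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (not DataHub code): a stored tag set that two writers update
 * after reading the same starting state.
 */
public class LostUpdateDemo {

  // Full update: the writer's whole copy replaces whatever is stored.
  static Set<String> fullReplace(Set<String> writerCopy) {
    return new LinkedHashSet<>(writerCopy);
  }

  // Patch: only "add this tag" is sent and applied to the current stored value.
  static Set<String> applyAddPatch(Set<String> stored, String tag) {
    Set<String> result = new LinkedHashSet<>(stored);
    result.add(tag);
    return result;
  }

  static Set<String> runFullUpdates() {
    Set<String> stored = new LinkedHashSet<>(List.of("pii"));
    // Both writers read the same snapshot before either one writes.
    Set<String> copyA = new LinkedHashSet<>(stored);
    Set<String> copyB = new LinkedHashSet<>(stored);
    copyA.add("gdpr");
    copyB.add("finance");
    stored = fullReplace(copyA);
    stored = fullReplace(copyB); // silently drops writer A's "gdpr"
    return stored;
  }

  static Set<String> runPatches() {
    Set<String> stored = new LinkedHashSet<>(List.of("pii"));
    stored = applyAddPatch(stored, "gdpr");
    stored = applyAddPatch(stored, "finance");
    return stored;
  }

  public static void main(String[] args) {
    System.out.println("Full updates: " + runFullUpdates()); // Full updates: [pii, finance]
    System.out.println("Patches:      " + runPatches());     // Patches:      [pii, gdpr, finance]
  }
}
```

Patches do not remove every conflict: two patches that set the same single-valued field, such as the description, still end with whichever one is applied last.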
### Benefits of Patches

1. **Efficiency** - Only changed fields are sent over the network
2. **Concurrency Safety** - Less risk of overwriting concurrent changes, because fields you did not touch are never resent
3. **Atomicity** - Each patch is applied to its aspect as a single unit
4. **Bandwidth** - Reduced payload size
## How Patches Work in SDK V2

### Patch Accumulation Pattern

Entities accumulate patches in a pending list until they are saved:

```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

// Each method creates a patch MCP and appends it to the pending patches list
dataset.addTag("pii");                                                    // Patch 1
dataset.addTag("sensitive");                                              // Patch 2
dataset.addOwner("urn:li:corpuser:jdoe", OwnershipType.TECHNICAL_OWNER);  // Patch 3

// Check pending patches
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 3

// Emit all pending patches
client.entities().update(dataset);

// Patches are cleared after emission
System.out.println("Pending patches: " + dataset.getPendingPatches().size());
// Output: Pending patches: 0
```
### Under the Hood

A simplified sketch of `addTag()` from `Dataset.java`; the step that turns a bare tag name into a `TagUrn` is assumed:

```java
public Dataset addTag(@Nonnull String tagUrn) {
    // Assumed normalization: accept "pii" as well as "urn:li:tag:pii"
    String fullUrn = tagUrn.startsWith("urn:li:tag:") ? tagUrn : "urn:li:tag:" + tagUrn;
    TagUrn tag;
    try {
        tag = TagUrn.createFromString(fullUrn);
    } catch (URISyntaxException e) {
        throw new IllegalArgumentException("Invalid tag URN: " + fullUrn, e);
    }

    // Create the patch with the existing patch builder
    GlobalTagsPatchBuilder patch = new GlobalTagsPatchBuilder()
        .urn(getUrn())
        .addTag(tag, null);

    // Add it to the pending patches list
    addPatchMcp(patch.build());

    return this;
}
```

When an entity is saved, the client emits its pending patches and then clears them; if there are none, it emits the full aspects instead. Simplified from `upsert()` in `EntityClient.java`:

```java
public void upsert(Entity entity) {
    if (entity.hasPendingPatches()) {
        // Emit each pending patch MCP
        for (MetadataChangeProposal patchMcp : entity.getPendingPatches()) {
            emitter.emit(patchMcp, null);
        }
        entity.clearPendingPatches();
    } else {
        // No patches: emit the full aspects
        for (MetadataChangeProposalWrapper mcp : entity.toMCPs()) {
            emitter.emit(mcp);
        }
    }
}
```
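Stripped of DataHub types, the pattern is a list that mutators append to and a save step that drains it. The self-contained model below is a sketch under stated assumptions: `PendingPatchEntity` and `flush()` are hypothetical names, and each patch is a plain `String` rather than a `MetadataChangeProposal`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Toy model of accumulate-then-flush. PendingPatchEntity and flush() are
 * hypothetical names; a patch is a plain String instead of an MCP.
 */
public class PatchAccumulationModel {

  static class PendingPatchEntity {
    private final List<String> pendingPatches = new ArrayList<>();

    // Each mutator records one patch and returns this, so calls can be chained.
    PendingPatchEntity addTag(String tagUrn) {
      pendingPatches.add("add /tags/" + tagUrn);
      return this;
    }

    PendingPatchEntity removeTag(String tagUrn) {
      pendingPatches.add("remove /tags/" + tagUrn);
      return this;
    }

    boolean hasPendingPatches() {
      return !pendingPatches.isEmpty();
    }

    // Read-only view, so callers cannot edit the list behind the entity's back.
    List<String> getPendingPatches() {
      return Collections.unmodifiableList(pendingPatches);
    }

    void clearPendingPatches() {
      pendingPatches.clear();
    }
  }

  // Mirrors the save logic: emit every pending patch, then clear the list.
  static int flush(PendingPatchEntity entity, Consumer<String> emitter) {
    int emitted = 0;
    for (String patch : entity.getPendingPatches()) {
      emitter.accept(patch);
      emitted++;
    }
    entity.clearPendingPatches();
    return emitted;
  }

  public static void main(String[] args) {
    PendingPatchEntity dataset = new PendingPatchEntity()
        .addTag("urn:li:tag:pii")
        .addTag("urn:li:tag:sensitive");
    System.out.println("Pending: " + dataset.getPendingPatches().size()); // Pending: 2

    List<String> sent = new ArrayList<>();
    int count = flush(dataset, sent::add);
    System.out.println("Emitted " + count + ": " + sent);
    System.out.println("Pending: " + dataset.getPendingPatches().size()); // Pending: 0
  }
}
```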
## Reusing Existing Patch Builders

SDK V2 **reuses the existing patch builders** from the `com.linkedin.metadata.aspect.patch.builder` package:

### Available Patch Builders

| Builder                                 | Purpose                    | Example methods                           |
| --------------------------------------- | -------------------------- | ----------------------------------------- |
| `OwnershipPatchBuilder`                 | Add/remove owners          | `addOwner()`, `removeOwner()`             |
| `GlobalTagsPatchBuilder`                | Add/remove tags            | `addTag()`, `removeTag()`                 |
| `GlossaryTermsPatchBuilder`             | Add/remove terms           | `addTerm()`, `removeTerm()`               |
| `DomainsPatchBuilder`                   | Set/remove domain          | `setDomain()`, `removeDomain()`           |
| `DatasetPropertiesPatchBuilder`         | Update properties          | `setDescription()`, `addCustomProperty()` |
| `EditableDatasetPropertiesPatchBuilder` | Update editable properties | `setEditableDescription()`                |
### Why Reuse?

- **Battle-tested** - The same patch approach is already used by the Python SDK V2 in production
- **Correctness** - Complex JSON Patch logic is already validated
- **Consistency** - Same semantics across language SDKs
- **Maintainability** - A single implementation to maintain
## When to Use Patches

### Use Patches For:

✅ **Incremental changes to existing entities**

```java
Dataset dataset = client.entities().get(urn);
dataset.addTag("new-tag");
client.entities().update(dataset); // Patch
```

✅ **Adding metadata to entities**

```java
dataset.addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);
dataset.addCustomProperty("updated_at", String.valueOf(System.currentTimeMillis()));
client.entities().update(dataset); // Multiple patches
```

✅ **Surgical updates without full entity knowledge**

```java
// No need to fetch the entity first: build a reference from its identifying fields
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

dataset.addTag("gdpr");
client.entities().update(dataset); // Only adds the tag
```
### Use Full Upsert For:

✅ **Creating new entities**

```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .description("New dataset")
    .build();

client.entities().upsert(dataset); // Full upsert
```

✅ **Replacing entire aspects**

```java
// Set the complete schema (buildCompleteSchema() is a placeholder for your own code)
SchemaMetadata schema = buildCompleteSchema();
dataset.setSchema(schema);
client.entities().upsert(dataset); // Sends the full schema aspect
```

✅ **Builder-provided metadata**

```java
Dataset dataset = Dataset.builder()
    .platform("postgres")
    .name("my_table")
    .description("Description from builder")
    .build();

// The builder populates aspectCache with full aspects
client.entities().upsert(dataset); // Sends the cached aspects
```
## Patch Operations by Entity

### Dataset Patches

**Ownership:**

```java
dataset.addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);
dataset.removeOwner("urn:li:corpuser:jane");
```

**Tags:**

```java
dataset.addTag("pii");
dataset.removeTag("deprecated");
```

**Glossary Terms:**

```java
dataset.addTerm("urn:li:glossaryTerm:CustomerData");
dataset.removeTerm("urn:li:glossaryTerm:OldTerm");
```

**Domain:**

```java
dataset.setDomain("urn:li:domain:Marketing");
dataset.removeDomain();
```

**Properties:**

```java
dataset.addCustomProperty("team", "data-eng");
dataset.removeCustomProperty("old_property");
dataset.setDescription("New description");
```

### Chart Patches

Chart supports the same patch operations as Dataset:

```java
chart.addOwner("urn:li:corpuser:analyst", OwnershipType.TECHNICAL_OWNER);
chart.addTag("visualization");
chart.addTerm("urn:li:glossaryTerm:SalesMetrics");
chart.setDomain("urn:li:domain:BusinessIntelligence");
```

See [Chart Entity Guide](./chart-entity.md) for complete details.
## Advanced: Manual Patch Construction

For advanced use cases, you can construct patches directly:

```java
import com.linkedin.common.OwnershipType;
import com.linkedin.common.urn.Urn;
import com.linkedin.metadata.aspect.patch.builder.OwnershipPatchBuilder;
import com.linkedin.mxe.MetadataChangeProposal;

// Build an ownership patch by hand
// (Urn.createFromString throws URISyntaxException; handle or declare it)
OwnershipPatchBuilder patchBuilder = new OwnershipPatchBuilder()
    .urn(dataset.getUrn())
    .addOwner(
        Urn.createFromString("urn:li:corpuser:alice"),
        OwnershipType.DATA_STEWARD
    );

MetadataChangeProposal patch = patchBuilder.build();

// Option 1: add it to the entity's pending patches (emitted on the next save)
dataset.addPatchMcp(patch);

// Option 2: emit it directly instead (use one option, not both, or the patch is sent twice)
emitter.emit(patch, null);
```
## Patch vs Upsert Decision Tree

```
New entity from builder?
├─ Yes → Use upsert() (sends cached aspects)
└─ No → Loaded from server or reference?
   ├─ Yes → Making incremental changes?
   │  ├─ Yes → Use update() (sends patches)
   │  └─ No → Replacing entire aspect?
   │     └─ Yes → Use upsert() (sends full aspect)
   └─ No → Just adding tags/owners/etc?
      └─ Yes → Use update() (sends patches)
```
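The decision tree above can also be read as a small function. This is a sketch only: `SaveMethodChooser`, `SaveMethod`, and the boolean flags are hypothetical and not part of SDK V2, and combinations the tree does not cover throw an exception instead of guessing an answer.

```java
/**
 * Sketch of the patch-vs-upsert decision tree. SaveMethod and the boolean
 * flags are hypothetical; they are not SDK V2 API.
 */
public class SaveMethodChooser {

  enum SaveMethod { UPSERT, UPDATE }

  static SaveMethod choose(
      boolean newFromBuilder,
      boolean loadedOrReference,
      boolean incrementalChanges,
      boolean replacingWholeAspect,
      boolean justAddingTagsOrOwners) {
    if (newFromBuilder) {
      return SaveMethod.UPSERT; // sends cached aspects
    }
    if (loadedOrReference) {
      if (incrementalChanges) {
        return SaveMethod.UPDATE; // sends patches
      }
      if (replacingWholeAspect) {
        return SaveMethod.UPSERT; // sends the full aspect
      }
    } else if (justAddingTagsOrOwners) {
      return SaveMethod.UPDATE; // sends patches
    }
    // The tree has no branch for this combination, so refuse to guess
    throw new IllegalArgumentException("Combination not covered by the decision tree");
  }

  public static void main(String[] args) {
    System.out.println(choose(true, false, false, false, false)); // UPSERT
    System.out.println(choose(false, true, true, false, false));  // UPDATE
    System.out.println(choose(false, true, false, true, false));  // UPSERT
    System.out.println(choose(false, false, false, false, true)); // UPDATE
  }
}
```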
## Pending Patches Management

### Check for Pending Patches

```java
if (dataset.hasPendingPatches()) {
    System.out.println("Entity has pending patches");
}
```

### Get Pending Patches

```java
List<MetadataChangeProposal> patches = dataset.getPendingPatches();
for (MetadataChangeProposal patch : patches) {
    System.out.println("Patch for aspect: " + patch.getAspectName());
}
```

### Clear Pending Patches

```java
// Discard pending patches without emitting them
dataset.clearPendingPatches();
```

### Batch Multiple Changes

```java
// Accumulate several patches by chaining calls
dataset.addTag("tag1")
    .addTag("tag2")
    .addTag("tag3")
    .addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:user2", OwnershipType.DATA_STEWARD)
    .addCustomProperty("key1", "value1")
    .addCustomProperty("key2", "value2");

// All 7 pending patches are emitted by a single update() call
client.entities().update(dataset);
```
## Performance Considerations

### Network Efficiency

```java
// Less efficient: three separate update() calls
dataset.addTag("tag1");
client.entities().update(dataset);
dataset.addTag("tag2");
client.entities().update(dataset);
dataset.addTag("tag3");
client.entities().update(dataset);

// More efficient: one update() call that emits all three pending patches
dataset.addTag("tag1")
    .addTag("tag2")
    .addTag("tag3");
client.entities().update(dataset);
```

### Payload Size

**Full upsert (datasetProperties):**

- ~2-5 KB for a typical dataset aspect

**Patch (add tag):**

- ~200-300 bytes for a single tag patch

**10 tags:** Patches ≈ 3 KB (10 × ~300 bytes), full upsert ≈ 5 KB
## JSON Patch Format

Patches use the [JSON Patch (RFC 6902)](https://datatracker.ietf.org/doc/html/rfc6902) format:

**Add operation:**

```json
{
  "op": "add",
  "path": "/tags/urn:li:tag:pii",
  "value": {
    "tag": "urn:li:tag:pii"
  }
}
```

**Remove operation:**

```json
{
  "op": "remove",
  "path": "/tags/urn:li:tag:deprecated"
}
```
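In these examples the path ends in the tag URN rather than an array index, so a patch can target one tag without knowing its position. The toy model below is a sketch of that idea under stated assumptions: `TagPatchModel` and `PatchOp` are hypothetical, only `add` and `remove` on `/tags/<urn>` are handled, and it is not how the DataHub server actually applies patches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative model only (not DataHub's server implementation): the tag list
 * is viewed as a map keyed by tag URN, so a path such as "/tags/urn:li:tag:pii"
 * addresses one tag directly.
 */
public class TagPatchModel {

  /** Minimal stand-in for one RFC 6902 operation (value omitted for brevity). */
  static final class PatchOp {
    final String op;
    final String path;

    PatchOp(String op, String path) {
      this.op = op;
      this.path = path;
    }
  }

  private static final String TAGS_PREFIX = "/tags/";

  static List<String> apply(List<String> tagUrns, List<PatchOp> ops) {
    // Key each tag by its URN; LinkedHashMap keeps the original order.
    Map<String, String> byUrn = new LinkedHashMap<>();
    for (String urn : tagUrns) {
      byUrn.put(urn, urn);
    }
    for (PatchOp p : ops) {
      if (!p.path.startsWith(TAGS_PREFIX)) {
        throw new IllegalArgumentException("Unsupported path: " + p.path);
      }
      // The URNs used here contain no "/" or "~", so no RFC 6901 unescaping is needed.
      String urn = p.path.substring(TAGS_PREFIX.length());
      if ("add".equals(p.op)) {
        // Adding an existing object member replaces it, so re-adding a tag is harmless.
        byUrn.put(urn, urn);
      } else if ("remove".equals(p.op)) {
        // Strict RFC 6902 treats removing a missing member as an error; this toy ignores it.
        byUrn.remove(urn);
      } else {
        throw new IllegalArgumentException("Unsupported op: " + p.op);
      }
    }
    return new ArrayList<>(byUrn.values());
  }

  public static void main(String[] args) {
    List<String> before = List.of("urn:li:tag:deprecated", "urn:li:tag:finance");
    List<PatchOp> ops = List.of(
        new PatchOp("add", "/tags/urn:li:tag:pii"),
        new PatchOp("remove", "/tags/urn:li:tag:deprecated"));
    System.out.println(apply(before, ops)); // [urn:li:tag:finance, urn:li:tag:pii]
  }
}
```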
SDK V2 abstracts this complexity - you work with Java methods, not JSON.
## Troubleshooting

### Patches Not Applied

**Issue:** Changes are not visible in DataHub

**Solutions:**

- Verify that `update()` was called (patches are not emitted automatically)
- Check the emission response for errors
- Ensure the entity is bound to a client

### Concurrent Updates

**Issue:** Patches conflict with concurrent changes

**Solutions:**

- Patches that touch different fields, or different entries such as different tags, do not overwrite each other
- Each patch is applied atomically
- For complex scenarios, load the entity first to get the latest state

### Patch Cleared Unexpectedly

**Issue:** Pending patches disappear

**Reason:** `upsert()` and `update()` clear pending patches after emitting them

**Solution:** This is expected behavior; patches are one-time use
## Next Steps

- **[Design Principles](./design-principles.md)** - Architecture behind patches
- **[Dataset Entity Guide](./dataset-entity.md)** - All patch operations for datasets
- **[Migration Guide](./migration-from-v1.md)** - Moving from full updates to patches
## API Reference

Key classes:

- `Entity.java` - Patch accumulation
- `EntityClient.java` - Patch emission
- `com.linkedin.metadata.aspect.patch.builder.*` - Patch builders