
Migration Guide: V1 to V2

This guide helps you migrate from Java SDK V1 (RestEmitter) to V2 (DataHubClientV2). We'll show side-by-side examples and highlight key differences.

Why Migrate?

V2 offers significant improvements over V1:

  • Type-safe entity builders instead of manual MCP construction
  • Automatic URN generation instead of string manipulation
  • Patch-based updates for efficient incremental changes
  • Fluent API with method chaining
  • Lazy loading with caching
  • Mode-aware operations (SDK vs INGESTION)
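To make the URN point concrete: V2's builder derives the dataset URN from platform, name, and env, following DataHub's standard dataset URN convention. The sketch below assembles that string by hand in plain Java (no SDK types) purely to show the format the builder saves you from constructing:

```java
// Plain-Java sketch of the dataset URN that V2 generates automatically.
// datasetUrn() is a hypothetical helper for illustration, not an SDK method.
public class DatasetUrnSketch {
    // Build a DataHub dataset URN from its three parts.
    static String datasetUrn(String platform, String name, String env) {
        return String.format("urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)",
                platform, name, env);
    }

    public static void main(String[] args) {
        System.out.println(
            datasetUrn("snowflake", "my_database.my_schema.my_table", "PROD"));
        // urn:li:dataset:(urn:li:dataPlatform:snowflake,my_database.my_schema.my_table,PROD)
    }
}
```

With V1 you concatenate or construct this by hand for every entity; with V2 it never appears in your code at all.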

Key Differences

Aspect              V1 (RestEmitter)           V2 (DataHubClientV2)
Abstraction         Low-level MCPs             High-level entities
URN Construction    Manual strings             Automatic from builder
Updates             Full aspect replacement    Patch-based incremental
Type Safety         Minimal                    Strong compile-time checking
API Style           Imperative emission        Fluent builders
Entity Support      Generic MCPs               Dataset, Chart, Dashboard

Migration Examples

Example 1: Creating a Dataset

V1 (RestEmitter):

import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;
import com.linkedin.dataset.DatasetProperties;
import com.linkedin.common.urn.DatasetUrn;
import com.linkedin.common.urn.DataPlatformUrn;
import com.linkedin.common.FabricType;

// Manual URN construction
DatasetUrn urn = new DatasetUrn(
    new DataPlatformUrn("snowflake"),
    "my_database.my_schema.my_table",
    FabricType.PROD
);

// Manual aspect construction
DatasetProperties props = new DatasetProperties();
props.setDescription("My dataset description");
props.setName("My Dataset");

// Manual MCP construction
MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder()
    .entityType("dataset")
    .entityUrn(urn)
    .upsert()
    .aspect(props)
    .build();

// Create emitter
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080"));

// Emit
emitter.emit(mcp, null).get();

V2 (DataHubClientV2):

import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;

// Fluent builder
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_database.my_schema.my_table")
    .env("PROD")
    .description("My dataset description")
    .displayName("My Dataset")
    .build();

// Create client
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();

// Upsert (URN auto-generated, aspect auto-wired)
client.entities().upsert(dataset);

Changes:

  • No manual URN construction
  • No manual aspect creation
  • No MCP wrapper construction
  • Fluent builder handles everything
  • Type-safe method calls
  • Automatic aspect wiring

Example 2: Adding Tags

V1 (RestEmitter):

import com.linkedin.common.GlobalTags;
import com.linkedin.common.TagAssociation;
import com.linkedin.common.TagAssociationArray;
import com.linkedin.common.urn.TagUrn;

// Fetch existing tags or create new
GlobalTags tags = fetchExistingTags(urn);  // You implement this
if (tags == null) {
    tags = new GlobalTags();
    tags.setTags(new TagAssociationArray());
}

// Add new tag
TagAssociation newTag = new TagAssociation();
newTag.setTag(new TagUrn("pii"));
tags.getTags().add(newTag);

// Create MCP to replace entire GlobalTags aspect
MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder()
    .entityType("dataset")
    .entityUrn(urn)
    .upsert()
    .aspect(tags)
    .build();

emitter.emit(mcp, null).get();

V2 (DataHubClientV2):

// Just add the tag - patch handles everything
dataset.addTag("pii");
client.entities().update(dataset);

Changes:

  • No fetching existing tags
  • No manual aspect manipulation
  • No MCP construction
  • Single method call
  • Patch-based (doesn't overwrite other tags)
  • Automatic URN handling
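The last point deserves a closer look. Because V1 replaces the whole GlobalTags aspect, a writer working from a stale snapshot silently drops tags that anyone else added in the meantime; a patch-style add cannot. The plain-Java sketch below simulates both behaviors, with a Set standing in for the server-side aspect (no SDK types involved):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simulates why V1's full-aspect replacement can lose concurrent writes
// while a V2-style patch cannot. A plain Set stands in for GlobalTags.
public class PatchVsReplace {
    static Set<String> serverTags = new LinkedHashSet<>(Set.of("pci"));

    // V1 style: the writer's snapshot replaces the whole aspect.
    static void replaceAll(Set<String> snapshot) {
        serverTags = new LinkedHashSet<>(snapshot);
    }

    // V2 style: only the delta is applied server-side.
    static void patchAdd(String tag) {
        serverTags.add(tag);
    }

    public static void main(String[] args) {
        // Writer fetched its snapshot before another team added "pci".
        Set<String> staleSnapshot = new LinkedHashSet<>();
        staleSnapshot.add("pii");
        replaceAll(staleSnapshot);
        System.out.println(serverTags);  // [pii]  <- "pci" silently dropped

        serverTags = new LinkedHashSet<>(Set.of("pci"));
        patchAdd("pii");
        System.out.println(serverTags);  // [pci, pii]  <- "pci" survives
    }
}
```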

Example 3: Adding Owners

V1 (RestEmitter):

import com.linkedin.common.Ownership;
import com.linkedin.common.Owner;
import com.linkedin.common.OwnerArray;
import com.linkedin.common.OwnershipType;
import com.linkedin.common.urn.Urn;

// Fetch existing owners or create new
Ownership ownership = fetchExistingOwnership(urn);
if (ownership == null) {
    ownership = new Ownership();
    ownership.setOwners(new OwnerArray());
}

// Add new owner
Owner newOwner = new Owner();
newOwner.setOwner(Urn.createFromString("urn:li:corpuser:john_doe"));
newOwner.setType(OwnershipType.TECHNICAL_OWNER);
ownership.getOwners().add(newOwner);

// Create MCP
MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder()
    .entityType("dataset")
    .entityUrn(urn)
    .upsert()
    .aspect(ownership)
    .build();

emitter.emit(mcp, null).get();

V2 (DataHubClientV2):

dataset.addOwner("urn:li:corpuser:john_doe", OwnershipType.TECHNICAL_OWNER);
client.entities().update(dataset);

Changes:

  • No fetching existing owners
  • No manual Owner object creation
  • No array manipulation
  • Single method with parameters
  • Type-safe ownership type enum
  • Automatic patch creation

Example 4: Multiple Metadata Additions

V1 (RestEmitter):

// Create dataset properties
DatasetProperties props = new DatasetProperties();
props.setDescription("My description");

// Create tags
GlobalTags tags = new GlobalTags();
TagAssociationArray tagArray = new TagAssociationArray();
tagArray.add(createTagAssociation("pii"));
tagArray.add(createTagAssociation("sensitive"));
tags.setTags(tagArray);

// Create ownership
Ownership ownership = new Ownership();
OwnerArray ownerArray = new OwnerArray();
ownerArray.add(createOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER));
ownership.setOwners(ownerArray);

// Create 3 separate MCPs and emit each
emitter.emit(createMCP(urn, props), null).get();
emitter.emit(createMCP(urn, tags), null).get();
emitter.emit(createMCP(urn, ownership), null).get();

V2 (DataHubClientV2):

Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .description("My description")
    .build();

dataset.addTag("pii")
       .addTag("sensitive")
       .addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER);

client.entities().upsert(dataset);  // Single call, all metadata included

Changes:

  • No creating multiple aspects separately
  • No multiple emission calls
  • Method chaining for fluent API
  • Single upsert emits everything
  • Atomic operation

Example 5: Updating Existing Entity

V1 (RestEmitter):

// 1. Fetch current state from DataHub
DatasetProperties existingProps = fetchAspect(urn, DatasetProperties.class);

// 2. Modify
existingProps.setDescription("Updated description");

// 3. Send back (overwrites entire aspect)
MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder()
    .entityType("dataset")
    .entityUrn(urn)
    .upsert()
    .aspect(existingProps)
    .build();

emitter.emit(mcp, null).get();

V2 (DataHubClientV2):

// Just modify - patch handles incremental update
Dataset dataset = client.entities().get(urn);  // Optional: load existing
dataset.setDescription("Updated description");
client.entities().update(dataset);  // Patch only changes description

Changes:

  • Can update without fetching (for patches)
  • Patch-based incremental update
  • No risk of overwriting other fields
  • More efficient payload
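For intuition about the smaller payload: a V2 description update is conceptually a JSON-patch-style document touching only the changed field, rather than the full DatasetProperties aspect. The snippet below hand-builds such an illustrative payload; it mirrors patch semantics but is not necessarily the exact wire format the V2 client emits:

```java
// Hypothetical illustration of a patch payload for a description-only
// update. Only the changed field travels over the wire, unlike V1's
// full-aspect replacement. Not the SDK's exact wire format.
public class PatchPayloadSketch {
    static String descriptionPatch(String newDescription) {
        return "[{\"op\":\"add\",\"path\":\"/description\",\"value\":\""
                + newDescription + "\"}]";
    }

    public static void main(String[] args) {
        System.out.println(descriptionPatch("Updated description"));
        // [{"op":"add","path":"/description","value":"Updated description"}]
    }
}
```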

Migration Checklist

1. Update Dependencies

Keep your existing dependency (V2 is backwards compatible):

dependencies {
    implementation 'io.acryl:datahub-client:__version__'
}

2. Change Imports

Replace:

import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;

With:

import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import datahub.client.v2.entity.Chart;

3. Replace RestEmitter with DataHubClientV2

Before:

RestEmitter emitter = RestEmitter.create(b -> b
    .server("http://localhost:8080")
    .token("my-token")
);

After:

DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .token("my-token")
    .build();

4. Use Entity Builders

Replace manual MCP/URN construction with entity builders:

Before:

DatasetUrn urn = new DatasetUrn(...);
DatasetProperties props = new DatasetProperties();
props.setDescription("...");
MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder()...
emitter.emit(mcp, null).get();

After:

Dataset dataset = Dataset.builder()
    .platform("...")
    .name("...")
    .description("...")
    .build();
client.entities().upsert(dataset);

5. Use Patch Operations for Updates

Replace fetch-modify-send with patches:

Before:

GlobalTags tags = fetch(...);
tags.getTags().add(...);
emit(tags);

After:

dataset.addTag("...");
client.entities().update(dataset);

Gradual Migration Strategy

You can migrate incrementally, since V1 and V2 can coexist in the same application:

// V1 emitter (for unsupported operations)
RestEmitter emitter = RestEmitter.create(b -> b.server("..."));

// V2 client (for entities)
DataHubClientV2 client = DataHubClientV2.builder()
    .server("...")
    .build();

// Use V2 for supported entities
Dataset dataset = Dataset.builder()...
client.entities().upsert(dataset);

// Fall back to V1 for custom MCPs
MetadataChangeProposalWrapper customMcp = ...;
emitter.emit(customMcp, null).get();

Common Pitfalls

1. Forgetting to Call update() or upsert()

Problem:

dataset.addTag("pii");  // Patch created but not emitted!
// Missing: client.entities().update(dataset);

Solution: Always call update() or upsert() to emit changes.

2. Using V1 Pattern with V2 Entities

Problem:

Dataset dataset = Dataset.builder()...;
// Don't do this - use client.entities() instead
emitter.emit(dataset.toMCPs(), null);  // Wrong!

Solution: Use V2's EntityClient:

client.entities().upsert(dataset);

3. Mixing Operation Modes

Problem:

// Client in SDK mode
DataHubClientV2 client = DataHubClientV2.builder()
    .operationMode(OperationMode.SDK)
    .build();

// But manually setting system description (conflicts with mode)
dataset.setSystemDescription("...");  // Inconsistent!

Solution: Use mode-aware methods, or set the client's operation mode to match the explicit methods you call:

dataset.setDescription("...");  // Mode-aware

Benefits After Migration

  • 50-80% less code for common operations
  • Type safety catches errors at compile time
  • Better performance with patches
  • Easier testing with mock entities
  • Better IDE support with autocomplete

Need Help?

Still Using V1 Features?

Some advanced features are still V1-only:

  • KafkaEmitter - Use V1 for Kafka-based emission
  • FileEmitter - Use V1 for file-based emission
  • Custom MCPs - Use V1 for entity types not yet in V2
  • Direct aspect access - Use V1 for fine-grained control

You can use both V1 and V2 in the same application!