# Migration Guide: V1 to V2 This guide helps you migrate from Java SDK V1 (RestEmitter) to V2 (DataHubClientV2). We'll show side-by-side examples and highlight key differences. ## Why Migrate? V2 offers significant improvements over V1: - ✅ **Type-safe entity builders** instead of manual MCP construction - ✅ **Automatic URN generation** instead of string manipulation - ✅ **Patch-based updates** for efficient incremental changes - ✅ **Fluent API** with method chaining - ✅ **Lazy loading** with caching - ✅ **Mode-aware operations** (SDK vs INGESTION) ## Key Differences | Aspect | V1 (RestEmitter) | V2 (DataHubClientV2) | | -------------------- | ----------------------- | ---------------------------- | | **Abstraction** | Low-level MCPs | High-level entities | | **URN Construction** | Manual strings | Automatic from builder | | **Updates** | Full aspect replacement | Patch-based incremental | | **Type Safety** | Minimal | Strong compile-time checking | | **API Style** | Imperative emission | Fluent builders | | **Entity Support** | Generic MCPs | Dataset, Chart, Dashboard | ## Migration Examples ### Example 1: Creating a Dataset **V1 (RestEmitter):** ```java import datahub.client.rest.RestEmitter; import datahub.event.MetadataChangeProposalWrapper; import com.linkedin.dataset.DatasetProperties; import com.linkedin.common.urn.DatasetUrn; // Manual URN construction DatasetUrn urn = new DatasetUrn( new DataPlatformUrn("snowflake"), "my_database.my_schema.my_table", FabricType.PROD ); // Manual aspect construction DatasetProperties props = new DatasetProperties(); props.setDescription("My dataset description"); props.setName("My Dataset"); // Manual MCP construction MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder() .entityType("dataset") .entityUrn(urn) .upsert() .aspect(props) .build(); // Create emitter RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080")); // Emit emitter.emit(mcp, null).get(); ``` **V2 (DataHubClientV2):** ```java import datahub.client.v2.DataHubClientV2; import datahub.client.v2.entity.Dataset; // Fluent builder Dataset dataset = Dataset.builder() .platform("snowflake") .name("my_database.my_schema.my_table") .env("PROD") .description("My dataset description") .displayName("My Dataset") .build(); // Create client DataHubClientV2 client = DataHubClientV2.builder() .server("http://localhost:8080") .build(); // Upsert (URN auto-generated, aspect auto-wired) client.entities().upsert(dataset); ``` **Changes:** - ❌ No manual URN construction - ❌ No manual aspect creation - ❌ No MCP wrapper construction - ✅ Fluent builder handles everything - ✅ Type-safe method calls - ✅ Automatic aspect wiring ### Example 2: Adding Tags **V1 (RestEmitter):** ```java import com.linkedin.common.GlobalTags; import com.linkedin.common.TagAssociation; import com.linkedin.common.TagAssociationArray; import com.linkedin.common.urn.TagUrn; // Fetch existing tags or create new GlobalTags tags = fetchExistingTags(urn); // You implement this if (tags == null) { tags = new GlobalTags(); tags.setTags(new TagAssociationArray()); } // Add new tag TagAssociation newTag = new TagAssociation(); newTag.setTag(new TagUrn("pii")); tags.getTags().add(newTag); // Create MCP to replace entire GlobalTags aspect MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder() .entityType("dataset") .entityUrn(urn) .upsert() .aspect(tags) .build(); emitter.emit(mcp, null).get(); ``` **V2 (DataHubClientV2):** ```java // Just add the tag - patch handles everything dataset.addTag("pii"); client.entities().update(dataset); ``` **Changes:** - ❌ No fetching existing tags - ❌ No manual aspect manipulation - ❌ No MCP construction - ✅ Single method call - ✅ Patch-based (doesn't overwrite other tags) - ✅ Automatic URN handling ### Example 3: Adding Owners **V1 (RestEmitter):** ```java import com.linkedin.common.Ownership; import com.linkedin.common.Owner; import com.linkedin.common.OwnerArray; import com.linkedin.common.OwnershipType; import com.linkedin.common.urn.Urn; // Fetch existing owners or create new Ownership ownership = fetchExistingOwnership(urn); if (ownership == null) { ownership = new Ownership(); ownership.setOwners(new OwnerArray()); } // Add new owner Owner newOwner = new Owner(); newOwner.setOwner(Urn.createFromString("urn:li:corpuser:john_doe")); newOwner.setType(OwnershipType.TECHNICAL_OWNER); ownership.getOwners().add(newOwner); // Create MCP MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder() .entityType("dataset") .entityUrn(urn) .upsert() .aspect(ownership) .build(); emitter.emit(mcp, null).get(); ``` **V2 (DataHubClientV2):** ```java dataset.addOwner("urn:li:corpuser:john_doe", OwnershipType.TECHNICAL_OWNER); client.entities().update(dataset); ``` **Changes:** - ❌ No fetching existing owners - ❌ No manual Owner object creation - ❌ No array manipulation - ✅ Single method with parameters - ✅ Type-safe ownership type enum - ✅ Automatic patch creation ### Example 4: Multiple Metadata Additions **V1 (RestEmitter):** ```java // Create dataset properties DatasetProperties props = new DatasetProperties(); props.setDescription("My description"); // Create tags GlobalTags tags = new GlobalTags(); TagAssociationArray tagArray = new TagAssociationArray(); tagArray.add(createTagAssociation("pii")); tagArray.add(createTagAssociation("sensitive")); tags.setTags(tagArray); // Create ownership Ownership ownership = new Ownership(); OwnerArray ownerArray = new OwnerArray(); ownerArray.add(createOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER)); ownership.setOwners(ownerArray); // Create 3 separate MCPs and emit each emitter.emit(createMCP(urn, props), null).get(); emitter.emit(createMCP(urn, tags), null).get(); emitter.emit(createMCP(urn, ownership), null).get(); ``` **V2 (DataHubClientV2):** ```java Dataset dataset = Dataset.builder() .platform("snowflake") .name("my_table") .description("My description") .build(); dataset.addTag("pii") .addTag("sensitive") .addOwner("urn:li:corpuser:john", OwnershipType.TECHNICAL_OWNER); client.entities().upsert(dataset); // Single call, all metadata included ``` **Changes:** - ❌ No creating multiple aspects separately - ❌ No multiple emission calls - ✅ Method chaining for fluent API - ✅ Single upsert emits everything - ✅ Atomic operation ### Example 5: Updating Existing Entity **V1 (RestEmitter):** ```java // 1. Fetch current state from DataHub DatasetProperties existingProps = fetchAspect(urn, DatasetProperties.class); // 2. Modify existingProps.setDescription("Updated description"); // 3. Send back (overwrites entire aspect) MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder() .entityType("dataset") .entityUrn(urn) .upsert() .aspect(existingProps) .build(); emitter.emit(mcp, null).get(); ``` **V2 (DataHubClientV2):** ```java // Just modify - patch handles incremental update Dataset dataset = client.entities().get(urn); // Optional: load existing dataset.setDescription("Updated description"); client.entities().update(dataset); // Patch only changes description ``` **Changes:** - ✅ Can update without fetching (for patches) - ✅ Patch-based incremental update - ✅ No risk of overwriting other fields - ✅ More efficient payload ## Migration Checklist ### 1. Update Dependencies Keep existing dependency (backwards compatible): ```gradle dependencies { implementation 'io.acryl:datahub-client:__version__' } ``` ### 2. Change Imports **Replace:** ```java import datahub.client.rest.RestEmitter; import datahub.event.MetadataChangeProposalWrapper; ``` **With:** ```java import datahub.client.v2.DataHubClientV2; import datahub.client.v2.entity.Dataset; import datahub.client.v2.entity.Chart; ``` ### 3. Replace RestEmitter with DataHubClientV2 **Before:** ```java RestEmitter emitter = RestEmitter.create(b -> b .server("http://localhost:8080") .token("my-token") ); ``` **After:** ```java DataHubClientV2 client = DataHubClientV2.builder() .server("http://localhost:8080") .token("my-token") .build(); ``` ### 4. Use Entity Builders Replace manual MCP/URN construction with entity builders: **Before:** ```java DatasetUrn urn = new DatasetUrn(...); DatasetProperties props = new DatasetProperties(); props.setDescription("..."); MetadataChangeProposalWrapper mcp = MetadataChangeProposalWrapper.builder()... emitter.emit(mcp, null).get(); ``` **After:** ```java Dataset dataset = Dataset.builder() .platform("...") .name("...") .description("...") .build(); client.entities().upsert(dataset); ``` ### 5. Use Patch Operations for Updates Replace fetch-modify-send with patches: **Before:** ```java GlobalTags tags = fetch(...); tags.getTags().add(...); emit(tags); ``` **After:** ```java dataset.addTag("..."); client.entities().update(dataset); ``` ## Gradual Migration Strategy You can migrate incrementally - V1 and V2 can coexist: ```java // V1 emitter (for unsupported operations) RestEmitter emitter = RestEmitter.create(b -> b.server("...")); // V2 client (for entities) DataHubClientV2 client = DataHubClientV2.builder() .server("...") .build(); // Use V2 for supported entities Dataset dataset = Dataset.builder()... client.entities().upsert(dataset); // Fall back to V1 for custom MCPs MetadataChangeProposalWrapper customMcp = ...; emitter.emit(customMcp, null).get(); ``` ## Common Pitfalls ### 1. Forgetting to Call `update()` or `upsert()` **Problem:** ```java dataset.addTag("pii"); // Patch created but not emitted! // Missing: client.entities().update(dataset); ``` **Solution:** Always call `update()` or `upsert()` to emit changes. ### 2. Using V1 Pattern with V2 Entities **Problem:** ```java Dataset dataset = Dataset.builder()...; // Don't do this - use client.entities() instead emitter.emit(dataset.toMCPs(), null); // Wrong! ``` **Solution:** Use V2's EntityClient: ```java client.entities().upsert(dataset); ``` ### 3. Mixing Operation Modes **Problem:** ```java // Client in SDK mode DataHubClientV2 client = DataHubClientV2.builder() .operationMode(OperationMode.SDK) .build(); // But manually setting system description (conflicts with mode) dataset.setSystemDescription("..."); // Inconsistent! ``` **Solution:** Use mode-aware methods or match mode to explicit methods: ```java dataset.setDescription("..."); // Mode-aware ``` ## Benefits After Migration - **50-80% less code** for common operations - **Type safety** catches errors at compile time - **Better performance** with patches - **Easier testing** with mock entities - **Better IDE support** with autocomplete ## Need Help? - **V2 Documentation**: [Getting Started Guide](./getting-started.md) - **Entity Guides**: [Dataset](./dataset-entity.md), [Chart](./chart-entity.md) - **Examples**: [V2 Examples Directory](../../examples/src/main/java/io/datahubproject/examples/v2/) - **V1 Documentation**: [Java SDK V1](../../as-a-library.md) (for reference) ## Still Using V1 Features? Some advanced features are still V1-only: - **KafkaEmitter** - Use V1 for Kafka-based emission - **FileEmitter** - Use V1 for file-based emission - **Custom MCPs** - Use V1 for entity types not yet in V2 - **Direct aspect access** - Use V1 for fine-grained control You can use both V1 and V2 in the same application!