# Getting Started with Java SDK V2
This guide walks you through setting up and using the DataHub Java SDK V2 to interact with DataHub's metadata platform.
## Prerequisites
- Java 8 or higher
- Access to a DataHub instance (Cloud or self-hosted)
- (Optional) A DataHub personal access token for authentication
## Installation
Add the DataHub client library to your project's build configuration.
### Gradle
Add to your `build.gradle`:
```gradle
dependencies {
    implementation 'io.acryl:datahub-client:__version__'
}
```
### Maven
Add to your `pom.xml`:
```xml
<dependency>
    <groupId>io.acryl</groupId>
    <artifactId>datahub-client</artifactId>
    <version>__version__</version>
</dependency>
```
> **Tip:** Find the latest version on [Maven Central](https://mvnrepository.com/artifact/io.acryl/datahub-client).
## Creating a Client
The `DataHubClientV2` is your entry point to all SDK operations. Create one by specifying your DataHub server URL:
```java
import datahub.client.v2.DataHubClientV2;

DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();
```
### With Authentication
For DataHub Cloud or secured instances, provide a personal access token:
```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("https://your-instance.acryl.io")
    .token("your-personal-access-token")
    .build();
```
> **How to get a token:** In the DataHub UI, go to Settings → Access Tokens → Generate Personal Access Token.
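
Hard-coding a token in source makes it easy to leak. One option is to read it from the environment and treat a missing or blank value as "no token". The sketch below is only an illustration: the variable name `DATAHUB_TOKEN` and the helper class `TokenSource` are assumptions, not part of the SDK.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the DataHub SDK.
final class TokenSource {
    // Assumed variable name; use whatever your deployment defines.
    static final String ENV_VAR = "DATAHUB_TOKEN";

    // Returns the trimmed token, or empty if it is unset or blank.
    static Optional<String> fromEnv(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(raw.trim());
    }

    public static void main(String[] args) {
        Optional<String> token = fromEnv(System.getenv());
        // Never print the token itself; only report whether one was found.
        System.out.println(token.isPresent() ? "Token found" : "No token set");
    }
}
```

When a value is present, it can be passed to `.token(...)` on the client builder; when it is absent, the builder call can simply be skipped.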
### Testing the Connection
Verify your client can reach the DataHub server:
```java
try {
    boolean connected = client.testConnection();
    if (connected) {
        System.out.println("Successfully connected to DataHub!");
    } else {
        System.out.println("Failed to connect to DataHub");
    }
} catch (Exception e) {
    System.err.println("Connection error: " + e.getMessage());
}
```
## Creating Your First Entity
Let's create a dataset with some metadata.
### Step 1: Import Required Classes
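```java
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;
// Needed for the exception handling in Step 4
import java.io.IOException;
import java.util.concurrent.ExecutionException;
```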
### Step 2: Build a Dataset
Use the fluent builder to construct a dataset:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("analytics.public.user_events")
    .env("PROD")
    .description("User interaction events from web and mobile")
    .displayName("User Events")
    .build();
```
**Breaking down the builder:**
- `platform` - Data platform identifier (e.g., "snowflake", "bigquery", "postgres")
- `name` - Fully qualified dataset name (database.schema.table or similar)
- `env` - Environment (PROD, DEV, STAGING, etc.)
- `description` - Human-readable description of the dataset
- `displayName` - Friendly name shown in the DataHub UI
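
The three identifying fields (`platform`, `name`, `env`) are what end up in the dataset's URN, which in DataHub has the form `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. The sketch below only formats that string to make the relationship visible. `DatasetUrnFormat` is a hypothetical helper; in practice the SDK derives the URN for you and exposes it through `dataset.getUrn()`.

```java
// Hypothetical illustration; the SDK builds the URN itself.
final class DatasetUrnFormat {
    // Joins the three identifying fields into the dataset URN string form.
    static String format(String platform, String name, String env) {
        return "urn:li:dataset:(urn:li:dataPlatform:" + platform + "," + name + "," + env + ")";
    }

    public static void main(String[] args) {
        System.out.println(format("snowflake", "analytics.public.user_events", "PROD"));
        // prints urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.user_events,PROD)
    }
}
```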
### Step 3: Add Metadata
Enrich the dataset with tags, owners, and custom properties:
```java
dataset.addTag("pii")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:john_doe", OwnershipType.TECHNICAL_OWNER)
    .addCustomProperty("retention_days", "90")
    .addCustomProperty("team", "data-engineering");
```
### Step 4: Upsert to DataHub
Send the dataset to DataHub:
```java
try {
    client.entities().upsert(dataset);
    System.out.println("Successfully created dataset: " + dataset.getUrn());
} catch (IOException | ExecutionException | InterruptedException e) {
    System.err.println("Failed to create dataset: " + e.getMessage());
}
```
## Complete Example
Here's a complete, runnable example:
```java
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Dataset;
import com.linkedin.common.OwnershipType;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class DataHubQuickStart {
    public static void main(String[] args) {
        // Create client
        DataHubClientV2 client = DataHubClientV2.builder()
            .server("http://localhost:8080")
            .token("your-token-here") // Optional
            .build();

        try {
            // Test connection
            if (!client.testConnection()) {
                System.err.println("Cannot connect to DataHub");
                return;
            }

            // Build dataset
            Dataset dataset = Dataset.builder()
                .platform("snowflake")
                .name("analytics.public.user_events")
                .env("PROD")
                .description("User interaction events")
                .displayName("User Events")
                .build();

            // Add metadata
            dataset.addTag("pii")
                .addTag("analytics")
                .addOwner("urn:li:corpuser:datateam", OwnershipType.TECHNICAL_OWNER)
                .addCustomProperty("retention_days", "90");

            // Upsert to DataHub
            client.entities().upsert(dataset);
            System.out.println("Created dataset: " + dataset.getUrn());
        } catch (IOException | ExecutionException | InterruptedException e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
```
For more complete examples, see the [Dataset Entity Guide](./dataset-entity.md#examples).
## Reading Entities
Load an existing entity from DataHub:
```java
import com.linkedin.common.FabricType;
import com.linkedin.common.urn.DataPlatformUrn;
import com.linkedin.common.urn.DatasetUrn;

// DatasetUrn takes a typed platform URN and a FabricType, not raw strings
DatasetUrn urn = new DatasetUrn(
    new DataPlatformUrn("snowflake"),
    "analytics.public.user_events",
    FabricType.PROD
);

try {
    Dataset loaded = client.entities().get(urn);
    if (loaded != null) {
        System.out.println("Dataset description: " + loaded.getDescription());
        System.out.println("Is read-only: " + loaded.isReadOnly()); // true
    }
} catch (IOException | ExecutionException | InterruptedException e) {
    e.printStackTrace();
}
```
> **Important:** Entities fetched from the server are **read-only by default**. Additional aspects are lazy-loaded on demand.
### Understanding Read-Only Entities
When you fetch an entity from DataHub, it's immutable to prevent accidental modifications:
```java
Dataset dataset = client.entities().get(urn);
// Reading works fine
String description = dataset.getDescription();
List tags = dataset.getTags();
// But mutation throws ReadOnlyEntityException
// dataset.addTag("pii"); // ERROR: Cannot mutate read-only entity!
```
**Why?** Immutability-by-default makes mutation intent explicit, prevents accidental changes when passing entities between functions, and enables safe entity sharing.
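
To make the pattern concrete, here is a toy model of a read-only object that hands out independent writable copies. It is a sketch of the idea only: `ToyEntity` is hypothetical, it throws `IllegalStateException` rather than the SDK's `ReadOnlyEntityException`, and it does not reflect how the SDK implements entities internally. The method name `mutable()` mirrors the SDK call used in the next section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of the read-only / mutable-copy pattern; not the SDK's classes.
final class ToyEntity {
    private final List<String> tags;
    private final boolean readOnly;

    private ToyEntity(List<String> tags, boolean readOnly) {
        this.tags = new ArrayList<>(tags); // defensive copy
        this.readOnly = readOnly;
    }

    // Simulates an entity as it would come back from a server fetch.
    static ToyEntity fetched(List<String> tags) {
        return new ToyEntity(tags, true);
    }

    boolean isReadOnly() {
        return readOnly;
    }

    List<String> getTags() {
        return Collections.unmodifiableList(tags);
    }

    // Returns an independent writable copy; the original stays untouched.
    ToyEntity mutable() {
        return new ToyEntity(tags, false);
    }

    ToyEntity addTag(String tag) {
        if (readOnly) {
            throw new IllegalStateException("Cannot mutate read-only entity");
        }
        tags.add(tag);
        return this;
    }

    public static void main(String[] args) {
        ToyEntity shared = ToyEntity.fetched(Collections.singletonList("analytics"));
        ToyEntity copy = shared.mutable().addTag("pii");
        System.out.println(shared.getTags()); // [analytics]
        System.out.println(copy.getTags());   // [analytics, pii]
    }
}
```

Because the shared instance never changes, it can be passed to other code without the caller worrying about side effects.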
## Updating Entities with Patches
To modify a fetched entity, create a mutable copy first:
```java
// 1. Load existing dataset (read-only)
Dataset dataset = client.entities().get(urn);

// 2. Get mutable copy
Dataset mutable = dataset.mutable();

// 3. Add new tags and owners (patch operations)
mutable.addTag("gdpr")
    .addOwner("urn:li:corpuser:new_owner", OwnershipType.TECHNICAL_OWNER);

// 4. Apply patches to DataHub
client.entities().update(mutable);
```
The `update()` method sends only the changes (patches) to DataHub, not the full entity. This is more efficient and safer for concurrent updates.
### Entity Lifecycle
Whether an entity is mutable or read-only depends on how it was obtained:
**Builder-created entities** - Mutable from creation:
```java
Dataset dataset = Dataset.builder()
    .platform("snowflake")
    .name("my_table")
    .build();

dataset.isMutable(); // true - can mutate immediately
dataset.addTag("test"); // Works without .mutable()
```
**Server-fetched entities** - Read-only by default:
```java
Dataset dataset = client.entities().get(urn);
dataset.isReadOnly(); // true
// dataset.addTag("test"); // ERROR!
Dataset mutable = dataset.mutable(); // Get writable copy
mutable.addTag("test"); // Now works
```
See the [Patch Operations Guide](./patch-operations.md) for details.
## Upserting vs Updating
SDK V2 provides two methods for persisting entities:
### `upsert(entity)`
- **Use for:** New entities or full replacements
- **Sends:** All aspects from the entity
- **Behavior:** Creates the entity if it doesn't exist, replaces it if it does
```java
client.entities().upsert(dataset);
```
### `update(entity)`
- **Use for:** Incremental changes to existing entities
- **Sends:** Only pending patches accumulated since the entity was loaded or created
- **Behavior:** Applies surgical updates to specific fields
```java
client.entities().update(dataset);
```
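
The difference matters most when someone else has changed the entity since you loaded it. The toy model below illustrates that with a single in-memory tag list: a full replacement built from a stale view discards the other writer's change, while a patch preserves it. `ToyTagStore` is hypothetical and greatly simplified; it is not the DataHub API and makes no claim about how the server stores or merges aspects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy in-memory "server" holding one tag list; not the DataHub API.
final class ToyTagStore {
    private List<String> tags = new ArrayList<>();

    // Full replacement: whatever the caller sends becomes the new state.
    void upsert(List<String> fullTagList) {
        tags = new ArrayList<>(fullTagList);
    }

    // Patch: only the named additions are applied to the current state.
    void update(List<String> tagsToAdd) {
        for (String tag : tagsToAdd) {
            if (!tags.contains(tag)) {
                tags.add(tag);
            }
        }
    }

    // Returns a snapshot copy of the current tags.
    List<String> current() {
        return new ArrayList<>(tags);
    }

    public static void main(String[] args) {
        ToyTagStore store = new ToyTagStore();
        store.upsert(Arrays.asList("analytics"));

        // Writer A reads the state, then writer B adds a tag behind A's back.
        List<String> staleView = store.current();
        store.update(Arrays.asList("pii"));

        // If A patches, B's tag survives.
        store.update(Arrays.asList("gdpr"));
        System.out.println(store.current()); // [analytics, pii, gdpr]

        // If A instead upserts its stale view plus one tag, B's tag is lost.
        staleView.add("gdpr");
        store.upsert(staleView);
        System.out.println(store.current()); // [analytics, gdpr]
    }
}
```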
## Working with Other Entities
SDK V2 supports multiple entity types beyond datasets:
### Charts
```java
import datahub.client.v2.entity.Chart;

Chart chart = Chart.builder()
    .tool("looker")
    .id("my_sales_chart")
    .title("Sales Performance by Region")
    .description("Monthly sales broken down by geographic region")
    .build();

client.entities().upsert(chart);
```
See the [Chart Entity Guide](./chart-entity.md) for details.
### Dashboards
Coming soon! Dashboard entity support is planned for a future release.
## Configuration Options
Customize the client for your environment:
```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("https://your-instance.acryl.io")
    .token("your-access-token")
    // Configure operation mode
    .operationMode(DataHubClientConfigV2.OperationMode.SDK) // or INGESTION
    // Customize underlying REST emitter
    .restEmitterConfig(config -> config
        .timeoutSec(30)
        .maxRetries(5)
        .retryIntervalSec(2)
    )
    .build();
```
### Operation Modes
SDK V2 supports two operation modes:
- **SDK mode** (default): For interactive applications; provides patch-based updates and lazy loading
- **INGESTION mode**: For ETL pipelines; optimizes for high-throughput batch operations
```java
// SDK mode (default) - interactive use
DataHubClientV2 sdkClient = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .operationMode(DataHubClientConfigV2.OperationMode.SDK)
    .build();

// Ingestion mode - ETL pipelines
DataHubClientV2 ingestionClient = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .operationMode(DataHubClientConfigV2.OperationMode.INGESTION)
    .build();
```
See [DataHubClientV2 Configuration](./client.md) for all available options.
## Error Handling
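The persistence calls used in this guide can throw three checked exceptions; handle each according to its cause:
```java
try {
    client.entities().upsert(dataset);
} catch (IOException e) {
    // Network or serialization errors
    System.err.println("I/O error: " + e.getMessage());
} catch (ExecutionException e) {
    // Server-side errors; getCause() can be null, so fall back to the wrapper
    Throwable cause = e.getCause() != null ? e.getCause() : e;
    System.err.println("Server error: " + cause.getMessage());
} catch (InterruptedException e) {
    // Thread was interrupted; restore the interrupt flag for callers
    Thread.currentThread().interrupt();
}
```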
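
`ExecutionException` comes from `java.util.concurrent` and is only a wrapper: the underlying failure lives in `getCause()`. The standalone sketch below shows that wrapping with a plain `CompletableFuture` and a small reporting helper. `Failures` is a hypothetical class, not part of the SDK, and the failure injected here is made up for the demonstration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical helper for reporting wrapped failures; not part of the SDK.
final class Failures {
    // Prefers the cause's details, falling back to the wrapper when there is no cause.
    static String describe(ExecutionException e) {
        Throwable cause = e.getCause();
        Throwable source = (cause != null) ? cause : e;
        String message = source.getMessage();
        return source.getClass().getSimpleName() + (message != null ? ": " + message : "");
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate an asynchronous operation that failed.
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("server rejected request"));
        try {
            failed.get();
        } catch (ExecutionException e) {
            System.err.println(describe(e)); // IllegalStateException: server rejected request
        }
    }
}
```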
## Resource Management
Always close the client when done to release resources:
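```java
try (DataHubClientV2 client = DataHubClientV2.builder()
        .server("http://localhost:8080")
        .build()) {
    // Use client here
    client.entities().upsert(dataset);
} catch (IOException | ExecutionException | InterruptedException e) {
    // Covers failures from upsert() and from the implicit close()
    e.printStackTrace();
}
// Client is closed automatically when the try block exits
```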
Or close explicitly:
```java
try {
    // Use client
} finally {
    client.close();
}
```
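
Either form guarantees that `close()` runs even when the body throws. The standalone sketch below demonstrates this with a stand-in resource; `ToyClient` is hypothetical and only mimics a closeable client, so the example runs without a DataHub server.

```java
import java.io.Closeable;
import java.io.IOException;

// Toy stand-in for a closeable client; not the SDK's class.
final class ToyClient implements Closeable {
    boolean closed = false;

    // Fails on empty input so the example can trigger an exception.
    void send(String payload) throws IOException {
        if (payload.isEmpty()) {
            throw new IOException("empty payload");
        }
    }

    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) {
        ToyClient client = new ToyClient();
        try (ToyClient c = client) {
            c.send(""); // throws before the block finishes
        } catch (IOException e) {
            System.out.println("send failed: " + e.getMessage());
        }
        System.out.println("closed = " + client.closed); // closed = true
    }
}
```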
## Next Steps
Now that you've created your first entity, explore more advanced topics:
- **[Design Principles](./design-principles.md)** - Understand the architecture behind SDK V2
- **[Dataset Entity Guide](./dataset-entity.md)** - Comprehensive dataset operations
- **[Chart Entity Guide](./chart-entity.md)** - Working with chart entities
- **[Patch Operations](./patch-operations.md)** - Deep dive into incremental updates
- **[Client Configuration](./client.md)** - Advanced client setup and options
Or check out complete examples in the entity guides:
- [Dataset Examples](./dataset-entity.md#examples)
- [Chart Examples](./chart-entity.md#examples)
- [Dashboard Examples](./dashboard-entity.md#examples)
- [DataJob Examples](./datajob-entity.md#examples)