# Container Entity

The Container entity represents hierarchical groupings of data assets (databases, schemas, folders, projects). This guide covers container operations in SDK V2.

## Overview

Containers organize data assets into hierarchical structures. Common use cases:

- **Database Hierarchies**: Database → Schema → Table
- **Data Lake Structures**: Bucket → Folder → File
- **Project Hierarchies**: Project → Dataset → Table

Containers use GUID-based URNs generated from their properties (platform, database, schema, etc.), ensuring deterministic URNs for the same logical container.

## URN Construction

Container URNs follow the pattern:

```
urn:li:container:{guid}
```

The GUID is generated by hashing a set of properties (platform, database, schema, env, etc.). This ensures:

- Deterministic URNs: Same properties always generate the same URN
- Uniqueness: Different containers have different URNs
- Hierarchical organization: Parent-child relationships are explicit

**Example:**

```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .build();

String urn = database.getContainerUrn();
// urn:li:container:{guid-based-on-properties}
```

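To make the idea of a deterministic GUID concrete, here is a minimal, self-contained sketch that hashes a set of key properties in a canonical (sorted) order. The class name `ContainerGuidSketch`, the `key=value` serialization, and the choice of MD5 are assumptions made for illustration only; the SDK's actual hashing scheme may differ, so the GUIDs produced here should not be expected to match those from `getContainerUrn()`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: NOT the SDK's actual GUID algorithm.
final class ContainerGuidSketch {

    // Serialize the identifying properties in sorted key order, then hash them,
    // so the same logical container always maps to the same GUID.
    static String guid(Map<String, String> keyProperties) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(keyProperties).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(ex);
        }
    }

    // Wrap the GUID in the container URN pattern shown above.
    static String urn(Map<String, String> keyProperties) {
        return "urn:li:container:" + guid(keyProperties);
    }

    public static void main(String[] args) {
        Map<String, String> db =
            Map.of("platform", "snowflake", "database", "analytics_db", "env", "PROD");
        Map<String, String> sameDbOtherOrder =
            Map.of("env", "PROD", "database", "analytics_db", "platform", "snowflake");
        Map<String, String> schema =
            Map.of("platform", "snowflake", "database", "analytics_db",
                   "schema", "public", "env", "PROD");

        // Same properties, different insertion order: identical URN.
        System.out.println(urn(db).equals(urn(sameDbOtherOrder)));
        // Adding a schema changes the hashed input, so the URN differs.
        System.out.println(urn(db).equals(urn(schema)));
    }
}
```

Sorting the keys before hashing is what makes the result independent of the order in which properties are supplied.
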
## Creating Containers

### Database Container

```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Production analytics database")
    .qualifiedName("prod.snowflake.analytics_db")
    .build();
```

### Schema Container with Parent

```java
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .qualifiedName("prod.snowflake.analytics_db.public")
    .parentContainer(database.getContainerUrn())
    .build();
```

### With Custom Properties

```java
Map<String, String> properties = new HashMap<>();
properties.put("size_gb", "2500");
properties.put("table_count", "150");
properties.put("owner_team", "data_platform");

Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .displayName("Production Database")
    .customProperties(properties)
    .build();
```

### With External URL

```java
Container database = Container.builder()
    .platform("bigquery")
    .database("analytics")
    .displayName("Analytics Database")
    .externalUrl("https://console.cloud.google.com/bigquery/project/analytics")
    .build();
```

## Hierarchical Relationships

### Parent-Child Structure

Containers support explicit parent-child relationships for organizing data assets hierarchically.

**Database → Schema hierarchy:**

```java
// Level 1: Database
Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .env("PROD")
    .displayName("Production Database")
    .build();

// Level 2: Schema (child of database)
Container schema = Container.builder()
    .platform("postgres")
    .database("production")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();
```

### Three-Level Hierarchy

**Database → Schema → Table Group:**

```java
// Level 1: Database
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

// Level 2: Schema
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Level 3: Logical grouping
Container tableGroup = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Customer Tables")
    .qualifiedName("analytics.public.customer_group")
    .parentContainer(schema.getContainerUrn())
    .build();
```

### Managing Parent Relationships

```java
// Set parent container
container.setContainer("urn:li:container:{parent-guid}");

// Get parent container
String parentUrn = container.getParentContainer();

// Clear parent container
container.clearContainer();
```

## Container Operations

### Adding Tags

Categorize containers with tags:

```java
container.addTag("production");
container.addTag("tier1");
container.addTag("pii");

// Or use full URN
container.addTag("urn:li:tag:critical");
```

### Managing Owners

Add owners with different ownership types:

```java
import com.linkedin.common.OwnershipType;

// Add technical owner
container.addOwner("urn:li:corpuser:data_platform_team",
        OwnershipType.TECHNICAL_OWNER);

// Add data steward
container.addOwner("urn:li:corpuser:analytics_lead",
        OwnershipType.DATA_STEWARD);

// Remove owner
container.removeOwner("urn:li:corpuser:data_platform_team");
```

### Adding Glossary Terms

Associate business glossary terms:

```java
container.addTerm("urn:li:glossaryTerm:ProductionDatabase");
container.addTerm("urn:li:glossaryTerm:CustomerData");

// Remove term
container.removeTerm("urn:li:glossaryTerm:ProductionDatabase");
```

### Setting Domain

Assign a container to a domain:

```java
container.setDomain("urn:li:domain:Analytics");

// Clear all domains
container.clearDomains();
```

### Updating Description

Set or update the container description:

```java
// Updates editableContainerProperties
container.setDescription("Production database for analytics workloads");
```

## Builder Properties

### Required Properties

- **platform**: Platform name (e.g., "snowflake", "bigquery", "postgres")
- **displayName**: Human-readable name for the container

### Optional Properties

- **database**: Database name (for database/schema containers)
- **schema**: Schema name (for schema containers)
- **env**: Environment (default: "PROD")
- **platformInstance**: Platform instance identifier
- **qualifiedName**: Fully-qualified name (e.g., "prod.snowflake.analytics_db")
- **description**: Container description
- **externalUrl**: External link to the container
- **parentContainer**: Parent container URN
- **customProperties**: Map of custom key-value properties

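The `qualifiedName` values in this guide follow an `env.platform.database[.schema]` shape with a lower-cased environment. As a rough sketch of how such names could be assembled consistently, the hypothetical helper below joins whichever parts are present. `QualifiedNameSketch` is not part of the SDK, and the naming convention itself is inferred from the examples here rather than enforced by DataHub.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the SDK.
final class QualifiedNameSketch {

    // Join the non-null parts with dots; env is lower-cased to match the
    // "prod.snowflake.analytics_db" style used in this guide's examples.
    static String qualifiedName(String env, String platform, String database, String schema) {
        String envPart = env == null ? null : env.toLowerCase(Locale.ROOT);
        return Stream.of(envPart, platform, database, schema)
            .filter(Objects::nonNull)
            .collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        // Database-level name: prod.snowflake.analytics_db
        System.out.println(qualifiedName("PROD", "snowflake", "analytics_db", null));
        // Schema-level name: prod.snowflake.analytics_db.public
        System.out.println(qualifiedName("PROD", "snowflake", "analytics_db", "public"));
    }
}
```

The resulting string can then be passed to the builder's `qualifiedName(...)` so that related containers share one naming scheme.
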
## Properties Access

### Reading Properties

```java
// Display name
String displayName = container.getDisplayName();

// Qualified name
String qualifiedName = container.getQualifiedName();

// Description
String description = container.getDescription();

// External URL
String externalUrl = container.getExternalUrl();

// Custom properties
Map<String, String> customProps = container.getCustomProperties();

// Parent container
String parentUrn = container.getParentContainer();
```

## Common Patterns

### Data Warehouse Structure

**Snowflake Database and Schema:**

```java
// Database container
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Primary analytics database")
    .build();

database
    .addTag("production")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");

// Schema container
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .description("Main schema for analytics tables")
    .parentContainer(database.getContainerUrn())
    .build();

schema
    .addTag("public")
    .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");
```

### BigQuery Project and Dataset

```java
// Project container
Container project = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .env("PROD")
    .displayName("My GCP Project")
    .externalUrl("https://console.cloud.google.com/bigquery/project/my-project")
    .build();

// Dataset container
Container dataset = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .schema("analytics")
    .env("PROD")
    .displayName("Analytics Dataset")
    .parentContainer(project.getContainerUrn())
    .build();
```

### Data Lake Folder Structure

```java
// Bucket container
Container bucket = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .env("PROD")
    .displayName("Data Lake Bucket")
    .build();

// Folder container
Map<String, String> folderProps = new HashMap<>();
folderProps.put("folder_path", "/raw/customer_data");
folderProps.put("file_count", "1500");

Container folder = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .schema("raw")
    .env("PROD")
    .displayName("Customer Data Folder")
    .parentContainer(bucket.getContainerUrn())
    .customProperties(folderProps)
    .build();
```

## Fluent API

All mutation operations return the container instance for method chaining:

```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

database
    .addTag("production")
    .addTag("tier1")
    .addOwner("urn:li:corpuser:data_team", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
    .addTerm("urn:li:glossaryTerm:ProductionDatabase")
    .setDomain("urn:li:domain:Analytics")
    .setDescription("Production analytics database");
```

## Upserting to DataHub

```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();

// Create hierarchy
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Upsert in order: parent before children
client.entities().upsert(database);
client.entities().upsert(schema);
```

## Complete Example

```java
import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Container;
import java.util.HashMap;
import java.util.Map;

public class ContainerExample {
    public static void main(String[] args) throws Exception {
        DataHubClientV2 client = DataHubClientV2.builder()
            .server("http://localhost:8080")
            .build();

        // Create database container
        Map<String, String> dbProps = new HashMap<>();
        dbProps.put("database_type", "analytics");
        dbProps.put("size_gb", "5000");

        Container database = Container.builder()
            .platform("snowflake")
            .database("analytics_db")
            .env("PROD")
            .displayName("Analytics Database")
            .qualifiedName("prod.snowflake.analytics_db")
            .description("Production analytics database")
            .externalUrl("https://snowflake.example.com/databases/analytics_db")
            .customProperties(dbProps)
            .build();

        database
            .addTag("production")
            .addTag("analytics")
            .addTag("tier1")
            .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
            .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
            .addTerm("urn:li:glossaryTerm:ProductionDatabase")
            .setDomain("urn:li:domain:Analytics");

        // Create schema container
        Map<String, String> schemaProps = new HashMap<>();
        schemaProps.put("table_count", "150");
        schemaProps.put("refresh_schedule", "hourly");

        Container schema = Container.builder()
            .platform("snowflake")
            .database("analytics_db")
            .schema("public")
            .env("PROD")
            .displayName("Public Schema")
            .qualifiedName("prod.snowflake.analytics_db.public")
            .description("Main schema for analytics tables")
            .parentContainer(database.getContainerUrn())
            .customProperties(schemaProps)
            .build();

        schema
            .addTag("public")
            .addTag("production-ready")
            .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
            .setDomain("urn:li:domain:Analytics");

        // Upsert to DataHub
        client.entities().upsert(database);
        client.entities().upsert(schema);

        System.out.println("Created container hierarchy:");
        System.out.println("  Database: " + database.getContainerUrn());
        System.out.println("  Schema: " + schema.getContainerUrn());

        client.close();
    }
}
```

## Best Practices

1. **Order of Creation**: Always upsert parent containers before their children
2. **Qualified Names**: Use fully-qualified names for clarity (e.g., "prod.snowflake.analytics_db.public")
3. **Custom Properties**: Store additional metadata like size, table count, owner team, etc.
4. **Consistent Environment**: Use consistent env values across related containers
5. **External URLs**: Provide links to containers in source systems for easy navigation
6. **Hierarchical Tags**: Apply both specific and inherited tags (e.g., "production" at database level, "public" at schema level)

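When containers are collected in arbitrary order, the "parent before children" rule can be applied mechanically. The sketch below is one possible approach, not SDK functionality: it works on plain URN strings and a hypothetical `parentOf` map (child URN to parent URN), and returns the URNs in an order where every parent precedes its children. The shortened URNs are placeholders.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the SDK.
final class UpsertOrderSketch {

    // Returns the URNs ordered so that each parent appears before its children.
    // Roots are URNs with no entry in parentOf. Ancestors reachable through
    // parentOf are included even if they were not listed in urns.
    static List<String> parentFirst(List<String> urns, Map<String, String> parentOf) {
        Set<String> ordered = new LinkedHashSet<>();
        for (String urn : urns) {
            place(urn, parentOf, ordered, new HashSet<>());
        }
        return new ArrayList<>(ordered);
    }

    // Depth-first walk up the parent chain: place the parent, then the child.
    private static void place(String urn, Map<String, String> parentOf,
                              Set<String> ordered, Set<String> visiting) {
        if (ordered.contains(urn)) {
            return;
        }
        if (!visiting.add(urn)) {
            throw new IllegalArgumentException("Cycle in parent chain at " + urn);
        }
        String parent = parentOf.get(urn);
        if (parent != null) {
            place(parent, parentOf, ordered, visiting);
        }
        ordered.add(urn);
    }

    public static void main(String[] args) {
        List<String> urns = List.of(
            "urn:li:container:table-group",
            "urn:li:container:schema",
            "urn:li:container:db");
        Map<String, String> parentOf = Map.of(
            "urn:li:container:table-group", "urn:li:container:schema",
            "urn:li:container:schema", "urn:li:container:db");

        // Database first, then schema, then table group.
        System.out.println(parentFirst(urns, parentOf));
    }
}
```

In real code the returned order would drive the sequence of `client.entities().upsert(...)` calls.
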
## See Also

- [Entities Overview](entities-overview.md)
- [Dataset Entity Guide](dataset-entity.md)
- [Patch Operations](patch-operations.md)
- [Getting Started](getting-started.md)