
# Container Entity
The Container entity represents hierarchical groupings of data assets (databases, schemas, folders, projects). This guide covers container operations in SDK V2.
## Overview
Containers organize data assets into hierarchical structures. Common use cases:
- **Database Hierarchies**: Database → Schema → Table
- **Data Lake Structures**: Bucket → Folder → File
- **Project Hierarchies**: Project → Dataset → Table

Containers use GUID-based URNs generated from their properties (platform, database, schema, etc.), ensuring deterministic URNs for the same logical container.
## URN Construction
Container URNs follow the pattern:
```
urn:li:container:{guid}
```
The GUID is generated by hashing a set of properties (platform, database, schema, env, etc.). This ensures:
- Deterministic URNs: The same properties always generate the same URN
- Uniqueness: Containers with different properties get different URNs
- Hierarchical organization: Parent-child relationships are set explicitly through `parentContainer`

**Example:**
```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .build();

String urn = database.getContainerUrn();
// urn:li:container:{guid-based-on-properties}
```
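
To make the determinism claim concrete, here is a minimal sketch of how a GUID could be derived from a property set. It is an illustration only: the class `ContainerGuidSketch`, the helper `guid`, the canonical `key=value;` encoding, and the choice of MD5 are all assumptions made for this example, not the SDK's actual algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class ContainerGuidSketch {

    // Hypothetical helper: hashes sorted key/value pairs into a hex string.
    // This is NOT the SDK's implementation, only a demonstration of determinism.
    static String guid(Map<String, String> properties) {
        // TreeMap sorts the keys, so insertion order cannot change the result.
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(properties).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("platform", "snowflake", "database", "analytics_db", "env", "PROD");
        Map<String, String> b = Map.of("env", "PROD", "database", "analytics_db", "platform", "snowflake");
        Map<String, String> c = Map.of("platform", "snowflake", "database", "analytics_db",
                                       "schema", "public", "env", "PROD");

        System.out.println("urn:li:container:" + guid(a));
        System.out.println(guid(a).equals(guid(b))); // true: same properties, different order
        System.out.println(guid(a).equals(guid(c))); // false: an extra property changes the hash
    }
}
```

Sorting the keys before hashing is what makes the result independent of the order in which properties are supplied.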
## Creating Containers
### Database Container
```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Production analytics database")
    .qualifiedName("prod.snowflake.analytics_db")
    .build();
```
### Schema Container with Parent
```java
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .qualifiedName("prod.snowflake.analytics_db.public")
    .parentContainer(database.getContainerUrn())
    .build();
```
### With Custom Properties
```java
Map<String, String> properties = new HashMap<>();
properties.put("size_gb", "2500");
properties.put("table_count", "150");
properties.put("owner_team", "data_platform");

Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .displayName("Production Database")
    .customProperties(properties)
    .build();
```
### With External URL
```java
Container database = Container.builder()
    .platform("bigquery")
    .database("analytics")
    .displayName("Analytics Database")
    .externalUrl("https://console.cloud.google.com/bigquery/project/analytics")
    .build();
```
## Hierarchical Relationships
### Parent-Child Structure
Containers support explicit parent-child relationships for organizing data assets hierarchically.
**Database → Schema hierarchy:**
```java
// Level 1: Database
Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .env("PROD")
    .displayName("Production Database")
    .build();

// Level 2: Schema (child of database)
Container schema = Container.builder()
    .platform("postgres")
    .database("production")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();
```
### Three-Level Hierarchy
**Database → Schema → Table Group:**
```java
// Level 1: Database
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

// Level 2: Schema
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Level 3: Logical grouping
// NOTE: this builder call uses the same platform, database, and schema as the
// schema container above. If the GUID is derived only from those fields, the
// two URNs would collide; confirm which properties feed the GUID before
// relying on this pattern.
Container tableGroup = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Customer Tables")
    .qualifiedName("analytics.public.customer_group")
    .parentContainer(schema.getContainerUrn())
    .build();
```
### Managing Parent Relationships
```java
// Set parent container
container.setContainer("urn:li:container:{parent-guid}");
// Get parent container
String parentUrn = container.getParentContainer();
// Clear parent container
container.clearContainer();
```
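
Because each container stores only its immediate parent, a full breadcrumb has to be assembled by following parent links upward. The sketch below shows one way to do that over a plain map of child URN to parent URN. The class `ContainerPathSketch`, the helper `pathFromRoot`, and the placeholder URNs are hypothetical; in practice the map would be filled from values such as those returned by `getParentContainer()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ContainerPathSketch {

    // Returns the URNs from the root container down to the given one.
    // parentOf maps a container URN to its parent URN; roots have no entry.
    static List<String> pathFromRoot(String urn, Map<String, String> parentOf) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = urn;
        while (current != null) {
            // Guard against accidental cycles in the parent links.
            if (!seen.add(current)) {
                throw new IllegalStateException("Cycle in parent links at " + current);
            }
            path.add(current);
            current = parentOf.get(current);
        }
        Collections.reverse(path); // collected child-to-root, so flip it
        return path;
    }

    public static void main(String[] args) {
        // Placeholder URNs for illustration; real GUIDs are hashes.
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("urn:li:container:schema", "urn:li:container:database");
        parentOf.put("urn:li:container:tables", "urn:li:container:schema");

        System.out.println(pathFromRoot("urn:li:container:tables", parentOf));
        // [urn:li:container:database, urn:li:container:schema, urn:li:container:tables]
    }
}
```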
## Container Operations
### Adding Tags
Categorize containers with tags:
```java
container.addTag("production");
container.addTag("tier1");
container.addTag("pii");
// Or use full URN
container.addTag("urn:li:tag:critical");
```
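
Since `addTag` is shown accepting either a bare name or a full URN, callers that collect tags from mixed sources may want to normalize them first. The following is a minimal sketch of such a normalization step. The class `TagUrnSketch` and the helper `toTagUrn` are hypothetical and say nothing about how the SDK handles this internally.

```java
public class TagUrnSketch {

    private static final String TAG_PREFIX = "urn:li:tag:";

    // Hypothetical helper: returns a full tag URN for either input form.
    static String toTagUrn(String tag) {
        if (tag == null || tag.isBlank()) {
            throw new IllegalArgumentException("tag must not be empty");
        }
        String trimmed = tag.trim();
        // Leave full URNs untouched; prefix bare names.
        return trimmed.startsWith(TAG_PREFIX) ? trimmed : TAG_PREFIX + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(toTagUrn("production"));          // urn:li:tag:production
        System.out.println(toTagUrn("urn:li:tag:critical")); // urn:li:tag:critical
    }
}
```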
### Managing Owners
Add owners with different ownership types:
```java
import com.linkedin.common.OwnershipType;

// Add technical owner
container.addOwner("urn:li:corpuser:data_platform_team",
    OwnershipType.TECHNICAL_OWNER);

// Add data steward
container.addOwner("urn:li:corpuser:analytics_lead",
    OwnershipType.DATA_STEWARD);

// Remove owner
container.removeOwner("urn:li:corpuser:data_platform_team");
```
### Adding Glossary Terms
Associate business glossary terms:
```java
container.addTerm("urn:li:glossaryTerm:ProductionDatabase");
container.addTerm("urn:li:glossaryTerm:CustomerData");
// Remove term
container.removeTerm("urn:li:glossaryTerm:ProductionDatabase");
```
### Setting Domain
Assign container to a domain:
```java
container.setDomain("urn:li:domain:Analytics");
// Clear all domains
container.clearDomains();
```
### Updating Description
Set or update container description:
```java
// Updates editableContainerProperties
container.setDescription("Production database for analytics workloads");
```
## Builder Properties
### Required Properties
- **platform**: Platform name (e.g., "snowflake", "bigquery", "postgres")
- **displayName**: Human-readable name for the container
### Optional Properties
- **database**: Database name (for database/schema containers)
- **schema**: Schema name (for schema containers)
- **env**: Environment (default: "PROD")
- **platformInstance**: Platform instance identifier
- **qualifiedName**: Fully-qualified name (e.g., "prod.snowflake.analytics_db")
- **description**: Container description
- **externalUrl**: External link to the container
- **parentContainer**: Parent container URN
- **customProperties**: Map of custom key-value properties
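
The required and optional rules above can be expressed as a small pre-flight check, which may be useful when container definitions come from configuration files. This is a sketch under assumptions: the class `BuilderRulesSketch`, the helper `effectiveEnv`, and the exception type are hypothetical, and the SDK's own validation may behave differently.

```java
public class BuilderRulesSketch {

    static final String DEFAULT_ENV = "PROD";

    // Checks the two required fields and returns the env that would apply.
    static String effectiveEnv(String platform, String displayName, String env) {
        requireText(platform, "platform");
        requireText(displayName, "displayName");
        // env is optional and falls back to the documented default.
        return (env == null || env.isBlank()) ? DEFAULT_ENV : env;
    }

    private static void requireText(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " is required");
        }
    }

    public static void main(String[] args) {
        System.out.println(effectiveEnv("snowflake", "Analytics Database", null));  // PROD
        System.out.println(effectiveEnv("snowflake", "Analytics Database", "DEV")); // DEV
    }
}
```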
## Properties Access
### Reading Properties
```java
// Display name
String displayName = container.getDisplayName();
// Qualified name
String qualifiedName = container.getQualifiedName();
// Description
String description = container.getDescription();
// External URL
String externalUrl = container.getExternalUrl();
// Custom properties
Map<String, String> customProps = container.getCustomProperties();
// Parent container
String parentUrn = container.getParentContainer();
```
## Common Patterns
### Data Warehouse Structure
**Snowflake Database and Schema:**
```java
// Database container
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Primary analytics database")
    .build();

database
    .addTag("production")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");

// Schema container
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .description("Main schema for analytics tables")
    .parentContainer(database.getContainerUrn())
    .build();

schema
    .addTag("public")
    .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");
```
### BigQuery Project and Dataset
```java
// Project container
Container project = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .env("PROD")
    .displayName("My GCP Project")
    .externalUrl("https://console.cloud.google.com/bigquery/project/my-project")
    .build();

// Dataset container
Container dataset = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .schema("analytics")
    .env("PROD")
    .displayName("Analytics Dataset")
    .parentContainer(project.getContainerUrn())
    .build();
```
### Data Lake Folder Structure
```java
// Bucket container
Container bucket = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .env("PROD")
    .displayName("Data Lake Bucket")
    .build();

// Folder container
Map<String, String> folderProps = new HashMap<>();
folderProps.put("folder_path", "/raw/customer_data");
folderProps.put("file_count", "1500");

Container folder = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .schema("raw")
    .env("PROD")
    .displayName("Customer Data Folder")
    .parentContainer(bucket.getContainerUrn())
    .customProperties(folderProps)
    .build();
```
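
For folder structures deeper than one level, each path prefix could become its own container, with the preceding prefix as its parent. The sketch below only computes those prefixes from a path such as the `folder_path` value used above. The class `FolderLevelsSketch` and the helper `folderLevels` are hypothetical, and how each level maps onto builder fields is left open.

```java
import java.util.ArrayList;
import java.util.List;

public class FolderLevelsSketch {

    // Splits a folder path into cumulative prefixes, outermost first.
    static List<String> folderLevels(String path) {
        List<String> levels = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece produced by a leading or doubled slash
            }
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(segment);
            levels.add(prefix.toString());
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(folderLevels("/raw/customer_data"));
        // [raw, raw/customer_data]
    }
}
```

Each element of the result is the parent of the element that follows it, so the list is already in a parent-before-child order.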
## Fluent API
All mutation operations return the container instance for method chaining:
```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

database
    .addTag("production")
    .addTag("tier1")
    .addOwner("urn:li:corpuser:data_team", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
    .addTerm("urn:li:glossaryTerm:ProductionDatabase")
    .setDomain("urn:li:domain:Analytics")
    .setDescription("Production analytics database");
```
## Upserting to DataHub
```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();

// Create hierarchy
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Upsert in order: parent before children
client.entities().upsert(database);
client.entities().upsert(schema);
```
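
When many containers are built programmatically, the parent-before-children order can be computed rather than maintained by hand. The sketch below orders a batch of URNs by walking each one's parent link first. The class `UpsertOrderSketch`, the helper `parentsFirst`, and the placeholder URNs are hypothetical; parents that are not part of the batch are assumed to exist already and are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UpsertOrderSketch {

    // Orders the batch so that every parent appears before its children.
    // parentOf maps each URN in the batch to its parent URN, or null for a root.
    static List<String> parentsFirst(Map<String, String> parentOf) {
        Set<String> ordered = new LinkedHashSet<>();
        for (String urn : parentOf.keySet()) {
            visit(urn, parentOf, ordered, new LinkedHashSet<>());
        }
        return new ArrayList<>(ordered);
    }

    private static void visit(String urn, Map<String, String> parentOf,
                              Set<String> ordered, Set<String> inProgress) {
        // Skip roots' missing parents, parents outside the batch, and URNs already placed.
        if (urn == null || !parentOf.containsKey(urn) || ordered.contains(urn)) {
            return;
        }
        if (!inProgress.add(urn)) {
            throw new IllegalStateException("Cycle in parent links at " + urn);
        }
        visit(parentOf.get(urn), parentOf, ordered, inProgress); // place the parent first
        ordered.add(urn);
    }

    public static void main(String[] args) {
        // Placeholder URNs, deliberately listed child-first.
        Map<String, String> parentOf = new LinkedHashMap<>();
        parentOf.put("urn:li:container:schema", "urn:li:container:database");
        parentOf.put("urn:li:container:database", null);

        System.out.println(parentsFirst(parentOf));
        // [urn:li:container:database, urn:li:container:schema]
    }
}
```

The resulting list can then be iterated, upserting each container in turn.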
## Complete Example
```java
import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Container;
import java.util.HashMap;
import java.util.Map;

public class ContainerExample {
    public static void main(String[] args) throws Exception {
        DataHubClientV2 client = DataHubClientV2.builder()
            .server("http://localhost:8080")
            .build();

        // Create database container
        Map<String, String> dbProps = new HashMap<>();
        dbProps.put("database_type", "analytics");
        dbProps.put("size_gb", "5000");

        Container database = Container.builder()
            .platform("snowflake")
            .database("analytics_db")
            .env("PROD")
            .displayName("Analytics Database")
            .qualifiedName("prod.snowflake.analytics_db")
            .description("Production analytics database")
            .externalUrl("https://snowflake.example.com/databases/analytics_db")
            .customProperties(dbProps)
            .build();

        database
            .addTag("production")
            .addTag("analytics")
            .addTag("tier1")
            .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
            .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
            .addTerm("urn:li:glossaryTerm:ProductionDatabase")
            .setDomain("urn:li:domain:Analytics");

        // Create schema container
        Map<String, String> schemaProps = new HashMap<>();
        schemaProps.put("table_count", "150");
        schemaProps.put("refresh_schedule", "hourly");

        Container schema = Container.builder()
            .platform("snowflake")
            .database("analytics_db")
            .schema("public")
            .env("PROD")
            .displayName("Public Schema")
            .qualifiedName("prod.snowflake.analytics_db.public")
            .description("Main schema for analytics tables")
            .parentContainer(database.getContainerUrn())
            .customProperties(schemaProps)
            .build();

        schema
            .addTag("public")
            .addTag("production-ready")
            .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
            .setDomain("urn:li:domain:Analytics");

        // Upsert to DataHub
        client.entities().upsert(database);
        client.entities().upsert(schema);

        System.out.println("Created container hierarchy:");
        System.out.println(" Database: " + database.getContainerUrn());
        System.out.println(" Schema: " + schema.getContainerUrn());

        client.close();
    }
}
```
## Best Practices
1. **Order of Creation**: Always upsert parent containers before their children
2. **Qualified Names**: Use fully-qualified names for clarity (e.g., "prod.snowflake.analytics_db.public")
3. **Custom Properties**: Store additional metadata like size, table count, owner team, etc.
4. **Consistent Environment**: Use consistent env values across related containers
5. **External URLs**: Provide links to containers in source systems for easy navigation
6. **Hierarchical Tags**: Apply broad tags at higher levels and more specific tags at lower levels (e.g., "production" at database level, "public" at schema level)
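
To keep qualified names consistent across related containers, they can be built by one helper instead of typed by hand. The sketch below follows the `env.platform.database.schema` shape seen in this guide's examples. The class `QualifiedNameSketch` and the helper `qualifiedName` are hypothetical, and lowercasing the environment is an assumption taken from the example strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class QualifiedNameSketch {

    // Joins the parts with dots; schema may be null for database-level containers.
    static String qualifiedName(String env, String platform, String database, String schema) {
        List<String> parts = new ArrayList<>();
        parts.add(env.toLowerCase(Locale.ROOT)); // "PROD" -> "prod", as in the examples
        parts.add(platform);
        parts.add(database);
        if (schema != null && !schema.isEmpty()) {
            parts.add(schema);
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName("PROD", "snowflake", "analytics_db", null));
        // prod.snowflake.analytics_db
        System.out.println(qualifiedName("PROD", "snowflake", "analytics_db", "public"));
        // prod.snowflake.analytics_db.public
    }
}
```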
## See Also
- [Entities Overview](entities-overview.md)
- [Dataset Entity Guide](dataset-entity.md)
- [Patch Operations](patch-operations.md)
- [Getting Started](getting-started.md)