# Container Entity

The Container entity represents hierarchical groupings of data assets (databases, schemas, folders, projects). This guide covers container operations in SDK V2.

## Overview

Containers organize data assets into hierarchical structures. Common use cases:

- **Database Hierarchies**: Database → Schema → Table
- **Data Lake Structures**: Bucket → Folder → File
- **Project Hierarchies**: Project → Dataset → Table

Containers use GUID-based URNs generated from their properties (platform, database, schema, etc.), ensuring deterministic URNs for the same logical container.

## URN Construction

Container URNs follow the pattern:

```
urn:li:container:{guid}
```

The GUID is generated by hashing a set of properties (platform, database, schema, env, etc.). This ensures:

- Deterministic URNs: Same properties always generate the same URN
- Uniqueness: Different containers have different URNs
- Hierarchical organization: Parent-child relationships are explicit

**Example:**

```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .build();

String urn = database.getContainerUrn();
// urn:li:container:{guid-based-on-properties}
```

## Creating Containers

### Database Container

```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Production analytics database")
    .qualifiedName("prod.snowflake.analytics_db")
    .build();
```

### Schema Container with Parent

```java
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .qualifiedName("prod.snowflake.analytics_db.public")
    .parentContainer(database.getContainerUrn())
    .build();
```

### With Custom Properties

```java
Map<String, String> properties = new HashMap<>();
properties.put("size_gb", "2500");
properties.put("table_count", "150");
properties.put("owner_team", "data_platform");

Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .displayName("Production Database")
    .customProperties(properties)
    .build();
```

### With External URL

```java
Container database = Container.builder()
    .platform("bigquery")
    .database("analytics")
    .displayName("Analytics Database")
    .externalUrl("https://console.cloud.google.com/bigquery/project/analytics")
    .build();
```

## Hierarchical Relationships

### Parent-Child Structure

Containers support explicit parent-child relationships for organizing data assets hierarchically.

**Database → Schema hierarchy:**

```java
// Level 1: Database
Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .env("PROD")
    .displayName("Production Database")
    .build();

// Level 2: Schema (child of database)
Container schema = Container.builder()
    .platform("postgres")
    .database("production")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();
```

### Three-Level Hierarchy

**Database → Schema → Table Group:**

```java
// Level 1: Database
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

// Level 2: Schema
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Level 3: Logical grouping
Container tableGroup = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Customer Tables")
    .qualifiedName("analytics.public.customer_group")
    .parentContainer(schema.getContainerUrn())
    .build();
```

### Managing Parent Relationships

```java
// Set parent container
container.setContainer("urn:li:container:{parent-guid}");

// Get parent container
String parentUrn = container.getParentContainer();

// Clear parent container
container.clearContainer();
```

## Container Operations

### Adding Tags

Categorize containers with tags:

```java
container.addTag("production");
container.addTag("tier1");
container.addTag("pii");

// Or use full URN
container.addTag("urn:li:tag:critical");
```

### Managing Owners

Add owners with different ownership types:

```java
import com.linkedin.common.OwnershipType;

// Add technical owner
container.addOwner("urn:li:corpuser:data_platform_team", OwnershipType.TECHNICAL_OWNER);

// Add data steward
container.addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD);

// Remove owner
container.removeOwner("urn:li:corpuser:data_platform_team");
```

### Adding Glossary Terms

Associate business glossary terms:

```java
container.addTerm("urn:li:glossaryTerm:ProductionDatabase");
container.addTerm("urn:li:glossaryTerm:CustomerData");

// Remove term
container.removeTerm("urn:li:glossaryTerm:ProductionDatabase");
```

### Setting Domain

Assign container to a domain:

```java
container.setDomain("urn:li:domain:Analytics");

// Clear all domains
container.clearDomains();
```

### Updating Description

Set or update container description:

```java
// Updates editableContainerProperties
container.setDescription("Production database for analytics workloads");
```

## Builder Properties

### Required Properties

- **platform**: Platform name (e.g., "snowflake", "bigquery", "postgres")
- **displayName**: Human-readable name for the container

### Optional Properties

- **database**: Database name (for database/schema containers)
- **schema**: Schema name (for schema containers)
- **env**: Environment (default: "PROD")
- **platformInstance**: Platform instance identifier
- **qualifiedName**: Fully-qualified name (e.g., "prod.snowflake.analytics_db")
- **description**: Container description
- **externalUrl**: External link to the container
- **parentContainer**: Parent container URN
- **customProperties**: Map of custom key-value properties

## Properties Access

### Reading Properties

```java
// Display name
String displayName = container.getDisplayName();

// Qualified name
String qualifiedName = container.getQualifiedName();

// Description
String description = container.getDescription();

// External URL
String externalUrl = container.getExternalUrl();

// Custom properties
Map<String, String> customProps = container.getCustomProperties();

// Parent container
String parentUrn = container.getParentContainer();
```

## Common Patterns

### Data Warehouse Structure

**Snowflake Database and Schema:**

```java
// Database container
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Primary analytics database")
    .build();

database
    .addTag("production")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");

// Schema container
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .description("Main schema for analytics tables")
    .parentContainer(database.getContainerUrn())
    .build();

schema
    .addTag("public")
    .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");
```

### BigQuery Project and Dataset

```java
// Project container
Container project = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .env("PROD")
    .displayName("My GCP Project")
    .externalUrl("https://console.cloud.google.com/bigquery/project/my-project")
    .build();

// Dataset container
Container dataset = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .schema("analytics")
    .env("PROD")
    .displayName("Analytics Dataset")
    .parentContainer(project.getContainerUrn())
    .build();
```

### Data Lake Folder Structure

```java
// Bucket container
Container bucket = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .env("PROD")
    .displayName("Data Lake Bucket")
    .build();

// Folder container
Map<String, String> folderProps = new HashMap<>();
folderProps.put("folder_path", "/raw/customer_data");
folderProps.put("file_count", "1500");

Container folder = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .schema("raw")
    .env("PROD")
    .displayName("Customer Data Folder")
    .parentContainer(bucket.getContainerUrn())
    .customProperties(folderProps)
    .build();
```

## Fluent API

All mutation operations return the container instance for method chaining:

```java
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

database
    .addTag("production")
    .addTag("tier1")
    .addOwner("urn:li:corpuser:data_team", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
    .addTerm("urn:li:glossaryTerm:ProductionDatabase")
    .setDomain("urn:li:domain:Analytics")
    .setDescription("Production analytics database");
```

## Upserting to DataHub

```java
DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();

// Create hierarchy
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Upsert in order: parent before children
client.entities().upsert(database);
client.entities().upsert(schema);
```

## Complete Example

```java
import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Container;
import java.util.HashMap;
import java.util.Map;

public class ContainerExample {
    public static void main(String[] args) throws Exception {
        DataHubClientV2 client = DataHubClientV2.builder()
            .server("http://localhost:8080")
            .build();

        // Create database container
        Map<String, String> dbProps = new HashMap<>();
        dbProps.put("database_type", "analytics");
        dbProps.put("size_gb", "5000");

        Container database = Container.builder()
            .platform("snowflake")
            .database("analytics_db")
            .env("PROD")
            .displayName("Analytics Database")
            .qualifiedName("prod.snowflake.analytics_db")
            .description("Production analytics database")
            .externalUrl("https://snowflake.example.com/databases/analytics_db")
            .customProperties(dbProps)
            .build();

        database
            .addTag("production")
            .addTag("analytics")
            .addTag("tier1")
            .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
            .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
            .addTerm("urn:li:glossaryTerm:ProductionDatabase")
            .setDomain("urn:li:domain:Analytics");

        // Create schema container
        Map<String, String> schemaProps = new HashMap<>();
        schemaProps.put("table_count", "150");
        schemaProps.put("refresh_schedule", "hourly");

        Container schema = Container.builder()
            .platform("snowflake")
            .database("analytics_db")
            .schema("public")
            .env("PROD")
            .displayName("Public Schema")
            .qualifiedName("prod.snowflake.analytics_db.public")
            .description("Main schema for analytics tables")
            .parentContainer(database.getContainerUrn())
            .customProperties(schemaProps)
            .build();

        schema
            .addTag("public")
            .addTag("production-ready")
            .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
            .setDomain("urn:li:domain:Analytics");

        // Upsert to DataHub
        client.entities().upsert(database);
        client.entities().upsert(schema);

        System.out.println("Created container hierarchy:");
        System.out.println("  Database: " + database.getContainerUrn());
        System.out.println("  Schema: " + schema.getContainerUrn());

        client.close();
    }
}
```

## Best Practices

1. **Order of Creation**: Always upsert parent containers before their children
2. **Qualified Names**: Use fully-qualified names for clarity (e.g., "prod.snowflake.analytics_db.public")
3. **Custom Properties**: Store additional metadata like size, table count, owner team, etc.
4. **Consistent Environment**: Use consistent env values across related containers
5. **External URLs**: Provide links to containers in source systems for easy navigation
6. **Hierarchical Tags**: Apply both specific and inherited tags (e.g., "production" at database level, "public" at schema level)

## See Also

- [Entities Overview](entities-overview.md)
- [Dataset Entity Guide](dataset-entity.md)
- [Patch Operations](patch-operations.md)
- [Getting Started](getting-started.md)
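
## Appendix: Deterministic GUID Sketch

The URN Construction section states that the GUID is derived by hashing the container's key properties, so the same properties always yield the same URN. The following is a minimal, standalone sketch of that idea and does not depend on the SDK. The class `ContainerGuidSketch`, its methods `guidFor` and `urnFor`, the `key=value` canonical form, and the choice of MD5 are all assumptions made for illustration. This is not the SDK's actual algorithm, and the GUIDs it prints will not match the ones `getContainerUrn()` returns.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: illustrates deterministic hashing, NOT the SDK's real GUID algorithm. */
final class ContainerGuidSketch {

    /** Hashes the properties in sorted-key order so insertion order cannot change the result. */
    static String guidFor(Map<String, String> keyProperties) {
        // TreeMap iterates keys in sorted order, giving one canonical string per property set.
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(keyProperties).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            // MD5 is an assumption here; any stable hash would demonstrate the same property.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    /** Wraps the GUID in the urn:li:container:{guid} pattern described in this guide. */
    static String urnFor(Map<String, String> keyProperties) {
        return "urn:li:container:" + guidFor(keyProperties);
    }

    public static void main(String[] args) {
        Map<String, String> database = Map.of(
            "platform", "snowflake",
            "database", "analytics_db",
            "env", "PROD");
        // Running this twice prints the same URN both times.
        System.out.println(urnFor(database));
    }
}
```

Sorting the keys before hashing is what makes the result independent of how the property map was built. Adding a property such as `schema` changes the canonical string and therefore the GUID, which is why a schema container gets a different URN from its parent database.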