Container Entity

The Container entity represents hierarchical groupings of data assets (databases, schemas, folders, projects). This guide covers container operations in SDK V2.

Overview

Containers organize data assets into hierarchical structures. Common use cases:

  • Database Hierarchies: Database → Schema → Table
  • Data Lake Structures: Bucket → Folder → File
  • Project Hierarchies: Project → Dataset → Table

Containers use GUID-based URNs generated from their properties (platform, database, schema, etc.), ensuring deterministic URNs for the same logical container.

URN Construction

Container URNs follow the pattern:

urn:li:container:{guid}

The GUID is generated by hashing a set of properties (platform, database, schema, env, etc.). This ensures:

  • Deterministic URNs: Same properties always generate the same URN
  • Uniqueness: Containers with different properties get different URNs
  • Hierarchical organization: Parent-child relationships are explicit

Example:

Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .build();

String urn = database.getContainerUrn();
// urn:li:container:{guid-based-on-properties}
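
To make the determinism concrete, here is a minimal, self-contained sketch of how a GUID could be derived by hashing properties. It is an illustration only: the class GuidSketch, the choice of MD5, and the key=value serialization are assumptions made for this example and are not taken from the SDK, so the URNs it produces will not match those returned by getContainerUrn().

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the SDK.
class GuidSketch {

  // Hashes the properties in sorted key order so insertion order does not matter.
  static String guidFor(Map<String, String> properties) {
    StringBuilder canonical = new StringBuilder();
    for (Map.Entry<String, String> entry : new TreeMap<>(properties).entrySet()) {
      canonical.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
    }
    try {
      // MD5 is an assumption of this sketch, chosen only because it ships with every JDK.
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException ex) {
      throw new IllegalStateException("MD5 digest is unavailable", ex);
    }
  }

  // Wraps the GUID in the URN pattern described above.
  static String urnFor(Map<String, String> properties) {
    return "urn:li:container:" + guidFor(properties);
  }

  public static void main(String[] args) {
    Map<String, String> props = new TreeMap<>();
    props.put("platform", "snowflake");
    props.put("database", "analytics_db");
    props.put("env", "PROD");
    System.out.println(urnFor(props));
  }
}
```

Because the keys are sorted before hashing, two maps holding the same properties yield the same URN regardless of insertion order, while changing any single value yields a different one.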

Creating Containers

Database Container

Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Production analytics database")
    .qualifiedName("prod.snowflake.analytics_db")
    .build();

Schema Container with Parent

Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .qualifiedName("prod.snowflake.analytics_db.public")
    .parentContainer(database.getContainerUrn())
    .build();

With Custom Properties

import java.util.HashMap;
import java.util.Map;

Map<String, String> properties = new HashMap<>();
properties.put("size_gb", "2500");
properties.put("table_count", "150");
properties.put("owner_team", "data_platform");

Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .displayName("Production Database")
    .customProperties(properties)
    .build();

With External URL

Container database = Container.builder()
    .platform("bigquery")
    .database("analytics")
    .displayName("Analytics Database")
    .externalUrl("https://console.cloud.google.com/bigquery/project/analytics")
    .build();

Hierarchical Relationships

Parent-Child Structure

Containers support explicit parent-child relationships for organizing data assets hierarchically.

Database → Schema hierarchy:

// Level 1: Database
Container database = Container.builder()
    .platform("postgres")
    .database("production")
    .env("PROD")
    .displayName("Production Database")
    .build();

// Level 2: Schema (child of database)
Container schema = Container.builder()
    .platform("postgres")
    .database("production")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

Three-Level Hierarchy

Database → Schema → Table Group:

// Level 1: Database
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

// Level 2: Schema
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Level 3: Logical grouping
Container tableGroup = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Customer Tables")
    .qualifiedName("analytics.public.customer_group")
    .parentContainer(schema.getContainerUrn())
    .build();
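
Since each container records only its direct parent, the full path of a container has to be recovered by following parent links upward. The sketch below shows that traversal on plain strings. ContainerPathSketch, the parentOf map, and the short placeholder URNs are all hypothetical and exist only for this example; in real use the URNs would come from getContainerUrn().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the SDK.
class ContainerPathSketch {

  // Returns the URNs from the root down to the given container.
  // parentOf maps a container URN to its parent URN; roots have no entry (or a null value).
  static List<String> pathFromRoot(String urn, Map<String, String> parentOf) {
    List<String> path = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    String current = urn;
    while (current != null) {
      if (!seen.add(current)) {
        throw new IllegalStateException("Cycle in parent links at " + current);
      }
      path.add(current);
      current = parentOf.get(current);
    }
    Collections.reverse(path); // collected leaf-to-root, returned root-to-leaf
    return path;
  }

  public static void main(String[] args) {
    // Placeholder URNs standing in for database, schema, and table group.
    Map<String, String> parentOf = new HashMap<>();
    parentOf.put("urn:li:container:schema", "urn:li:container:db");
    parentOf.put("urn:li:container:group", "urn:li:container:schema");
    System.out.println(pathFromRoot("urn:li:container:group", parentOf));
  }
}
```

The cycle check matters because a container that is accidentally set as its own ancestor would otherwise make the loop run forever.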

Managing Parent Relationships

// Set parent container
container.setContainer("urn:li:container:{parent-guid}");

// Get parent container
String parentUrn = container.getParentContainer();

// Clear parent container
container.clearContainer();

Container Operations

Adding Tags

Categorize containers with tags:

container.addTag("production");
container.addTag("tier1");
container.addTag("pii");

// Or use full URN
container.addTag("urn:li:tag:critical");
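
The two call styles above suggest that a bare tag name and a full tag URN identify the same kind of object. A minimal sketch of that normalization follows, assuming a bare name maps to urn:li:tag:{name}. The helper TagUrnSketch.toTagUrn is hypothetical and is not the SDK's implementation.

```java
// Hypothetical helper for illustration; not part of the SDK.
class TagUrnSketch {

  private static final String TAG_PREFIX = "urn:li:tag:";

  // Returns the input unchanged if it is already a tag URN, otherwise prefixes it.
  static String toTagUrn(String nameOrUrn) {
    if (nameOrUrn == null || nameOrUrn.trim().isEmpty()) {
      throw new IllegalArgumentException("Tag name must not be empty");
    }
    String trimmed = nameOrUrn.trim();
    return trimmed.startsWith(TAG_PREFIX) ? trimmed : TAG_PREFIX + trimmed;
  }

  public static void main(String[] args) {
    System.out.println(toTagUrn("production"));
    System.out.println(toTagUrn("urn:li:tag:critical"));
  }
}
```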

Managing Owners

Add owners with different ownership types:

import com.linkedin.common.OwnershipType;

// Add technical owner
container.addOwner("urn:li:corpuser:data_platform_team",
                   OwnershipType.TECHNICAL_OWNER);

// Add data steward
container.addOwner("urn:li:corpuser:analytics_lead",
                   OwnershipType.DATA_STEWARD);

// Remove owner
container.removeOwner("urn:li:corpuser:data_platform_team");

Adding Glossary Terms

Associate business glossary terms:

container.addTerm("urn:li:glossaryTerm:ProductionDatabase");
container.addTerm("urn:li:glossaryTerm:CustomerData");

// Remove term
container.removeTerm("urn:li:glossaryTerm:ProductionDatabase");

Setting Domain

Assign the container to a domain:

container.setDomain("urn:li:domain:Analytics");

// Clear all domains
container.clearDomains();

Updating Description

Set or update the container description:

// Updates editableContainerProperties
container.setDescription("Production database for analytics workloads");

Builder Properties

Required Properties

  • platform: Platform name (e.g., "snowflake", "bigquery", "postgres")
  • displayName: Human-readable name for the container

Optional Properties

  • database: Database name (for database/schema containers)
  • schema: Schema name (for schema containers)
  • env: Environment (default: "PROD")
  • platformInstance: Platform instance identifier
  • qualifiedName: Fully-qualified name (e.g., "prod.snowflake.analytics_db")
  • description: Container description
  • externalUrl: External link to the container
  • parentContainer: Parent container URN
  • customProperties: Map of custom key-value properties
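
As an illustration of the required/optional split, the following sketch checks a set of inputs before they are handed to a builder. BuilderInputSketch is a hypothetical stand-in that only mirrors the rules listed above (platform and displayName required, env defaulting to "PROD"); how the SDK builder itself reports missing values is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the SDK.
class BuilderInputSketch {

  private static final String[] REQUIRED = {"platform", "displayName"};

  // Returns a copy of the input with env defaulted, or throws if a required key is missing.
  static Map<String, String> normalize(Map<String, String> input) {
    Map<String, String> result = new LinkedHashMap<>(input);
    for (String key : REQUIRED) {
      String value = result.get(key);
      if (value == null || value.trim().isEmpty()) {
        throw new IllegalArgumentException("Missing required property: " + key);
      }
    }
    result.putIfAbsent("env", "PROD"); // documented default
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> input = new LinkedHashMap<>();
    input.put("platform", "postgres");
    input.put("displayName", "Production Database");
    System.out.println(normalize(input));
  }
}
```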

Properties Access

Reading Properties

// Display name
String displayName = container.getDisplayName();

// Qualified name
String qualifiedName = container.getQualifiedName();

// Description
String description = container.getDescription();

// External URL
String externalUrl = container.getExternalUrl();

// Custom properties
Map<String, String> customProps = container.getCustomProperties();

// Parent container
String parentUrn = container.getParentContainer();

Common Patterns

Data Warehouse Structure

Snowflake Database and Schema:

// Database container
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Primary analytics database")
    .build();

database
    .addTag("production")
    .addTag("analytics")
    .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");

// Schema container
Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .env("PROD")
    .displayName("Public Schema")
    .description("Main schema for analytics tables")
    .parentContainer(database.getContainerUrn())
    .build();

schema
    .addTag("public")
    .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
    .setDomain("urn:li:domain:Analytics");

BigQuery Project and Dataset

// Project container
Container project = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .env("PROD")
    .displayName("My GCP Project")
    .externalUrl("https://console.cloud.google.com/bigquery/project/my-project")
    .build();

// Dataset container
Container dataset = Container.builder()
    .platform("bigquery")
    .database("my-project")
    .schema("analytics")
    .env("PROD")
    .displayName("Analytics Dataset")
    .parentContainer(project.getContainerUrn())
    .build();

Data Lake Folder Structure

// Bucket container
Container bucket = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .env("PROD")
    .displayName("Data Lake Bucket")
    .build();

// Folder container
Map<String, String> folderProps = new HashMap<>();
folderProps.put("folder_path", "/raw/customer_data");
folderProps.put("file_count", "1500");

Container folder = Container.builder()
    .platform("s3")
    .database("my-data-lake")
    .schema("raw")
    .env("PROD")
    .displayName("Customer Data Folder")
    .parentContainer(bucket.getContainerUrn())
    .customProperties(folderProps)
    .build();

Fluent API

All mutation operations return the container instance for method chaining:

Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

database
    .addTag("production")
    .addTag("tier1")
    .addOwner("urn:li:corpuser:data_team", OwnershipType.TECHNICAL_OWNER)
    .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
    .addTerm("urn:li:glossaryTerm:ProductionDatabase")
    .setDomain("urn:li:domain:Analytics")
    .setDescription("Production analytics database");
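
The chaining above works because each mutator returns the object it was called on. The sketch below shows the pattern in isolation. FluentSketch is a hypothetical toy class, not the SDK's Container, and it stores values in memory only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class illustrating the fluent pattern; not part of the SDK.
class FluentSketch {

  private final List<String> tags = new ArrayList<>();
  private String description;

  // Each mutator returns this, which is what makes chaining possible.
  FluentSketch addTag(String tag) {
    tags.add(tag);
    return this;
  }

  FluentSketch setDescription(String newDescription) {
    this.description = newDescription;
    return this;
  }

  List<String> getTags() {
    return tags;
  }

  String getDescription() {
    return description;
  }

  public static void main(String[] args) {
    FluentSketch entity = new FluentSketch()
        .addTag("production")
        .addTag("tier1")
        .setDescription("Production analytics database");
    System.out.println(entity.getTags() + " / " + entity.getDescription());
  }
}
```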

Upserting to DataHub

DataHubClientV2 client = DataHubClientV2.builder()
    .server("http://localhost:8080")
    .build();

// Create hierarchy
Container database = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .displayName("Analytics Database")
    .build();

Container schema = Container.builder()
    .platform("snowflake")
    .database("analytics")
    .schema("public")
    .displayName("Public Schema")
    .parentContainer(database.getContainerUrn())
    .build();

// Upsert in order: parent before children
client.entities().upsert(database);
client.entities().upsert(schema);
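
With more than a handful of containers, the parent-before-child order can be computed rather than maintained by hand. The sketch below orders URNs so that every parent precedes its children. UpsertOrderSketch and the parentOf map are hypothetical and operate on plain strings; the resulting list would then drive the upsert calls shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of the SDK.
class UpsertOrderSketch {

  // parentOf maps each container URN to its parent URN, or to null for a root.
  // A parent that is referenced but has no entry of its own is still placed first.
  static List<String> parentsFirst(Map<String, String> parentOf) {
    Set<String> ordered = new LinkedHashSet<>();
    for (String urn : parentOf.keySet()) {
      visit(urn, parentOf, ordered, new LinkedHashSet<>());
    }
    return new ArrayList<>(ordered);
  }

  // Depth-first walk up the parent chain; ancestors are emitted before the node itself.
  private static void visit(String urn, Map<String, String> parentOf,
                            Set<String> ordered, Set<String> inProgress) {
    if (urn == null || ordered.contains(urn)) {
      return;
    }
    if (!inProgress.add(urn)) {
      throw new IllegalStateException("Cycle in parent links at " + urn);
    }
    visit(parentOf.get(urn), parentOf, ordered, inProgress);
    ordered.add(urn);
  }

  public static void main(String[] args) {
    // Placeholder URNs, deliberately listed child-first.
    Map<String, String> parentOf = new LinkedHashMap<>();
    parentOf.put("urn:li:container:schema", "urn:li:container:db");
    parentOf.put("urn:li:container:db", null);
    System.out.println(parentsFirst(parentOf));
  }
}
```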

Complete Example

import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Container;
import java.util.HashMap;
import java.util.Map;

public class ContainerExample {
  public static void main(String[] args) throws Exception {
    DataHubClientV2 client = DataHubClientV2.builder()
        .server("http://localhost:8080")
        .build();

    // Create database container
    Map<String, String> dbProps = new HashMap<>();
    dbProps.put("database_type", "analytics");
    dbProps.put("size_gb", "5000");

    Container database = Container.builder()
        .platform("snowflake")
        .database("analytics_db")
        .env("PROD")
        .displayName("Analytics Database")
        .qualifiedName("prod.snowflake.analytics_db")
        .description("Production analytics database")
        .externalUrl("https://snowflake.example.com/databases/analytics_db")
        .customProperties(dbProps)
        .build();

    database
        .addTag("production")
        .addTag("analytics")
        .addTag("tier1")
        .addOwner("urn:li:corpuser:data_platform", OwnershipType.TECHNICAL_OWNER)
        .addOwner("urn:li:corpuser:analytics_lead", OwnershipType.DATA_STEWARD)
        .addTerm("urn:li:glossaryTerm:ProductionDatabase")
        .setDomain("urn:li:domain:Analytics");

    // Create schema container
    Map<String, String> schemaProps = new HashMap<>();
    schemaProps.put("table_count", "150");
    schemaProps.put("refresh_schedule", "hourly");

    Container schema = Container.builder()
        .platform("snowflake")
        .database("analytics_db")
        .schema("public")
        .env("PROD")
        .displayName("Public Schema")
        .qualifiedName("prod.snowflake.analytics_db.public")
        .description("Main schema for analytics tables")
        .parentContainer(database.getContainerUrn())
        .customProperties(schemaProps)
        .build();

    schema
        .addTag("public")
        .addTag("production-ready")
        .addOwner("urn:li:corpuser:analytics_team", OwnershipType.TECHNICAL_OWNER)
        .setDomain("urn:li:domain:Analytics");

    // Upsert to DataHub
    client.entities().upsert(database);
    client.entities().upsert(schema);

    System.out.println("Created container hierarchy:");
    System.out.println("  Database: " + database.getContainerUrn());
    System.out.println("  Schema: " + schema.getContainerUrn());

    client.close();
  }
}

Best Practices

  1. Order of Creation: Always upsert parent containers before their children
  2. Qualified Names: Use fully-qualified names for clarity (e.g., "prod.snowflake.analytics_db.public")
  3. Custom Properties: Store additional metadata such as size, table count, or owner team
  4. Consistent Environment: Use consistent env values across related containers
  5. External URLs: Provide links to containers in source systems for easy navigation
  6. Hierarchical Tags: Tag each level for what it describes, with broad tags on parents and more specific tags on children (e.g., "production" at database level, "public" at schema level)
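
For the qualified-name practice, one way to keep names consistent is to build them from the same parts every time. The sketch below assumes the env.platform.database.schema layout seen in the "prod.snowflake.analytics_db.public" example. QualifiedNameSketch is hypothetical, and the layout is a convention of this guide's examples rather than something the SDK enforces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the SDK.
class QualifiedNameSketch {

  // Joins the parts with dots; env is lower-cased, schema is optional.
  static String qualifiedName(String env, String platform, String database, String schema) {
    List<String> parts = new ArrayList<>();
    parts.add(env.toLowerCase(Locale.ROOT));
    parts.add(platform);
    parts.add(database);
    if (schema != null && !schema.isEmpty()) {
      parts.add(schema);
    }
    return String.join(".", parts);
  }

  public static void main(String[] args) {
    System.out.println(qualifiedName("PROD", "snowflake", "analytics_db", null));
    System.out.println(qualifiedName("PROD", "snowflake", "analytics_db", "public"));
  }
}
```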

See Also