This naming scheme unfortunately does not allow for easy representation of the multiplicity of platforms (or technologies) that might be deployed at an organization within the same environment or fabric. For example, an organization might have multiple Redshift instances in Production and would want to see all the data assets located in those instances inside the DataHub metadata repository.
**Note**: While platform instances provide one solution to this problem, they come with trade-offs with respect to URN immutability. DataHub also offers alternative approaches for organizing and managing multiple platform instances. See the [Alternative Approaches](#alternative-approaches) section below for more information.
As part of the `v0.8.24+` releases, we are unlocking the first phase of supporting Platform Instances in the metadata model. This is done via two main additions:
- The `dataPlatformInstance` aspect, which has been added to Datasets and allows a dataset to be associated with an instance of a platform
- Enhancements to all ingestion sources that let a recipe specify a platform instance. This changes the generated URNs from the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,ENV)` format to the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<instance.name>,ENV)` format. Sources that produce lineage to datasets on other platforms (e.g. Looker, Superset) also have specific configuration additions that let the recipe author map each platform to the instance name it should use.
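To make the URN change concrete, here is a minimal Python sketch that builds both forms with plain string formatting. The helper `dataset_urn` is hypothetical and is not a DataHub API. It assumes that `<instance.name>` means the instance name prefixed to the dataset name with a dot, and the dataset name `public.orders` is purely illustrative.

```python
from typing import Optional


def dataset_urn(platform: str, name: str, env: str = "PROD",
                platform_instance: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the two URN formats described above."""
    # Assumption: the instance name is prefixed to the dataset name with a dot.
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


# Without a platform instance, two Redshift clusters that both contain
# `public.orders` would produce the same URN.
legacy = dataset_urn("redshift", "public.orders")

# With a platform instance, each cluster gets its own URN.
core = dataset_urn("redshift", "public.orders", platform_instance="core_warehouse")
finance = dataset_urn("redshift", "public.orders", platform_instance="finance_redshift")

print(legacy)
print(core)
print(finance)
```

Because the instance name ends up inside the URN string, renaming an instance later yields a different URN for every dataset it contains.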
**DataHub URNs are immutable identifiers that must remain unchanged once assigned to an entity.** This immutability is fundamental to maintaining data integrity, lineage tracking, and consistent references throughout the system. Once a URN is created, it should never be modified, even if the underlying data asset's attributes change.
### The URN Immutability Challenge
Many organizations face a critical challenge: **URNs serve dual purposes**. They are both internal system identifiers and user-facing identifiers visible in the DataHub UI. This creates a conflict when organizational taxonomy changes (domains, products, systems) because:
1. **Orphaned Assets**: When URNs change, all metadata added outside of ingestion (descriptions, tags, lineage, ownership) stays associated with the old asset and is orphaned
2. **Integration Disruption**: Downstream applications and integrations that rely on specific URNs break
3. **User Confusion**: URNs visible in the UI become outdated and misleading
4. **Operational Overhead**: Teams must migrate all references to new URNs
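The orphaning problem can be illustrated with a small Python sketch. The dictionary below is only a stand-in for metadata keyed by URN, not how DataHub stores it, and the instance names, dataset name, and metadata values are hypothetical.

```python
from typing import Dict

OLD_URN = "urn:li:dataset:(urn:li:dataPlatform:redshift,finance_redshift.public.orders,PROD)"

# Stand-in for metadata added outside of ingestion, keyed by URN.
ui_metadata: Dict[str, dict] = {
    OLD_URN: {"description": "Orders fact table", "tags": ["pii"], "owners": ["jdoe"]},
}


def lookup(urn: str) -> dict:
    """Return the metadata stored for a URN, or an empty dict if none exists."""
    return ui_metadata.get(urn, {})


# Renaming the instance makes re-ingestion emit a different URN.
new_urn = OLD_URN.replace("finance_redshift", "revenue_redshift")

print(lookup(OLD_URN))  # the metadata is still attached to the old URN
print(lookup(new_urn))  # the new URN starts with nothing
```

Nothing links the two URNs, so the description, tags, and owners would have to be migrated by hand.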
### Solution: Separate Technical Identifiers from Business Context
When establishing platform instance naming conventions, it is crucial to choose names that are:
- **Intrinsic to the data**: Based on stable, inherent properties of the data asset
- **Not subject to change**: Avoid names that might change due to organizational restructuring, technology migrations, or operational changes
- **Consistent across all ingestion sources**: The same platform instance name must be used consistently across all recipes to ensure URN alignment
When configuring a platform instance, choose an instance name that is understandable and will remain stable for the foreseeable future. For example, `core_warehouse` and `finance_redshift` are allowed names, as are pure GUIDs like `a37dc708-c512-4fe4-9829-401cd60ed789`. Remember that whatever instance name you choose, you will need to specify it in more than one recipe so that the identifiers produced by different sources line up.
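One way to catch instance names that drift between recipes is a small pre-flight check. The sketch below represents recipes as Python dictionaries shaped like a YAML recipe. The `platform_instance` and `platform_instance_map` key names are assumptions here (the exact option varies by source, so check each source's documentation), and `KNOWN_INSTANCES` is a hypothetical allow-list maintained by your team.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical allow-list of agreed instance names per platform.
KNOWN_INSTANCES: Dict[str, Set[str]] = {
    "redshift": {"core_warehouse", "finance_redshift"},
}

# Recipes as dicts; the third one contains a typo ("core-warehouse").
recipes = [
    {"source": {"type": "redshift",
                "config": {"platform_instance": "core_warehouse"}}},
    {"source": {"type": "looker",
                "config": {"platform_instance_map": {"redshift": "core_warehouse"}}}},
    {"source": {"type": "superset",
                "config": {"platform_instance_map": {"redshift": "core-warehouse"}}}},
]


def instances_by_platform(recipes: List[dict]) -> Dict[str, Set[str]]:
    """Collect every instance name each recipe uses, grouped by platform."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for recipe in recipes:
        source = recipe["source"]
        config = source.get("config", {})
        if "platform_instance" in config:
            seen[source["type"]].add(config["platform_instance"])
        for platform, instance in config.get("platform_instance_map", {}).items():
            seen[platform].add(instance)
    return dict(seen)


def unknown_instances(recipes: List[dict], known: Dict[str, Set[str]]) -> List[str]:
    """Report instance names that are not in the allow-list."""
    problems = []
    for platform, names in instances_by_platform(recipes).items():
        for name in sorted(names - known.get(platform, set())):
            problems.append(f"{platform}: unknown instance '{name}'")
    return problems


print(unknown_instances(recipes, KNOWN_INSTANCES))
```

A mismatch like this would otherwise surface only later, as lineage edges pointing at datasets that do not exist.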
To ensure URN immutability and long-term stability, platform instance names should be **technical identifiers** that are intrinsic to the infrastructure, not business concepts. Use DataHub's built-in features for domains, ownership, and business context.
1. **Technical focus**: Use infrastructure-level identifiers, not business concepts
2. **Stability**: Choose names that reflect permanent technical characteristics
3. **Consistency**: Use the same naming pattern across all platform instances
4. **Uniqueness**: Ensure each platform instance has a unique identifier
5. **Separation of concerns**: Use DataHub's domain and ownership features for business context
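Some of these principles can be checked mechanically before a name is adopted. The following is a minimal sketch of such a lint, not a DataHub feature: `lint_instance_name` and the `VOLATILE_TOKENS` list are hypothetical, and the tokens are chosen only for illustration. Teams would extend the list with whatever terms tend to change in their own organization.

```python
import re
from typing import List

# Hypothetical denylist of tokens that tend to change over time.
VOLATILE_TOKENS = {
    "prod", "dev", "staging",     # environment names
    "legacy", "new", "migrated",  # migration status
}
VERSION_PATTERN = re.compile(r"^v\d+$")  # e.g. "v2"


def lint_instance_name(name: str) -> List[str]:
    """Return warnings for parts of a name that are unlikely to stay stable."""
    warnings = []
    # Split on the separators commonly used in instance names.
    for token in re.split(r"[_\-.]+", name.lower()):
        if token in VOLATILE_TOKENS:
            warnings.append(f"'{token}' looks like environment or lifecycle context")
        elif VERSION_PATTERN.match(token):
            warnings.append(f"'{token}' looks like a version marker")
    return warnings


for candidate in ["core_warehouse", "redshift_prod_v2",
                  "a37dc708-c512-4fe4-9829-401cd60ed789"]:
    print(candidate, lint_instance_name(candidate))
```

A check like this cannot judge whether a name is truly intrinsic to the infrastructure, so it complements review rather than replacing it.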
**Note**: Business context such as domains, ownership, data classification, and technology migration status should be managed through DataHub's dedicated features (domains, ownership, tags, etc.) rather than embedded in the platform instance name. Environment information is best handled with tags rather than the fabric type, since tags allow an asset to be promoted over time. Versioning should use DataHub's versioning capabilities.
## Alternative Solutions to URN Immutability Challenges
Instead of changing URNs when organizational taxonomy evolves, DataHub provides several alternative approaches that maintain URN immutability while enabling flexible business context management:
### Recommended Approach: Separate Technical from Business Context
The most effective solution is to design your platform instance naming to be **technically stable** while using DataHub's metadata features for **business context**:
| **Data Products** | No change | High | High | Business-oriented grouping across platforms |
| **Tags/Labels** | No change | High | Low | Flexible metadata and searchable context |
| **Custom Properties** | No change | Medium | Medium | Structured metadata storage |
| **Glossary Terms** | No change | High | Medium | Business context and domain association |
| **Search Features** | No change | High | Low | Discovery and organization without changes |
| **Automation** | No change | Medium | High | Consistent metadata management |
### Choosing the Right Approach
- **Platform Instances**: When you need technical differentiation in URNs
- **Data Products**: When you need business-oriented grouping across platforms
- **Tags/Labels**: When you need flexible, searchable metadata
- **Custom Properties**: When you need structured metadata storage
- **Glossary Terms**: When you need business context association
- **Combined Approach**: Use multiple concepts together for comprehensive organization
## Summary
Platform instances and data products each address different aspects of data organization in DataHub. Platform instances modify URNs to include technical identifiers, while data products provide organizational structure without changing the physical identity of the asset. For organizations with evolving taxonomy, the key is to separate technical identifiers (in URNs) from business context (in metadata), ensuring both immutability and flexibility.