datahub/docs/what/urn.md

58 lines
2.7 KiB
Markdown
Raw Permalink Normal View History

2020-03-18 18:21:26 -07:00
# What is URN?
2020-03-19 08:59:40 -07:00
URN ([Uniform Resource Name](https://en.wikipedia.org/wiki/Uniform_Resource_Name)) is the chosen scheme of [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) to uniquely define any resource in DataHub. It has the following form
2020-03-18 18:21:26 -07:00
```
2020-03-19 08:53:26 -07:00
urn:<Namespace>:<Entity Type>:<ID>
2020-03-18 18:21:26 -07:00
```
[Onboarding a new entity](../modeling/metadata-model.md) to [GMA](gma.md) starts with modelling an URN specific to that entity.
You can use the existing [URN models](../../li-utils/src/main/javaPegasus/com/linkedin/common/urn) for built-in entities as a reference.
2020-03-18 18:21:26 -07:00
2020-03-19 08:53:26 -07:00
## Namespace
All URNs available in DataHub are using `li` as their namespace.
2020-03-18 18:21:26 -07:00
This can be easily changed to a different namespace for your organization if you fork DataHub.
2020-03-19 08:53:26 -07:00
## Entity Type
2020-03-18 18:21:26 -07:00
Entity type for URN is different than [entity](entity.md) in GMA context. This can be thought of as the object type of
any resource for which you need unique identifier for its each instance. While you can create URNs for GMA entities such as
[DatasetUrn] with entity type `dataset`, you can also define URN for data platforms, [DataPlatformUrn].
2020-03-19 08:53:26 -07:00
## ID
ID is the unique identifier part of a URN. It's unique for a specific entity type within a specific namespace.
ID could contain a single field, or multi fields in the case of complex URNs. A complex URN can even contain other URNs as ID fields. This type of URN is also referred to as nested URN. For non-URN ID fields, the value can be either a string, number, or [Pegasus Enum](https://linkedin.github.io/rest.li/pdl_schema#enum-type).
2020-03-19 08:53:26 -07:00
Here are some example URNs with a single ID field:
2020-03-18 18:21:26 -07:00
```
urn:li:dataPlatform:kafka
urn:li:corpuser:jdoe
```
[DatasetUrn](../../li-utils/src/main/javaPegasus/com/linkedin/common/urn/DatasetUrn.java) is an example of a complex nested URN. It contains 3 ID fields: `platform`, `name` and `fabric`, where `platform` is another [URN](../../li-utils/src/main/javaPegasus/com/linkedin/common/urn/DataPlatformUrn.java). Here are some examples
2020-03-18 18:21:26 -07:00
```
urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)
urn:li:dataset:(urn:li:dataPlatform:hdfs,PageViewEvent,EI)
```
## Restrictions
There are a few restrictions when creating an URN:
The following characters are not allowed anywhere in the URN
1. Parentheses are reserved characters in URN fields: `(` or `)`
2. The "unit separator" unicode character `␟` (U+241F)
The following characters are not allowed within an URN tuple.
1. Commas are reserved characters in URN tuples: `,`
Example: `urn:li:dashboard:(looker,dashboards.thelook)` is a valid urn, but `urn:li:dashboard:(looker,dashboards.the,look)` is invalid.
2024-11-18 17:50:04 -06:00
Please do not use these characters when creating or generating urns. One approach is to use URL encoding for the characters.