DataHub uses the Pegasus schema (PDL) language extended with a custom set of annotations to model metadata.
Conceptually, metadata is modeled using the following abstractions:
- **Entities**: An entity is the primary node in the metadata graph. For example, an instance of a Dataset or a CorpUser is an Entity. An entity is made up of a unique identifier (a primary key) and groups of metadata which we call aspects.
- **Aspects**: An aspect is a collection of attributes that describes a particular facet of an entity. Aspects are the smallest atomic unit of write in DataHub; that is, multiple aspects associated with the same Entity can be updated independently. For example, DatasetProperties contains a collection of attributes that describes a Dataset. Aspects can be shared across entities: for example, the "Ownership" aspect is re-used across all the Entities that have owners.
- **Keys & Urns**: A key is a special type of aspect that contains the fields that uniquely identify an individual Entity. Key aspects can be serialized into *Urns*, which represent a stringified form of the key fields used for primary-key lookup. Moreover, *Urns* can be converted back into key aspect structs, making key aspects a type of "virtual" aspect. Key aspects provide a mechanism for clients to easily read the fields comprising the primary key, which are often useful in their own right (Dataset names, platform names, etc.). Urns provide a friendly handle by which Entities can be queried without requiring a fully materialized struct.
- **Relationships**: A relationship represents a named edge between 2 entities. Relationships are declared via foreign key attributes within Aspects along with a custom annotation (`@Relationship`). Relationships permit edges to be traversed bi-directionally. For example, a Chart may refer to a CorpUser as its owner via a relationship named "OwnedBy". This edge would be walkable starting from the Chart *or* the CorpUser instance.
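
The abstractions above can be illustrated with a small in-memory sketch. This is not how DataHub stores metadata; the `MetadataGraph` class, its method names, the `"INCOMING"` direction label, and the `jdoe` user are all hypothetical and exist only to show two properties from the list: aspects are written independently of one another, and a relationship can be walked from either end.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity: a Urn (primary key) plus a set of named aspects."""
    urn: str
    aspects: dict = field(default_factory=dict)  # aspect name -> attributes


class MetadataGraph:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self.entities = {}
        self.outgoing = defaultdict(list)  # source urn -> [(name, destination urn)]
        self.incoming = defaultdict(list)  # destination urn -> [(name, source urn)]

    def upsert_aspect(self, urn, aspect_name, value):
        # The aspect is the unit of write: only the named aspect is replaced.
        entity = self.entities.setdefault(urn, Entity(urn))
        entity.aspects[aspect_name] = value

    def add_relationship(self, source, name, destination):
        # Index the edge in both directions so it can be walked from either end.
        self.outgoing[source].append((name, destination))
        self.incoming[destination].append((name, source))

    def related(self, urn, name, direction):
        index = self.outgoing if direction == "OUTGOING" else self.incoming
        return [other for rel, other in index[urn] if rel == name]


graph = MetadataGraph()
chart = "urn:li:chart:customers"
user = "urn:li:corpuser:jdoe"  # hypothetical user

graph.upsert_aspect(chart, "ChartInfo", {"title": "Customers"})
graph.upsert_aspect(chart, "Ownership", {"owners": [user]})
graph.add_relationship(chart, "OwnedBy", user)

# Rewriting ChartInfo leaves the Ownership aspect untouched.
graph.upsert_aspect(chart, "ChartInfo", {"title": "Customers by region"})

print(graph.related(chart, "OwnedBy", "OUTGOING"))  # owners of the chart
print(graph.related(user, "OwnedBy", "INCOMING"))   # charts owned by the user
```
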
Here is an example graph consisting of 3 types of entity (CorpUser, Chart, Dashboard), 2 types of relationship (OwnedBy, Contains), and 3 types of metadata aspect (Ownership, ChartInfo, and DashboardInfo).
As you'll notice, we perform the lookup using the url-encoded *Urn* associated with an entity.
The response would be an "Entity" record containing the Entity Snapshot (which in turn contains the latest aspects associated with the Entity).
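
As a small illustration of what "url-encoded" means for a Urn, the sketch below percent-encodes one with Python's standard library. The Urn value is borrowed from the relationship example later in this section, and nothing here is sent to a server.

```python
from urllib.parse import quote, unquote

urn = "urn:li:chart:customers"

# safe="" makes quote() percent-encode every reserved character, including ":".
encoded = quote(urn, safe="")
print(encoded)  # urn%3Ali%3Achart%3Acustomers

# Decoding restores the original Urn, so the encoding is lossless.
decoded = unquote(encoded)
print(decoded)
```
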
### Search Query
A search query allows you to search for entities matching an arbitrary string.
For example, to search for entities matching the term "customers", we can use the following curl command:
```
curl --location --request POST 'http://localhost:8080/entities?action=search' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"input": "\"customers\"",
"entity": "chart",
"start": 0,
"count": 10
}'
```
The notable parameters are `input` and `entity`. `input` specifies the query we are issuing, and `entity` specifies the Entity Type we want to search over, given as the common name of the Entity as defined in its `@Entity` definition. The response contains a list of Urns that can be used to fetch the full entities.
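
For callers who prefer a script to curl, the same request can be assembled with Python's standard library. This is a minimal sketch that mirrors the curl example above: the `build_search_request` helper and the `GMS_URL` constant are names invented for illustration, and the request is only built here, not sent.

```python
import json
from urllib.request import Request

GMS_URL = "http://localhost:8080"  # host and port copied from the curl example


def build_search_request(query, entity, start=0, count=10):
    """Build (but do not send) the POST request for a search query."""
    body = {
        "input": query,    # the search string
        "entity": entity,  # common name of the Entity type to search over
        "start": start,
        "count": count,
    }
    return Request(
        url=f"{GMS_URL}/entities?action=search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The curl example sends the term wrapped in double quotes, so we do the same.
request = build_search_request('"customers"', "chart")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would issue the call against a running instance.
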
### Relationship Query
A relationship query allows you to find the Entities connected to a particular source Entity via an edge of a particular type.
For example, to find the owners of a particular Chart, we can use the following curl command:
```
curl --location --request GET --header 'X-RestLi-Protocol-Version: 2.0.0' 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn:li:chart:customers&types=OwnedBy'
```
The notable parameters are `direction`, `urn` and `types`. The response contains the *Urns* of all entities connected
to the primary entity (`urn:li:chart:customers`) by a relationship named "OwnedBy". That is, it permits fetching the owners of a given
chart.
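
The query string for a relationship lookup can also be built programmatically. The sketch below mirrors the curl example; `build_relationship_request` and `GMS_URL` are hypothetical names, and the request is built but not sent. Note that `urlencode` percent-encodes the colons in the Urn, which is the standard encoding for query-string values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

GMS_URL = "http://localhost:8080"  # host and port copied from the curl example


def build_relationship_request(urn, relationship_type, direction="OUTGOING"):
    """Build (but do not send) the GET request for a relationship query."""
    query = urlencode({
        "direction": direction,
        "urn": urn,
        "types": relationship_type,
    })
    return Request(
        url=f"{GMS_URL}/relationships?{query}",
        headers={"X-RestLi-Protocol-Version": "2.0.0"},
        method="GET",
    )


request = build_relationship_request("urn:li:chart:customers", "OwnedBy")
print(request.full_url)

# Parsing the query string back recovers the original, unencoded Urn.
params = parse_qs(urlsplit(request.full_url).query)
print(params["urn"])
```
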
### Special Aspects
There are 2 "special" aspects worth mentioning:
1. Key aspects
2. Browse aspect
#### Key aspects
As introduced above, Key aspects are structs / records that contain the fields that uniquely identify an Entity. There are
some constraints on the fields that can be present in Key aspects:
- All fields must be of STRING or ENUM type
- All fields must be REQUIRED
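
The two constraints can be expressed as a simple check. Key aspects are defined in PDL, so the sketch below is only an analogy: it models a key as a Python dataclass, treats "required" as "has no default value", and reports any field that breaks a rule. The `DatasetKeySketch`, `BadKey`, and `FabricType` definitions and the `key_aspect_violations` helper are all hypothetical.

```python
from dataclasses import MISSING, dataclass, fields
from enum import Enum
from typing import get_type_hints


class FabricType(Enum):  # hypothetical enum used as a key field
    PROD = "PROD"
    DEV = "DEV"


@dataclass
class DatasetKeySketch:  # hypothetical key that satisfies both constraints
    platform: str
    name: str
    origin: FabricType


@dataclass
class BadKey:  # hypothetical key that violates both constraints
    name: str
    version: int       # neither a string nor an enum
    label: str = "x"   # has a default, so it is not required


def key_aspect_violations(key_cls):
    """Return a list of constraint violations for a dataclass-modeled key."""
    hints = get_type_hints(key_cls)
    problems = []
    for f in fields(key_cls):
        field_type = hints[f.name]
        is_enum = isinstance(field_type, type) and issubclass(field_type, Enum)
        if field_type is not str and not is_enum:
            problems.append(f"{f.name}: must be a string or enum")
        if f.default is not MISSING or f.default_factory is not MISSING:
            problems.append(f"{f.name}: must be required (no default)")
    return problems


print(key_aspect_violations(DatasetKeySketch))
print(key_aspect_violations(BadKey))
```
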
Keys can be created from and turned into *Urns*, which represent the stringified version of the Key record.
The algorithm used to do the conversion is straightforward: the fields of the Key aspect are substituted into a
string template based on their index (order of definition) using the following template: