| DataHub Operator | Add new entities | The default domain model does not match my business needs
| DataHub Operator | Extend existing entities | The default domain model does not match my business needs
What we heard from folks in the community is that adding new entities + aspects is just **too difficult**.
They'd be happy if this process were streamlined and simple, and **extra** happy if there were no chance of merge conflicts in the future (no fork necessary).
# Goals
### Primary Goal
**Reduce the friction** of adding new entities, aspects, and relationships.
### Secondary Goal
Achieve the primary goal in a way that does not require a fork.
# Requirements
### Must-Haves
1. Mechanisms for **adding** a browsable, searchable, linkable GMS entity by defining one or more PDL models
   - GMS Endpoint for fetching entity
   - GMS Endpoint for fetching entity relationships
   - GMS Endpoint for searching entity
   - GMS Endpoint for browsing entity
2. Mechanisms for **extending** a browsable, searchable, linkable GMS entity by defining one or more PDL models
   - GMS Endpoint for fetching entity
   - GMS Endpoint for fetching entity relationships
   - GMS Endpoint for searching entity
   - GMS Endpoint for browsing entity
3. Mechanisms + conventions for introducing a new **relationship** between 2 GMS entities without writing code
4. Clear documentation describing how to perform actions in #1, #2, and #3 above published on [datahubproject.io](http://datahubproject.io)
### Nice-to-Haves
1. Mechanisms for automatically generating a working GraphQL API using the entity PDL models
2. Ability to add / extend GMS entities without a fork.
   - e.g. **Register** new entity / extensions *at runtime*. (Unlikely due to code generation)
   - or, **configure** new entities at *deploy time*
## What Success Looks Like
1. Adding a new browsable, searchable entity to GMS (not DataHub UI / frontend) takes 1 dev < 15 minutes.
2. Extending an existing browsable, searchable entity in GMS takes 1 dev < 15 minutes.
3. Adding a new relationship between 2 GMS entities takes 1 dev < 15 minutes.
4. [Bonus] Implementing the `datahub-frontend` GraphQL API for a new / extended entity takes < 10 minutes.
2. [Root] [Snapshots](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/Snapshot.pdl) - Container of aspects
3. [Aspects](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/DashboardAspect.pdl) - Optional container of fields
4. [Values](https://github.com/linkedin/datahub/blob/master/gms/api/src/main/pegasus/com/linkedin/dataset/Dataset.pdl), [Keys](https://github.com/linkedin/datahub/blob/master/gms/api/src/main/pegasus/com/linkedin/dataset/DatasetKey.pdl) - Model returned by GMS [Rest.li](http://rest.li) API (public facing)
5. [Entities](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/entity/DatasetEntity.pdl) - Records with fields derived from the URN. Used only in graph / relationships
6. [Relationships](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/relationship/Relationship.pdl) - Edges between 2 entities with optional edge properties
7. [Search Documents](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/search/ChartDocument.pdl) - Flat documents for indexing within an Elasticsearch index
   - And the corresponding index [mappings.json](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/resources/index/chart/mappings.json), [settings.json](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/resources/index/chart/settings.json)
Various components of GMS depend on / make assumptions about these model types:
1. IndexBuilders depend on **Documents**
2. GraphBuilders depend on **Snapshots**
3. RelationshipBuilders depend on **Aspects**
4. MAE Processor depends on **Snapshots, Documents, Relationships**
5. MCE Processor depends on **Snapshots, Urns**
6. [Rest.li](http://rest.li) Resources depend on **Documents, Snapshots, Aspects, Values, Urns**
7. Graph Reader DAO (BaseQueryDao) depends on **Relationships, Entities**
8. Graph Writer DAO (BaseGraphWriterDAO) depends on **Relationships, Entities**
9. Local DAO depends on **Aspects, Urns**
10. Search DAO depends on **Documents**
Additionally, there are some implicit concepts that require additional caveats / logic:
1. Browse Paths - Generating them requires defining logic in an entity-specific index builder.
2. Urns - Requires defining a) an Urn PDL model and b) a hand-written Urn class
As you can see, many of these concepts are tightly coupled. Fundamentally changing the model would require a serious amount of refactoring, as it would require new versions of numerous components.
The challenge is, how can we meet the requirements without fundamentally altering the model?
## Proposed Solution
In a nutshell, the idea is to reduce the number of models + the amount of code we need to write on a per-entity basis.
We intend to achieve this by making search index + relationship configuration declarative, specified as part of the model
definition itself.
We will use this configuration to drive more generic versions of the index builders + rest resources,
with the intention of reducing the overall surface area of GMS.
During this initiative, we will also seek to make the concepts of Browse Paths and Urns declarative. Browse Paths
will be provided using a special BrowsePaths aspect. Urns will no longer be strongly typed.
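
As a rough illustration of what "no longer strongly typed" Urns and a declarative BrowsePaths aspect could look like, here is a minimal sketch. Python is used only for brevity. It assumes the `urn:li:<entityType>:<id>` layout, and the `Urn` and `BrowsePaths` class shapes and field names are hypothetical, not part of the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Urn:
    """Generic URN: an entity type name plus an opaque id, no per-entity subclass."""
    entity_type: str
    entity_id: str

    @classmethod
    def parse(cls, raw: str) -> "Urn":
        # Assumed layout: urn:li:<entityType>:<id>. The id may itself contain
        # colons (for example a nested URN), so split at most three times.
        parts = raw.split(":", 3)
        if len(parts) != 4 or parts[:2] != ["urn", "li"] or not parts[2] or not parts[3]:
            raise ValueError(f"not a valid URN: {raw!r}")
        return cls(entity_type=parts[2], entity_id=parts[3])

    def __str__(self) -> str:
        return f"urn:li:{self.entity_type}:{self.entity_id}"


@dataclass
class BrowsePaths:
    """Hypothetical shape of the special BrowsePaths aspect: paths are plain data."""
    paths: List[str]


raw_urn = "urn:li:dataset:(urn:li:dataPlatform:kafka,PageViews,PROD)"
urn = Urn.parse(raw_urn)
browse = BrowsePaths(paths=["/prod/kafka/PageViews"])
```

With this shape, supporting a new entity type needs no new Urn class, and browse paths are ingested like any other aspect instead of being computed in an entity-specific index builder.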
To achieve this, we will attempt to make many components throughout the stack generic. Currently, many of them are defined on
a *per-entity* basis, including
- Rest.li Resources
- Index Builders
- Graph Builders
- Local, Search, Browse, Graph DAOs
- Clients
- Browse Path Logic
We will also reduce the number of raw data models that need to be defined, including
- Rest.li Resource Models
- Search Document Models
- Relationship Models
- Urns + their Java classes
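
To make the idea concrete, the sketch below shows how a single generic builder could derive both a flat search document and graph edges from declarative field configuration, instead of one hand-written index builder, search document model, and relationship model per entity. Python is used only for brevity. `SearchableField`, `RelationshipField`, the function names, and the sample `chart_info` record are all hypothetical stand-ins for whatever the annotations would actually produce.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class SearchableField:
    """Hypothetical config derived from a searchable model field."""
    path: str        # dotted path into the aspect record
    index_name: str  # field name in the flat search document


@dataclass(frozen=True)
class RelationshipField:
    """Hypothetical config derived from a relationship model field."""
    path: str  # dotted path to a URN (or list of URNs) in the aspect record
    name: str  # edge label


def _get(record: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if any step is missing."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def build_search_document(urn: str, aspect: Dict[str, Any],
                          fields: List[SearchableField]) -> Dict[str, Any]:
    """Flatten the configured fields of an aspect into one search document."""
    doc: Dict[str, Any] = {"urn": urn}
    for f in fields:
        value = _get(aspect, f.path)
        if value is not None:
            doc[f.index_name] = value
    return doc


def build_edges(urn: str, aspect: Dict[str, Any],
                fields: List[RelationshipField]) -> List[Tuple[str, str, str]]:
    """Emit (source, label, destination) edges for the configured fields."""
    edges: List[Tuple[str, str, str]] = []
    for f in fields:
        value = _get(aspect, f.path)
        if value is None:
            continue
        targets = value if isinstance(value, list) else [value]
        edges.extend((urn, f.name, target) for target in targets)
    return edges


# Illustrative data only.
chart_urn = "urn:li:chart:(looker,1)"
chart_info = {"title": "Weekly Sales", "owner": {"urn": "urn:li:corpuser:alice"}}
document = build_search_document(
    chart_urn, chart_info, [SearchableField(path="title", index_name="title")])
edges = build_edges(
    chart_urn, chart_info, [RelationshipField(path="owner.urn", name="OwnedBy")])
```

The same two functions serve every entity; only the configuration lists differ, which is what lets the per-entity builders and document models go away.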
From an architectural PoV, we will move from a "before" that looks something like this:

[Figure: architecture before the change]

to an "after" that looks like this:

[Figure: architecture after the change]

That is, a move away from patterns of strong-typing-everywhere to a more generic + flexible world.
### How will we do it?
We will accomplish this by building the following:
1. A set of custom annotations to permit declarative entity, search, and graph configurations
   - `@Entity` & `@Aspect`
   - `@Searchable`
   - `@Relationship`
2. Entity Registry: In-memory structures for representing, storing & serving metadata associated with a particular Entity, including search and relationship configurations.
3. Generic Entity, Search, Graph Service classes: Replace traditional strongly-typed DAOs with flexible, pluggable APIs that can be used for CRUD, search, and graph across all entities.
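
The sketch below shows, under stated assumptions, how an Entity Registry and a generic Entity Service might fit together. Python is used only for brevity. `EntitySpec`, `EntityRegistry`, `EntityService`, their method names, and the `dashboard` aspect names are hypothetical; in the real system the specs would be derived from the annotated PDL models rather than registered by hand, and storage would not be an in-memory dict.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class EntitySpec:
    """Hypothetical in-memory description of one entity and its aspects."""
    name: str
    aspect_names: Tuple[str, ...]


class EntityRegistry:
    """Holds one EntitySpec per entity type and serves it by name."""

    def __init__(self) -> None:
        self._specs: Dict[str, EntitySpec] = {}

    def register(self, spec: EntitySpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"entity already registered: {spec.name}")
        self._specs[spec.name] = spec

    def get(self, name: str) -> EntitySpec:
        # Raises KeyError for entity types that were never registered.
        return self._specs[name]


class EntityService:
    """One generic read/write API for all entities, validated against the registry."""

    def __init__(self, registry: EntityRegistry) -> None:
        self._registry = registry
        # Stand-in for the real storage layer: (urn, aspect name) -> aspect value.
        self._store: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def ingest_aspect(self, urn: str, aspect_name: str, value: Dict[str, Any]) -> None:
        parts = urn.split(":", 3)  # assumed layout: urn:li:<entityType>:<id>
        if len(parts) != 4:
            raise ValueError(f"not a valid URN: {urn!r}")
        spec = self._registry.get(parts[2])
        if aspect_name not in spec.aspect_names:
            raise ValueError(f"{aspect_name!r} is not an aspect of {spec.name!r}")
        self._store[(urn, aspect_name)] = value

    def get_entity(self, urn: str) -> Dict[str, Dict[str, Any]]:
        """Return every stored aspect of the entity, keyed by aspect name."""
        return {a: v for (u, a), v in self._store.items() if u == urn}


registry = EntityRegistry()
registry.register(EntitySpec(name="dashboard", aspect_names=("dashboardInfo", "ownership")))
service = EntityService(registry)

dashboard_urn = "urn:li:dashboard:(looker,sales)"
service.ingest_aspect(dashboard_urn, "dashboardInfo", {"title": "Weekly Sales"})
```

Adding a new entity in this sketch means registering one more spec; no new DAO, resource, or client class is written.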