mirror of
https://github.com/datahub-project/datahub.git
synced 2025-10-27 08:54:32 +00:00
feat(tags): RFC for tags (#2112)
This commit is contained in:
parent
5c4e671ac5
commit
973768fc5a
@ -115,6 +115,7 @@ module.exports = {
|
|||||||
"docs/rfc/active/business_glossary/README",
|
"docs/rfc/active/business_glossary/README",
|
||||||
"docs/rfc/active/graph_ql_frontend/queries",
|
"docs/rfc/active/graph_ql_frontend/queries",
|
||||||
"docs/rfc/active/react-app/README",
|
"docs/rfc/active/react-app/README",
|
||||||
|
"docs/rfc/active/tags/README",
|
||||||
],
|
],
|
||||||
},
|
},
|
||||||
],
|
],
|
||||||
|
|||||||
190
docs/rfc/active/tags/README.md
Normal file
190
docs/rfc/active/tags/README.md
Normal file
@ -0,0 +1,190 @@
|
|||||||
|
- Start Date: 2021-02-17
|
||||||
|
- RFC PR: https://github.com/linkedin/datahub/pull/2112
|
||||||
|
- Discussion Issue: (GitHub issue this was discussed in before the RFC, if any)
|
||||||
|
- Implementation PR(s): (leave this empty)
|
||||||
|
|
||||||
|
# Tags
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
We suggest a generic, global tagging solution for Datahub. As the solution is quite generic and flexible, it can also
|
||||||
|
hopefully serve as an stepping stone for new, cool features in the future.
|
||||||
|
|
||||||
|
## Motivation
|
||||||
|
|
||||||
|
Currently some entities, such as Datasets, can be tagged using strings, but unfortunately this solution is quite
|
||||||
|
limited.
|
||||||
|
|
||||||
|
A general tag implementation will allow us to define and attach a new and simple type of metadata to all type of
|
||||||
|
entities. As the tags would be defined globally, tagging multiple objects with the same tag will give us the possibility
|
||||||
|
to define and search based on a new kind of relationship, for example which datasets and ML Models that are tagged to
|
||||||
|
include PII data. This allows for describing relationships between object that would otherwise not have a direct lineage
|
||||||
|
relationship. Moreover, tags would lower that bar to add simple metadata to any object in the Datahub instance and open
|
||||||
|
the door to crowd-sourcing metadata. Remembering that tags themselves are entities, it would also be possible to tag
|
||||||
|
tags, enabling a hierarchy of sorts.
|
||||||
|
|
||||||
|
The solution is meant to be quite generic and flexible, and we're not trying to be too opinionated about how a user
|
||||||
|
should use the feature. We hope that this initial generic solution can serve as a stepping stone for cool futures in the
|
||||||
|
future.
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
- Ability to associate tags with any type of entity, even other tags!
|
||||||
|
- Ability to tag the same entity with multiple tags.
|
||||||
|
- Ability to tag multiple objects with the same tag instance.
|
||||||
|
- To the point above, ability to make easy tag-based searches later on.
|
||||||
|
- Metadata on tags is TBD
|
||||||
|
|
||||||
|
### Extensibility
|
||||||
|
|
||||||
|
The normal new-entity-onboarding work is obviously required.
|
||||||
|
|
||||||
|
Hopefully this can serve as a stepping stone to work on special cases such as the tag-based privacy tagging mentioned in
|
||||||
|
the roadmap.
|
||||||
|
|
||||||
|
## Non-Requirements
|
||||||
|
|
||||||
|
Let's leave the UI work required for this to another time.
|
||||||
|
|
||||||
|
## Detailed design
|
||||||
|
|
||||||
|
We want to introduce some new under `datahub/metadata-models/src/main/pegasus/com/linkedin/common/`.
|
||||||
|
|
||||||
|
### `Tag` entity
|
||||||
|
|
||||||
|
First we create a `TagMetadata` entity, which defines the actual tag-object.
|
||||||
|
|
||||||
|
The edit property defines the edit rights of the tag, as some tags (like sensitivity tags) should be read-only for a
|
||||||
|
majority of users
|
||||||
|
|
||||||
|
```
|
||||||
|
/**
|
||||||
|
* Tag information
|
||||||
|
*/
|
||||||
|
record TagMetadata {
|
||||||
|
/**
|
||||||
|
* Tag URN, e.g. urn:li:tag:<name>
|
||||||
|
*/
|
||||||
|
urn: TagUrn
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tag value.
|
||||||
|
*/
|
||||||
|
value: string
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Optional tag description
|
||||||
|
*/
|
||||||
|
description: optional string
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Audit stamp associated with creation of this tag
|
||||||
|
*/
|
||||||
|
createStamp: AuditStamp
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### `TagAttachment`
|
||||||
|
|
||||||
|
We define a `TagAttachment`-model, which describes the application of a tag to a entity
|
||||||
|
|
||||||
|
```
|
||||||
|
/**
|
||||||
|
* Tag information
|
||||||
|
*/
|
||||||
|
record TagAttachment {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tag in question
|
||||||
|
*/
|
||||||
|
tag: TagUrn
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Who has edit rights to this employment.
|
||||||
|
* WIP, pending access-control support in Datahub.
|
||||||
|
* Relevant for privacy tags at least.
|
||||||
|
* We might also want to add view rights?
|
||||||
|
*/
|
||||||
|
edit: union[None, any, role-urn]
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Audit stamp associated with employment of this tag to this entity
|
||||||
|
*/
|
||||||
|
attachmentStamp: AuditStamp
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### `Tags` container
|
||||||
|
|
||||||
|
Then we define a `Tags`-aspect, which is used as a container for tag employments.
|
||||||
|
|
||||||
|
```
|
||||||
|
namespace com.linkedin.common
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tags information
|
||||||
|
*/
|
||||||
|
record Tags {
|
||||||
|
|
||||||
|
/**
|
||||||
|
* List of tag employments
|
||||||
|
*/
|
||||||
|
elements: array[TagAttachment] = [ ]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This can easily be taken into use with wall entities that we want to be able to use tags, e.g. `Datasets`. As we see a
|
||||||
|
lot of potential in tagging individual dataset fields as well, we can either add a reference to a Tags-object in the
|
||||||
|
`SchemaField` object, or alternative create a new `DatasetFieldTags`, similar to `DatasetFieldMapping`.
|
||||||
|
|
||||||
|
## How we teach this
|
||||||
|
|
||||||
|
We should create/update user guides to educate users for:
|
||||||
|
|
||||||
|
- Suggestions on how to use tags: low threshold metadata-addition, and the possibility of doing new types of searches
|
||||||
|
|
||||||
|
## Drawbacks
|
||||||
|
|
||||||
|
This is definitely more complex than just adding strings to an array.
|
||||||
|
|
||||||
|
## Alternatives
|
||||||
|
|
||||||
|
An array of string is a simple solution but does allow for the same functionality as suggested here.
|
||||||
|
|
||||||
|
Another alternative would be simplify the models by removing some of the metadata in the `TagMetadata` and
|
||||||
|
`TagAttachment` entities, such as the the edit/view permission field, the audit stamps, and the descriptions.
|
||||||
|
|
||||||
|
Apache Atlas uses a similar approach. The require you to create a Tag instance before it can be associated with an
|
||||||
|
"asset", and the attachment is done using a dropdown list. The tags can also have attributes and a description. See
|
||||||
|
[here](https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-governance/content/ch_working_with_atlas_tags.html)
|
||||||
|
for an example. The tags are a central piece in the UI and readably searchable, as easily as datasets.
|
||||||
|
|
||||||
|
Atlas also has concept very closely related to tags, called _classification_. Classifications are similar to tags in
|
||||||
|
that they need to be created separately, can have attributes (but no description?) and are attached to assets is done
|
||||||
|
using a dropdown list. Classifications have the added functionality of propagation, which means that they are
|
||||||
|
automatically applied to downstream assets, unless specifically set to not do so. Any change to a classification (say an
|
||||||
|
attribute change) also flows downstream, and in downstream assets you're able to see from where the classification
|
||||||
|
propagated from. See
|
||||||
|
[here](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-atlas/content/propagate_classifications_to_derived_entities.html)
|
||||||
|
for an example.
|
||||||
|
|
||||||
|
## Rollout / Adoption Strategy
|
||||||
|
|
||||||
|
Using the functionality is optional and does not break other functionality as is. The solution is generic enough that
|
||||||
|
the users can easily take into use. It can be take into use as any other entity and aspect.
|
||||||
|
|
||||||
|
## Future Work
|
||||||
|
|
||||||
|
- add `Tags` to aspects for entities.
|
||||||
|
- Implement relationship builders as needed.
|
||||||
|
- The implementation of and need for access control to tags is an open question
|
||||||
|
- As this is first and foremost a tool for discovery, the UI work is extensible:
|
||||||
|
- Creating tags in a way that makes duplication and spelling mistakes difficult.
|
||||||
|
- Attaching tags to entities: autocomplete, dropdown, etc.
|
||||||
|
- Visualizing existing tags, and which are most popular?
|
||||||
|
- Explore the idea about a special "classification" type, that propagates downstream, as in Atlas.
|
||||||
|
|
||||||
|
## Unresolved questions
|
||||||
|
|
||||||
|
- How do we want to map dataset fields to tags?
|
||||||
|
- Do we want to implement edit/view rights?
|
||||||
Loading…
x
Reference in New Issue
Block a user