Update features.md

Mars Lan 2020-03-11 04:56:22 -07:00 committed by GitHub
parent 7a0443cc4d
commit 990b3453c1


@@ -1,35 +1,45 @@
# Features of DataHub # Features of DataHub
DataHub is composed of a [generic backend infra](what/gma.md) and a [Ember-based UI](../datahub-web). Original DataHub DataHub is made up of a [generic backend](what/gma.md) and a [Ember-based UI](../datahub-web). Original DataHub
[blog post](https://engineering.linkedin.com/blog/2019/data-hub) extensively talks about the design and mentions some of [blog post](https://engineering.linkedin.com/blog/2019/data-hub) talks about the design extensively and mentions some of
the features of DataHub. Our open sourcing [blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) the features of DataHub. Our open sourcing [blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p)
also provides a comparison of some features between LinkedIn production DataHub vs open source DataHub. Although, these also provides a comparison of some features between LinkedIn production DataHub vs open source DataHub. Below is a list of the latest features that are available in DataHub, as well as features that will soon become available.
are good references, we'll list down all available (also WIP) features of DataHub.
## Data Constructs (Entities) ## Data Constructs (Entities)
Currently, open source DataHub only supports datasets, users and groups data constructs.
### Datasets ### Datasets
- **Search**: full-text & advanced search, search ranking - **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a fixed hierarchy - **Browse**: browsing through a configurable hierarchy
- **Schema**: table & document schema in tabular and JSON format - **Schema**: table & document schema in tabular and JSON format
- **Coarse grain lineage**: support for lineage at the dataset level, tabular & graphical visualization of downstreams/upstreams - **Coarse grain lineage**: support for lineage at the dataset level, tabular & graphical visualization of downstreams/upstreams
- **Ownership**: surfacing owners of a dataset, viewing datasets you own - **Ownership**: surfacing owners of a dataset, viewing datasets you own
- **Dataset life-cycle management**: deprecate/undeprecate, surface removed datasets and tag it with "removed" - **Dataset life-cycle management**: deprecate/undeprecate, surface removed datasets and tag it with "removed"
- **Institutional knowledge**: support for adding free form doc to any dataset - **Institutional knowledge**: support for adding free form doc to any dataset
- **Fine grain lineage**: support for lineage at the field level [*Not available yet*] - **Fine grain lineage**: support for lineage at the field level [*available soon*]
- **Social actions**: likes, follows, bookmarks [*Not available yet*] - **Social actions**: likes, follows, bookmarks [*available soon*]
- **Compliance management**: field level tag based compliance editing [*Not available yet*] - **Compliance management**: field level tag based compliance editing [*available soon*]
- **Top users**: frequent users of a dataset [*Not available yet*] - **Top users**: frequent users of a dataset [*available soon*]
### Users ### Users
- **Search**: full-text & advanced search, search ranking - **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy [*available soon*]
- **Profile editing**: LinkedIn style professional profile editing such as summary, skills - **Profile editing**: LinkedIn style professional profile editing such as summary, skills
### Metrics [*available soon*]
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Basic information**: ownership, dimensions, formula, input & output datasets, dashboards
- **Institutional knowledge**: support for adding free form doc to any metric
### Dashboards [*available soon*]
- **Search**: full-text & advanced search, search ranking
- **Basic information**: ownership, location
- **Institutional knowledge**: support for adding free form doc to any dashboard
## Metadata Sources ## Metadata Sources
You can integrate any data platform to DataHub easily. As long as you have a way of *E*xtracting metadata from the platform and You can integrate any data platform with DataHub easily. As long as you have a way of *Extracting* metadata from the platform and *Transforming* it into our standard [MCE](what/mxe.md) format, you're free to *Load*/ingest metadata to DataHub from any available platform.
*T*ransform that into our standard [MCE](what/mxe.md) format, you're free to *L*oad/ingest metadata to DataHub from any available platform.
We have provided [ETL ingestion](architecture/metadata-ingestion.md) pipelines for: We have provided example [ETL ingestion](architecture/metadata-ingestion.md) scripts for:
- Hive - Hive
- Kafka - Kafka
- RDBMS - RDBMS