# DataHub Features
DataHub is made up of a [generic backend](what/gma.md) and a [React-based UI](../datahub-web-react/README.md).
The original DataHub [blog post](https://engineering.linkedin.com/blog/2019/data-hub) discusses the design extensively and mentions some of DataHub's features.
Our open-sourcing [blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) also compares some features of LinkedIn's production DataHub with those of open source DataHub. Below is a list of the latest features available in DataHub, as well as ones that will soon become available.
## Entities
### Datasets
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Schema**: table & document schema in tabular and JSON format
- **Coarse-grained lineage**: support for lineage at the dataset level, with tabular & graphical visualization of upstream and downstream datasets
- **Ownership**: surfacing owners of a dataset, viewing datasets you own
- **Dataset life-cycle management**: deprecate/undeprecate datasets, surface removed datasets and tag them with "removed"
- **Institutional knowledge**: support for adding free-form documentation to any dataset
- **Fine-grained lineage**: support for lineage at the field level [*coming soon*]
- **Social actions**: likes, follows, bookmarks [*coming soon*]
- **Compliance management**: field-level, tag-based compliance editing [*coming soon*]
- **Top users**: frequent users of a dataset [*coming soon*]
### Users & Groups
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy [*coming soon*]
- **Profile editing**: LinkedIn-style professional profile editing, such as summary and skills
### Dashboards & Charts
- **Search**: full-text & advanced search, search ranking
- **Basic information**: ownership, location, and a link to the external service for viewing the dashboard
- **Institutional knowledge**: support for adding free-form documentation to any dashboard [*coming soon*]
### Tasks & Pipelines
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Basic information**:
- **Execution history**: executions and their status, with a link to the external service for viewing full details

### Tags
- **Globally defined**: Tags provide a standardized set of labels that can be shared across all your entities
- **Supports entities and schemas**: Tags can be applied at the entity level or, for datasets, attached to individual schema fields
- **Searchable**: entities can be searched and filtered by tag

### Schemas [*coming soon*]
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Schema history**: view and diff historic versions of schemas
- **GraphQL**: visualization of GraphQL schemas
### Metrics [*coming soon*]
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Basic information**: ownership, dimensions, formula, input & output datasets, dashboards
- **Institutional knowledge**: support for adding free-form documentation to any metric
## Fine-Grained Access Controls
DataHub also provides mechanisms to control *who* has access to *which* metadata entities, via both the UI and the API. Using this functionality,
admins of DataHub can define policies such as:
- Dataset Owners should be able to update Documentation, but not Tags, for all datasets.
- A specific Data Steward should be able to add tags to any Dataset, but edit nothing else.
- The Data Platform team should have all privileges for DataHub, including managing policies and viewing platform analytics.

For an in-depth introduction to Fine-Grained Access Control, check out [Fine-Grained Access Policies](./policies.md) and
the August 2021 [Town Hall demo](https://www.youtube.com/watch?v=3joZINi3ti4).
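
To make the example policies above more concrete, here is a minimal Python sketch of how allow-style policies might be evaluated. Everything in it is hypothetical: the `Policy` and `Resource` classes, privilege names such as `EDIT_TAGS`, the `data-platform` group, and the `is_authorized` function are illustrations only, not DataHub's actual policy schema or API. It assumes that a policy grants a privilege when the privilege is listed, the resource type matches (if the policy is scoped to one), and the actor is named directly, belongs to a named group, or owns the resource. No policy ever denies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


# Hypothetical policy model for illustration; NOT DataHub's real policy schema.
@dataclass(frozen=True)
class Policy:
    name: str
    privileges: FrozenSet[str]            # hypothetical privilege names
    users: FrozenSet[str] = frozenset()   # users granted directly
    groups: FrozenSet[str] = frozenset()  # groups whose members are granted
    resource_owners: bool = False         # also grant to the resource's owners
    resource_type: Optional[str] = None   # None: not restricted to a resource type


# Hypothetical stand-in for a metadata entity such as a dataset.
@dataclass(frozen=True)
class Resource:
    urn: str
    entity_type: str
    owners: FrozenSet[str]


def is_authorized(
    user: str,
    user_groups: FrozenSet[str],
    privilege: str,
    resource: Optional[Resource],
    policies: Iterable[Policy],
) -> bool:
    """Return True if any policy grants `privilege` to `user` on `resource`.

    Policies are assumed to be allow-only (no deny rules), so the order in
    which they are checked does not affect the result.
    """
    for policy in policies:
        if privilege not in policy.privileges:
            continue
        # A type-scoped policy only applies to a resource of that type.
        if policy.resource_type is not None and (
            resource is None or resource.entity_type != policy.resource_type
        ):
            continue
        is_owner = resource is not None and user in resource.owners
        if (
            user in policy.users
            or bool(user_groups & policy.groups)
            or (policy.resource_owners and is_owner)
        ):
            return True
    return False


# The three example policies from the text, in the hypothetical model.
POLICIES = [
    Policy("Owners edit dataset docs", frozenset({"EDIT_DOCUMENTATION"}),
           resource_owners=True, resource_type="dataset"),
    Policy("Steward tags datasets", frozenset({"EDIT_TAGS"}),
           users=frozenset({"steward"}), resource_type="dataset"),
    Policy("Data Platform team admin",
           frozenset({"EDIT_DOCUMENTATION", "EDIT_TAGS",
                      "MANAGE_POLICIES", "VIEW_ANALYTICS"}),
           groups=frozenset({"data-platform"})),
]

sales = Resource("sales", "dataset", owners=frozenset({"alice"}))
print(is_authorized("alice", frozenset(), "EDIT_DOCUMENTATION", sales, POLICIES))  # True
print(is_authorized("alice", frozenset(), "EDIT_TAGS", sales, POLICIES))           # False
print(is_authorized("bob", frozenset({"data-platform"}), "MANAGE_POLICIES", None, POLICIES))  # True
```

With these three policies, a dataset owner can edit documentation but not tags, the steward can edit tags but nothing else, and members of the data platform group can manage policies. Keeping the sketch allow-only means the outcome depends only on whether some policy matches, not on the order in which policies are listed.
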
## Metadata Sources

We have a [Metadata Ingestion Framework](../metadata-ingestion/README.md) that supports a variety of popular connectors, such as:
- BigQuery
- Snowflake
- Redshift
- Postgres
- Kafka
- MySQL
- Hive
- Looker
- MongoDB

and many more.