mirror of https://github.com/datahub-project/datahub.git, synced 2025-11-04 04:39:10 +00:00

DataHub Features
DataHub is made up of a generic backend and a React-based UI. The original DataHub blog post discusses the design extensively and mentions some of DataHub's features. Our open sourcing blog post also compares some features of LinkedIn's production DataHub with those of open source DataHub. Below is a list of the features currently available in DataHub, as well as ones that will become available soon.
Entities
Datasets
- Search: full-text & advanced search, search ranking
- Browse: browsing through a configurable hierarchy
- Schema: table & document schema in tabular and JSON format
- Coarse-grained lineage: support for lineage at the dataset level, tabular & graphical visualization of downstreams/upstreams
- Ownership: surfacing owners of a dataset, viewing datasets you own
- Dataset life-cycle management: deprecate/undeprecate, surface removed datasets and tag them with "removed"
- Institutional knowledge: support for adding free-form documentation to any dataset
- Fine-grained lineage: support for lineage at the field level [coming soon]
- Social actions: likes, follows, bookmarks [coming soon]
- Compliance management: field-level, tag-based compliance editing [coming soon]
- Top users: frequent users of a dataset [coming soon]
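
To make the coarse-grained lineage item concrete, here is a minimal sketch of dataset-level lineage as a directed graph with upstream and downstream lookups. It is a toy in-memory model, not DataHub's implementation: the `LineageGraph` class, its methods, and the dataset names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy dataset-level lineage graph (illustrative, not DataHub code)."""

    def __init__(self):
        self._downstreams = defaultdict(set)  # dataset -> datasets derived from it
        self._upstreams = defaultdict(set)    # dataset -> datasets it is derived from

    def add_edge(self, upstream, downstream):
        """Record that `downstream` is produced from `upstream`."""
        self._downstreams[upstream].add(downstream)
        self._upstreams[downstream].add(upstream)

    def _walk(self, start, edges):
        # Breadth-first traversal that collects every reachable dataset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)

    def upstreams(self, dataset):
        return self._walk(dataset, self._upstreams)

    def downstreams(self, dataset):
        return self._walk(dataset, self._downstreams)


# Hypothetical datasets, used only to exercise the sketch.
graph = LineageGraph()
graph.add_edge("raw_events", "cleaned_events")
graph.add_edge("cleaned_events", "daily_summary")
graph.add_edge("user_profiles", "daily_summary")

print(graph.upstreams("daily_summary"))
print(graph.downstreams("raw_events"))
```

Fine-grained lineage would use the same structure with (dataset, field) pairs as nodes instead of whole datasets.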
 
Users & Groups
- Search: full-text & advanced search, search ranking
- Browse: browsing through a configurable hierarchy [coming soon]
- Profile editing: LinkedIn-style professional profile editing such as summary, skills
 
Dashboards & Charts
- Search: full-text & advanced search, search ranking
- Basic information: ownership, location. Link to external service for viewing the dashboard.
- Institutional knowledge: support for adding free-form documentation to any dashboard [coming soon]
 
Tasks & Pipelines
- Search: full-text & advanced search, search ranking
- Browse: browsing through a configurable hierarchy
- Basic information:
- Execution history: executions and their status. Link to external service for viewing full info.
 
Tags
- Globally defined: tags provide a standardized set of labels that can be shared across all your entities
- Supports entities and schemas: tags can be applied at the entity level or, for datasets, attached to schema fields
- Searchable: entities can be searched and filtered by tag
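
The two attachment points for tags (the entity itself, or individual schema fields of a dataset) and tag-based filtering can be sketched as follows. This is a simplified model under stated assumptions, not DataHub's data model or search API: the `Dataset` class, `filter_by_tag`, and the sample catalog are hypothetical, and the sketch assumes that a tag on any schema field also makes the dataset match a filter on that tag.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset record carrying entity-level and field-level tags."""
    name: str
    tags: set = field(default_factory=set)          # tags on the entity itself
    field_tags: dict = field(default_factory=dict)  # schema field name -> set of tags

    def all_tags(self):
        """Union of entity-level tags and tags on any schema field."""
        combined = set(self.tags)
        for tags_on_field in self.field_tags.values():
            combined |= tags_on_field
        return combined


def filter_by_tag(datasets, tag):
    """Return names of datasets carrying `tag` at either level (assumed behavior)."""
    return [d.name for d in datasets if tag in d.all_tags()]


# Hypothetical catalog; the same shared tag names are reused across entities.
catalog = [
    Dataset("orders", tags={"finance"}, field_tags={"customer_email": {"pii"}}),
    Dataset("page_views", tags={"web"}),
    Dataset("customers", tags={"pii"}),
]

print(filter_by_tag(catalog, "pii"))
```

Because the tag names come from one shared set, the same `pii` label can mark a whole dataset in one place and a single column in another.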
 
Schemas [coming soon]
- Search: full-text & advanced search, search ranking
- Browse: browsing through a configurable hierarchy
- Schema history: view and diff historic versions of schemas
- GraphQL: visualization of GraphQL schemas
 
Metrics [coming soon]
- Search: full-text & advanced search, search ranking
- Browse: browsing through a configurable hierarchy
- Basic information: ownership, dimensions, formula, input & output datasets, dashboards
- Institutional knowledge: support for adding free-form documentation to any metric
 
Metadata Sources
We have a Metadata Ingestion Framework that supports a variety of popular connectors, such as:
- BigQuery
- Snowflake
- Redshift
- Postgres
- Kafka
- MySQL
- Hive
- Looker
- MongoDB
 
and many more.
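
As a rough illustration of how a connector is selected, the sketch below builds an ingestion "recipe" as a plain Python dictionary with a `source` and a `sink`, each naming a `type` and a `config`. The layout follows the general recipe shape used by the ingestion framework as far as we understand it; the `build_recipe` helper, the lowercase source identifiers, the `datahub-rest` sink name, and all connection values are assumptions made for this example and should be checked against the connector documentation before use.

```python
import json

# Assumed lowercase identifiers for the connectors listed above.
SUPPORTED_SOURCES = {
    "bigquery", "snowflake", "redshift", "postgres", "kafka",
    "mysql", "hive", "looker", "mongodb",
}


def build_recipe(source_type, source_config, server):
    """Assemble a recipe dict: where to read metadata from, where to send it.

    Hypothetical helper; it only builds and sanity-checks the structure
    and does not contact any service.
    """
    if source_type not in SUPPORTED_SOURCES:
        raise ValueError(f"unknown source type: {source_type!r}")
    return {
        "source": {"type": source_type, "config": dict(source_config)},
        # Sink name and config key are assumptions for this sketch.
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


# Placeholder connection details; replace with real values.
recipe = build_recipe(
    "mysql",
    {"host_port": "localhost:3306", "database": "example_db"},
    "http://localhost:8080",
)

print(json.dumps(recipe, indent=2))
```

Switching connectors in this model means changing only the `source` block; the `sink` stays the same.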