# DataHub Features

DataHub is made up of a [generic backend](what/gma.md) and an [Ember-based UI](../datahub-web). The original DataHub
[blog post](https://engineering.linkedin.com/blog/2019/data-hub) discusses the design extensively and mentions some of
the features of DataHub. Our open-sourcing [blog post](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p)
also compares some features of LinkedIn's production DataHub with those of open source DataHub. Below is a list of the latest features available in DataHub, as well as ones that will soon become available.

## Data Constructs (Entities)

### Datasets
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Schema**: table & document schemas in tabular and JSON formats
- **Coarse-grained lineage**: support for lineage at the dataset level, tabular & graphical visualization of downstreams/upstreams
- **Ownership**: surfacing the owners of a dataset, viewing datasets you own
- **Dataset life-cycle management**: deprecating/undeprecating datasets, surfacing removed datasets and tagging them as "removed"
- **Institutional knowledge**: support for adding free-form documentation to any dataset
- **Fine-grained lineage**: support for lineage at the field level [*coming soon*]
- **Social actions**: likes, follows, bookmarks [*coming soon*]
- **Compliance management**: field-level, tag-based compliance editing [*coming soon*]
- **Top users**: frequent users of a dataset [*coming soon*]
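Dataset-level lineage can be modeled as a directed graph in which each edge points from an upstream dataset to a downstream one. The following is a minimal sketch under that assumption, not DataHub's implementation or API: the edge list, the dataset names, and the helpers `build_graph` and `traverse` are all hypothetical (DataHub identifies datasets by URN; short strings are used here for readability). A breadth-first search over the forward or reverse edges yields every transitive downstream or upstream, which is the kind of result a tabular lineage view would list.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: each edge is (upstream dataset, downstream dataset).
Edge = Tuple[str, str]
Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Edge]) -> Tuple[Graph, Graph]:
    """Index the edges in both directions and return (downstreams, upstreams)."""
    downstreams: Graph = defaultdict(set)
    upstreams: Graph = defaultdict(set)
    for upstream, downstream in edges:
        downstreams[upstream].add(downstream)
        upstreams[downstream].add(upstream)
    return downstreams, upstreams


def traverse(start: str, adjacency: Graph) -> List[str]:
    """Return every dataset reachable from `start` in breadth-first order, excluding `start`.

    The `seen` set makes the traversal terminate even if the lineage graph contains cycles.
    """
    seen: Set[str] = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # sorted() keeps the output deterministic; .get() avoids adding keys to the defaultdict.
        for neighbor in sorted(adjacency.get(node, set())):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


edges = [
    ("kafka.PageViewEvent", "hive.page_views"),
    ("hive.page_views", "hive.daily_page_views"),
    ("hive.members", "hive.daily_page_views"),
    ("hive.daily_page_views", "hive.page_view_summary"),
]
downstreams, upstreams = build_graph(edges)
print(traverse("hive.page_views", downstreams))      # ['hive.daily_page_views', 'hive.page_view_summary']
print(traverse("hive.daily_page_views", upstreams))  # ['hive.members', 'hive.page_views', 'kafka.PageViewEvent']
```

Keeping a reverse index alongside the forward one lets the same traversal answer both "what feeds this dataset?" and "what does this dataset feed?" without rescanning the edge list.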
### Users
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy [*coming soon*]
- **Profile editing**: LinkedIn-style professional profile editing, such as summary and skills

### Schemas [*coming soon*]
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Schema history**: viewing and diffing historic versions of schemas
- **GraphQL**: visualization of GraphQL schemas
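Diffing two schema versions comes down to comparing their fields. As a minimal sketch, assuming each version is available as a hypothetical `{field name: type}` mapping (real schemas carry more, such as nested fields, nullability, and descriptions), the functions below report added, removed, and retyped fields, and diff each version in a history against its predecessor. This is an illustration, not how DataHub implements schema history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SchemaDiff:
    added: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    # Each entry is (field name, old type, new type).
    changed: List[Tuple[str, str, str]] = field(default_factory=list)


def diff_schemas(old: Dict[str, str], new: Dict[str, str]) -> SchemaDiff:
    """Compare two schema versions given as hypothetical {field_name: type} mappings."""
    diff = SchemaDiff()
    # Dict key views support set operations; sorting keeps the report stable.
    diff.added = sorted(new.keys() - old.keys())
    diff.removed = sorted(old.keys() - new.keys())
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            diff.changed.append((name, old[name], new[name]))
    return diff


def diff_history(versions: List[Dict[str, str]]) -> List[SchemaDiff]:
    """Diff each schema version against the version immediately before it."""
    return [diff_schemas(prev, curr) for prev, curr in zip(versions, versions[1:])]


v1 = {"member_id": "long", "page": "string", "ts": "int"}
v2 = {"member_id": "long", "page": "string", "ts": "long", "country": "string"}
print(diff_schemas(v1, v2))
# SchemaDiff(added=['country'], removed=[], changed=[('ts', 'int', 'long')])
```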
### Jobs/flows [*coming soon*]
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Basic information**:
- **Execution history**: executions and their status, with a link to an external service for viewing full details

### Metrics [*coming soon*]
- **Search**: full-text & advanced search, search ranking
- **Browse**: browsing through a configurable hierarchy
- **Basic information**: ownership, dimensions, formula, input & output datasets, dashboards
- **Institutional knowledge**: support for adding free-form documentation to any metric

### Dashboards [*coming soon*]
- **Search**: full-text & advanced search, search ranking
- **Basic information**: ownership, location, and a link to an external service for viewing the dashboard
- **Institutional knowledge**: support for adding free-form documentation to any dashboard

## Metadata Sources
  
You can easily integrate any data platform with DataHub. As long as you have a way of *Extracting* metadata from the platform and *Transforming* it into our standard [MCE](what/mxe.md) format, you're free to *Load*/ingest metadata into DataHub from any available platform.
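The extract-transform-load flow described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not DataHub's ingestion code: the raw record format, the helper names, and the event layout are hypothetical. The event is a simplified, MCE-like dictionary (a snapshot identified by a dataset URN and carrying a list of aspects); the real MCE is an Avro record whose exact schema is described in the MCE documentation linked above. The URN strings follow the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` and `urn:li:corpuser:<id>` patterns DataHub uses, but check them against your version. The load step here only serializes events to JSON; a real pipeline would send them to DataHub instead, typically by publishing them to Kafka.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical raw metadata, standing in for whatever a real extractor reads from a platform.
RAW_METADATA = [
    {"platform": "hive", "name": "tracking.page_views", "owner": "jdoe",
     "fields": [("member_id", "long"), ("page", "string")]},
]


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's urn:li:dataset:(...) style."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def extract(source: Iterable[Dict]) -> Iterator[Dict]:
    """Extract: yield raw metadata records from the (hypothetical) source."""
    yield from source


def transform(raw: Dict) -> Dict:
    """Transform: map a raw record onto a simplified, MCE-like event (not the exact Avro schema)."""
    return {
        "proposedSnapshot": {
            "urn": dataset_urn(raw["platform"], raw["name"]),
            "aspects": [
                {"owners": [{"owner": f"urn:li:corpuser:{raw['owner']}"}]},
                {"schemaFields": [{"fieldPath": n, "type": t} for n, t in raw["fields"]]},
            ],
        }
    }


def load(events: Iterable[Dict]) -> List[str]:
    """Load: serialize each event to JSON; a real loader would send it to DataHub instead."""
    return [json.dumps(event, sort_keys=True) for event in events]


for line in load(transform(raw) for raw in extract(RAW_METADATA)):
    print(line)
```

Keeping the three stages as separate functions means a new platform only needs its own `extract` and `transform`; the load side stays the same for every source.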
								
We have provided example [ETL ingestion](architecture/metadata-ingestion.md) scripts for:
- Hive
- Kafka
- RDBMS (MySQL, Oracle, Postgres, MS SQL, etc.)
- Data warehouses (Snowflake, BigQuery, etc.)
- LDAP
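For an RDBMS source, the *Extract* step mostly means reading table and column definitions from the database's catalog. To keep the sketch self-contained, it uses SQLite through Python's built-in `sqlite3` module as a stand-in; SQLite is not one of the databases listed above, and scripts for MySQL, Oracle, Postgres, or MS SQL would query those systems' own catalogs instead (for example `information_schema`, where it is available). The function name and the shape of the returned records are hypothetical; the records are raw metadata that a transform step would still need to map into the MCE format.

```python
import sqlite3
from typing import Dict, List


def extract_sqlite_metadata(conn: sqlite3.Connection) -> List[Dict]:
    """Read table and column metadata from a SQLite catalog (hypothetical record shape)."""
    records = []
    # Skip SQLite's internal tables (e.g. sqlite_sequence), whose names start with "sqlite_".
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier (doubling any embedded quotes) before putting it in the PRAGMA.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each row is (cid, name, type, notnull, dflt_value, pk), in column order.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        records.append({
            "platform": "sqlite",
            "name": table,
            "fields": [(col[1], col[2]) for col in columns],
        })
    return records


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (member_id INTEGER, page TEXT, ts INTEGER)")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT)")
for record in extract_sqlite_metadata(conn):
    print(record)
```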