mirror of
				https://github.com/datahub-project/datahub.git
				synced 2025-10-31 10:49:00 +00:00 
			
		
		
		
	
		
			
	
	
		
			78 lines
		
	
	
		
			3.6 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
		
		
			
		
	
	
			78 lines
		
	
	
		
			3.6 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
|   | import FeatureAvailability from '@site/src/components/FeatureAvailability'; | ||
|  | 
 | ||
|  | # About DataHub Dataset Usage & Query History
 | ||
|  | 
 | ||
|  | <FeatureAvailability/> | ||
|  | 
 | ||
|  | Dataset Usage & Query History can give dataset-level information about the top queries which referenced a dataset. | ||
|  | 
 | ||
|  | Usage data can help identify the top users who probably know the most about the dataset and top queries referencing this dataset. | ||
|  | You can also get an overview of the overall number of queries and distinct users. | ||
|  | In some sources, column level usage is also calculated, which can help identify frequently used columns. | ||
|  | 
 | ||
|  | With sources that support usage statistics, you can collect Dataset, Dashboard, and Chart usages. | ||
|  | 
 | ||
|  | ## Dataset Usage & Query History Setup, Prerequisites, and Permissions
 | ||
|  | 
 | ||
|  | To ingest Dataset Usage & Query History data, you should check first on the specific source doc | ||
|  | if it is supported by the Datahub source and how to enable it. | ||
|  | 
 | ||
|  | You can validate this on the Datahub source's capabilities section: | ||
|  | <p align="center"> | ||
|  |   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/source-snowflake-capabilities.png"/> | ||
|  | </p> | ||
|  | 
 | ||
|  | Some sources require a separate, usage-specific recipe to ingest Usage and Query History metadata. In this case, it is noted in the capabilities summary, like so: | ||
|  | <p align="center"> | ||
|  |   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/source-redshift-capabilities.png"/> | ||
|  | </p> | ||
|  | 
 | ||
|  | Please, always check the usage prerequisities page if the source has as it can happen you have to add additional | ||
|  | permissions which only needs for usage. | ||
|  | 
 | ||
|  | ## Using Dataset Usage & Query History
 | ||
|  | 
 | ||
|  | After successful ingestion, the Queries and Stats tab will be enabled on datasets with any usage. | ||
|  | <p align="center"> | ||
|  |   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-queries-tab.png"/> | ||
|  | </p> | ||
|  | 
 | ||
|  | On the Queries tab, you can see the top 5 most often run queries which referenced this dataset. | ||
|  | <p align="center"> | ||
|  |   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-query-history-page.png"/> | ||
|  | </p> | ||
|  | 
 | ||
|  | On the Stats tab, you can see the top 5 users who run the most queries which referenced this dataset | ||
|  | <p align="center"> | ||
|  |   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-usage-stats-tab.png"/> | ||
|  | </p> | ||
|  | 
 | ||
|  | With the collected usage data, you can even see column-level usage statistics (Redshift Usage doesn't supported this yet): | ||
|  | <p align="center"> | ||
|  |   <img width="70%"  src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-column-level-usage.png"/> | ||
|  | </p> | ||
|  | 
 | ||
|  | ## Additional Resources
 | ||
|  | 
 | ||
|  | ### Videos
 | ||
|  | 
 | ||
|  | **DataHub 101: Data Profiling and Usage Stats 101** | ||
|  | <p align="center"> | ||
|  | <iframe width="560" height="315" src="https://www.youtube.com/embed/d4S7RgWUg5U?start=254" title="DataHub 101: Data Profiling" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> | ||
|  | </p> | ||
|  | 
 | ||
|  | ### GraphQL
 | ||
|  | 
 | ||
|  | - <https://datahubproject.io/docs/graphql/objects#usageaggregationmetrics> | ||
|  | - <https://datahubproject.io/docs/graphql/objects#userusagecounts> | ||
|  | - <https://datahubproject.io/docs/graphql/objects#dashboardstatssummary> | ||
|  | - <https://datahubproject.io/docs/graphql/objects#dashboarduserusagecounts> | ||
|  | 
 | ||
|  | ## FAQ and Troubleshooting
 | ||
|  | 
 | ||
|  | ### Why is my Queries/Stats tab greyed out?
 | ||
|  | 
 | ||
|  | Queries/Stats tab is greyed out if there is no usage statistics for that dataset or there were no ingestion with usage extraction run before. | ||
|  | 
 | ||
|  | *Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!* |