- Start Date: 2020-08-03
- RFC PR: https://github.com/linkedin/datahub/pull/1778
- Implementation PR(s): https://github.com/linkedin/datahub/pull/1775
					
						
# Dashboards

## Summary

This RFC adds support for cataloging dashboard (and chart) metadata and enables search and discovery for these assets.
The design should accommodate the different dashboarding tools ([Looker](https://looker.com), [Redash](https://redash.io/)) used within a company.
					
						
## Motivation

Dashboards are a key piece of a company's data ecosystem. They are used by different groups of employees across different organizations.
They provide a way to visualize data assets (tracking datasets or metrics) by allowing users to slice and dice the input data source.
As a company scales, its data assets, including dashboards, grow richer and larger. It therefore becomes important to be able to find and access the right dashboard.
					
						
## Goals

By making dashboards a top-level entity in DataHub, we achieve the following goals:

- Enable search and discovery for dashboard assets by using dashboard metadata
- Link dashboards to their underlying data sources to get a more complete picture of data lineage
					
						
## Non-goals

DataHub will only serve as a catalog for dashboards, where users search for dashboards by using keywords.
The entity page for a dashboard might contain links to the dashboard to direct users to view it after finding it.
However, DataHub will not try to show the actual dashboard or any charts within it. This is not desired and shouldn't be allowed because:

 - Dashboards, or charts within a dashboard, might have different ACLs that prevent users without the necessary permissions from viewing them.
 Generally, the source of truth for these ACLs is the dashboarding tool.
 - Underlying data sources might have ACLs too. Again, the source of truth for these ACLs is the specific data platform.
					
						
## Detailed design

As the design diagram shows, at a very high level a dashboard is composed of a collection of charts. These charts
can be shared by different dashboards. In the example sketched in the diagram, `Chart_1`, `Chart_2` and `Chart_3` are part of
`Dashboard_A`, and `Chart_3` and `Chart_4` are part of `Dashboard_B`.
					
						
### Entities
There will be two top-level GMA [entities](../../../what/entity.md) in the design: dashboards and charts.
It's important to make charts a top-level entity because charts can be shared between different dashboards.
We'll need to build `Contains` relationships between Dashboard and Chart entities.
					
						
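As a minimal sketch of why charts need to be their own entity, the `Contains` relationships from the example can be written as plain (dashboard, chart) pairs and inverted to find charts shared across dashboards. The list-of-tuples representation and the helper names `dashboards_by_chart` and `shared_charts` are assumptions made for illustration; they are not part of the GMA model.

```python
from collections import defaultdict

# Hypothetical Contains edges mirroring the example: (dashboard, chart) pairs.
CONTAINS = [
    ("Dashboard_A", "Chart_1"),
    ("Dashboard_A", "Chart_2"),
    ("Dashboard_A", "Chart_3"),
    ("Dashboard_B", "Chart_3"),
    ("Dashboard_B", "Chart_4"),
]


def dashboards_by_chart(edges):
    """Invert Contains edges so each chart maps to the dashboards containing it."""
    index = defaultdict(set)
    for dashboard, chart in edges:
        index[chart].add(dashboard)
    return dict(index)


def shared_charts(edges):
    """Return the charts that belong to more than one dashboard, sorted by name."""
    index = dashboards_by_chart(edges)
    return sorted(chart for chart, owners in index.items() if len(owners) > 1)


print(shared_charts(CONTAINS))  # ['Chart_3']
```

If charts were embedded inside dashboards instead, `Chart_3` would have to be duplicated under both dashboards, and its metadata could drift apart.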
					
						
### URN Representation
We'll define two [URNs](../../../what/urn.md): `DashboardUrn` and `ChartUrn`.
These URNs should uniquely identify dashboards and charts even when multiple dashboarding tools
are used within a company. Most of the time, dashboards and charts are given unique IDs by the dashboarding tool in use.
An example Dashboard URN for Looker looks like this:
```
urn:li:dashboard:(Looker,<<dashboard_id>>)
```
An example Chart URN for Redash looks like this:
```
urn:li:chart:(Redash,<<chart_id>>)
```
					
						
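The URN format above can be exercised with a small sketch that builds and parses such strings. This is an illustration only, not the actual `DashboardUrn` or `ChartUrn` classes: the names `ToolUrn`, `make_urn` and `parse_urn`, the IDs `42` and `7`, and the rule that tool names and IDs contain no commas or parentheses are all assumptions.

```python
import re
from typing import NamedTuple


class ToolUrn(NamedTuple):
    """Hypothetical parsed form of a dashboard or chart URN."""
    entity_type: str  # "dashboard" or "chart"
    tool: str         # dashboarding tool, e.g. "Looker"
    entity_id: str    # ID assigned by the dashboarding tool


# Assumption: neither the tool name nor the ID contains commas or parentheses.
_URN_PATTERN = re.compile(r"^urn:li:(dashboard|chart):\(([^,()]+),([^,()]+)\)$")


def make_urn(entity_type: str, tool: str, entity_id: str) -> str:
    """Build a URN string and reject parts that would make it ambiguous."""
    urn = f"urn:li:{entity_type}:({tool},{entity_id})"
    if _URN_PATTERN.match(urn) is None:
        raise ValueError(f"cannot build a valid URN from: {urn!r}")
    return urn


def parse_urn(urn: str) -> ToolUrn:
    """Split a URN string back into its entity type, tool and ID."""
    match = _URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a dashboard or chart URN: {urn!r}")
    return ToolUrn(*match.groups())


print(make_urn("dashboard", "Looker", "42"))   # urn:li:dashboard:(Looker,42)
print(parse_urn("urn:li:chart:(Redash,7)"))
```

Because the tool name is part of the key, the same ID issued by two different tools still yields two distinct URNs.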
					
						
### Chart metadata
Dashboarding tools generally use different jargon to denote a chart.
A chart is called a [Look](https://docs.looker.com/exploring-data/saving-and-editing-looks) in Looker
and a [Visualization](https://redash.io/help/user-guide/visualizations/visualization-types) in Redash.
Irrespective of the name, charts are the individual tiles that exist in a dashboard.
Charts are mainly used to deliver information visually so that it is easy to understand.
They might use a single data source or multiple data sources, and they generally have an associated query that runs against
the underlying data source to generate the data they present.

Below is a list of metadata that can be associated with a chart:

- Title
- Description
- Type (Bar chart, Pie chart, Scatter plot etc.)
- Input sources
- Query (and its type)
- Access level (public, private etc.)
- Ownership
- Status (removed or not)
- Audit info (last modified, last refreshed)
					
						
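To make the list above concrete, here is one possible in-memory shape for chart metadata. It is a sketch under assumptions: the class name `ChartMetadata`, the field names, the enum members and the sample values are hypothetical and do not reflect the actual DataHub models.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class ChartType(Enum):
    """Hypothetical subset of chart types."""
    BAR = "bar"
    PIE = "pie"
    SCATTER = "scatter"


class AccessLevel(Enum):
    """Hypothetical access levels."""
    PUBLIC = "public"
    PRIVATE = "private"


@dataclass
class ChartMetadata:
    """One field (or pair of fields) per item in the metadata list."""
    title: str
    description: str = ""
    chart_type: Optional[ChartType] = None
    input_sources: List[str] = field(default_factory=list)
    query: Optional[str] = None
    query_type: Optional[str] = None           # e.g. "SQL"
    access_level: Optional[AccessLevel] = None
    owners: List[str] = field(default_factory=list)
    removed: bool = False                      # status: removed or not
    last_modified: Optional[datetime] = None   # audit info
    last_refreshed: Optional[datetime] = None  # audit info


# Made-up sample values, for illustration only.
example = ChartMetadata(
    title="Weekly signups",
    chart_type=ChartType.BAR,
    input_sources=["signups"],
    query="SELECT week, COUNT(*) FROM signups GROUP BY week",
    query_type="SQL",
    access_level=AccessLevel.PUBLIC,
)
print(example.title, example.chart_type.value)
```

Most fields are optional here because different dashboarding tools may not expose every item in the list.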
					
						
### Dashboard metadata
Aside from containing a set of charts, dashboards carry metadata of their own.
Below is a list of metadata that can be associated with a dashboard:

- Title
- Description
- List of charts
- Access level (public, private etc.)
- Ownership
- Status (removed or not)
- Audit info (last modified, last refreshed)
					
						
### Metadata graph

The example metadata graph for this design shows a complete data lineage picture.
In this graph, `Dash_A` and `Dash_B` are dashboards, and they are connected to charts through `Contains` edges.
`C1`, `C2`, `C3` and `C4` are charts, and they are connected to underlying datasets through `DownstreamOf` edges.
`D1`, `D2` and `D3` are datasets.
					
						
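The two edge types are enough to answer lineage questions in both directions. The sketch below walks `Contains` and `DownstreamOf` edges stored as plain dictionaries. The text names the nodes but not the individual edges, so the specific connections used here are assumed for illustration, as are the function names.

```python
# Assumed edges: the graph names the nodes, but these connections are illustrative.
CONTAINS = {
    "Dash_A": ["C1", "C2", "C3"],
    "Dash_B": ["C3", "C4"],
}
DOWNSTREAM_OF = {
    "C1": ["D1"],
    "C2": ["D1", "D2"],
    "C3": ["D2"],
    "C4": ["D3"],
}


def upstream_datasets(dashboard):
    """Follow Contains, then DownstreamOf, to collect a dashboard's datasets."""
    datasets = set()
    for chart in CONTAINS.get(dashboard, []):
        datasets.update(DOWNSTREAM_OF.get(chart, []))
    return sorted(datasets)


def dashboards_for_dataset(dataset):
    """Walk the same edges in reverse to find dashboards built on a dataset."""
    return sorted(
        dashboard
        for dashboard, charts in CONTAINS.items()
        if any(dataset in DOWNSTREAM_OF.get(chart, []) for chart in charts)
    )


print(upstream_datasets("Dash_A"))    # ['D1', 'D2']
print(dashboards_for_dataset("D2"))   # ['Dash_A', 'Dash_B']
```

Dashboards never point at datasets directly in this sketch; dataset lineage for a dashboard is always derived through its charts.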
					
						
## How we teach this

We should create or update user guides to educate users about:
 - The search and discovery experience (how to find a dashboard in DataHub)
 - The lineage experience (how to find the upstream datasets of a dashboard and how to find the dashboards generated from a dataset)
					
						
## Rollout / Adoption Strategy

The design is intended to be generic enough that any DataHub user can easily
onboard their dashboard metadata to DataHub, irrespective of their dashboarding platform.

The only thing users will need to do is write an ETL script customized for their
dashboarding platform (if one is not already provided in the DataHub repo). This ETL script will:
 - Extract the metadata for all available dashboards and charts using the APIs of the dashboarding platform
 - Construct and emit this metadata in the form of [MCEs](../../../what/mxe.md)
					
						
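The two steps above can be sketched as a small extract-and-emit loop. Everything here is a stand-in: `fetch_dashboards` returns canned records instead of calling a real dashboarding API, `build_event` produces a simplified dictionary rather than the real MCE schema, and the `emit` callable replaces whatever transport an actual script would use.

```python
import json
from typing import Callable, Dict, Iterable, List


def fetch_dashboards() -> List[Dict]:
    """Stand-in for the dashboarding platform's API; returns canned records."""
    return [
        {"id": "1", "title": "Sales overview", "chart_ids": ["10", "11"]},
        {"id": "2", "title": "Ops health", "chart_ids": ["11"]},
    ]


def build_event(tool: str, record: Dict) -> Dict:
    """Build an MCE-like dict. Illustrative shape only, not the real MCE schema."""
    return {
        "urn": f"urn:li:dashboard:({tool},{record['id']})",
        "title": record["title"],
        "charts": [f"urn:li:chart:({tool},{cid})" for cid in record["chart_ids"]],
    }


def run_etl(tool: str, records: Iterable[Dict], emit: Callable[[Dict], None]) -> int:
    """Convert each extracted record to an event, emit it, and count the events."""
    count = 0
    for record in records:
        emit(build_event(tool, record))
        count += 1
    return count


emitted: List[Dict] = []
total = run_etl("Looker", fetch_dashboards(), emitted.append)
print(total)
print(json.dumps(emitted[0], indent=2))
```

Passing `emit` in as a parameter keeps the platform-specific extraction separate from delivery, so the same loop can be tested against a list before it is pointed at a real pipeline.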
					
						
## Unresolved questions (To-do)

1. We'll add social features such as subscribe and follow later on. However, they are out of scope for this RFC.