| # Browse Paths Upgrade (August 2022)
 | |
| 
 | |
| ## Background
 | |
| 
 | |
| Up to this point, there's been a historical constraint on all entity browse paths. Namely, each browse path has been
 | |
| required to end with a path component that represents the "simple name" of an entity. For example, a Browse Path for a 
 | |
| Snowflake Table called "test_table" may look something like this:
 | |
| 
 | |
| ```
 | |
| /prod/snowflake/warehouse1/db1/test_table
 | |
| ```
 | |
| 
 | |
| In the UI, we artificially truncate the final path component when you are browsing the Entity hierarchy, so your browse experience 
 | |
| would be: 
 | |
| 
 | |
| `prod` > `snowflake` > `warehouse1` > `db1` > `Click Entity`
 | |
| 
 | |
| As you can see, the final path component `test_table` is effectively ignored. It could have any value, and we would still ignore
 | |
| it in the UI. This behavior serves as a workaround to the historical requirement that all browse paths end with a simple name. 
 | |
| 
 | |
| This data constraint stands in opposition to the original intention of Browse Paths: to provide a simple mechanism for organizing
 | |
| assets into a hierarchical folder structure. For this reason, we've changed the semantics of Browse Paths to better align with the original intention. 
 | |
| Going forward, you will not be required to provide a final component detailing the "name". Instead, you will be able to provide a simpler path that
 | |
| omits this final component:
 | |
| 
 | |
| ```
 | |
| /prod/snowflake/warehouse1/db1
 | |
| ```
 | |
| 
 | |
| and the browse experience from the UI will continue to work as you would expect: 
 | |
| 
 | |
| `prod` > `snowflake` > `warehouse1` > `db1` > `Click Entity`. 
 | |
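
The difference between the two formats can be sketched in a few lines of Python. The helper below is hypothetical (it is not part of DataHub) and only illustrates which path components end up as folders under the legacy semantics, where the UI dropped the trailing name, and under the new semantics, where every component is a folder.

```python
from typing import List


def browse_folders(path: str, legacy: bool) -> List[str]:
    """Return the folders a user clicks through for a browse path.

    Hypothetical helper for illustration only; not a DataHub API.
    """
    components = [c for c in path.split("/") if c]
    # Legacy paths end with the entity's simple name, which the UI ignored.
    return components[:-1] if legacy else components


old = browse_folders("/prod/snowflake/warehouse1/db1/test_table", legacy=True)
new = browse_folders("/prod/snowflake/warehouse1/db1", legacy=False)
print(old == new)  # the two formats describe the same folder hierarchy
```

Under these assumptions, a legacy path and its shortened counterpart resolve to the same four folders, which is why the browse experience stays the same after migration.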
| 
 | |
| With this change comes a fix to a longstanding bug where multiple browse paths could not be attached to a single URN. Going forward,
 | |
| we will support producing multiple browse paths for the same entity, and allow you to traverse via multiple paths. For example:
 | |
| 
 | |
| ```python
 | |
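| from datahub.emitter.mcp import MetadataChangeProposalWrapper
 | |
| from datahub.metadata.schema_classes import BrowsePathsClass
 | |
| 
 | |
| browse_path = BrowsePathsClass(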
 | |
|     paths=["/powerbi/my/custom/path", "/my/other/custom/path"]
 | |
| )
 | |
| return MetadataChangeProposalWrapper(
 | |
|     entityType="dataset",
 | |
|     changeType="UPSERT",
 | |
|     entityUrn="urn:li:dataset:(urn:li:dataPlatform:custom,MyFileName,PROD)",
 | |
|     aspectName="browsePaths",
 | |
|     aspect=browse_path,
 | |
| )
 | |
| ```
 | |
| *Using the Python Emitter SDK to produce multiple Browse Paths for the same entity*
 | |
| 
 | |
| We've received multiple bug reports, such as [this issue](https://github.com/datahub-project/datahub/issues/5525), along with requests to address these problems with Browse, so we have decided
 | |
| to fix them now, before more workarounds are created.  
 | |
| 
 | |
| ## What this means for you
 | |
| 
 | |
| Once you upgrade to DataHub `v0.8.45` you will immediately notice that traversing your Browse Path hierarchy will require
 | |
| one extra click to find the entity. This is because we are correctly displaying the FULL browse path, including the simple name mentioned above.
 | |
| 
 | |
| There are two ways to upgrade to the new browse path format. Depending on your ingestion sources, you may want to use one or both:
 | |
| 
 | |
| 1. Migrate default browse paths to the new format by restarting DataHub
 | |
| 2. Upgrade your version of the `datahub` CLI to push the new browse path format (version `v0.8.45` or later)
 | |
| 
 | |
| Each step will be discussed in detail below. 
 | |
| 
 | |
| ### 1. Migrating default browse paths to the new format
 | |
| 
 | |
| To migrate those Browse Paths that are generated by DataHub by default (when no path is provided), simply restart the `datahub-gms` container / pod with a single
 | |
| additional environment variable:
 | |
| 
 | |
| ```
 | |
| UPGRADE_DEFAULT_BROWSE_PATHS_ENABLED=true
 | |
| ```
 | |
| 
 | |
| Restarting with this variable set will cause GMS to perform a boot-time migration of all your existing Browse Paths
 | |
| to the new format, removing the unnecessary name component at the very end.
 | |
| 
 | |
| If the migration is successful, you'll see the following in your GMS logs: 
 | |
| 
 | |
| ```
 | |
| 18:58:17.414 [main] INFO c.l.m.b.s.UpgradeDefaultBrowsePathsStep:60 - Successfully upgraded all browse paths!
 | |
| ```
 | |
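
If you would rather check for this message programmatically than read the logs by eye, a minimal sketch follows. It assumes you have already captured the GMS log output as lines of text by whatever means your deployment provides; the function name `migration_succeeded` is hypothetical, and only the success message itself is taken from the log line above.

```python
from typing import Iterable

# Message emitted by the upgrade step, copied from the sample log line above.
SUCCESS_MARKER = "Successfully upgraded all browse paths!"


def migration_succeeded(log_lines: Iterable[str]) -> bool:
    """Return True if any captured log line contains the success message."""
    return any(SUCCESS_MARKER in line for line in log_lines)


sample_log = [
    "18:58:17.414 [main] INFO c.l.m.b.s.UpgradeDefaultBrowsePathsStep:60 - "
    "Successfully upgraded all browse paths!",
]
print(migration_succeeded(sample_log))
```

Matching on the message text rather than the timestamp or line number keeps the check tolerant of differences between deployments.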
| 
 | |
| After this one-time migration is complete, you should be able to navigate the Browse hierarchy exactly as you did previously. 
 | |
| 
 | |
| > Note that certain ingestion sources actively produce their own Browse Paths, which override the default path
 | |
| > computed by DataHub. 
 | |
| > 
 | |
| > In these cases, getting the updated Browse Path will require re-running your ingestion process with the updated
 | |
| > version of the connector. This is discussed in more detail in the next section. 
 | |
| 
 | |
| ### 2. Upgrading the `datahub` CLI to push new browse paths 
 | |
| 
 | |
| If you are actively ingesting metadata from one or more of the following sources:
 | |
| 
 | |
| 1. Sagemaker
 | |
| 2. Looker / LookML
 | |
| 3. Feast
 | |
| 4. Kafka
 | |
| 5. Mode
 | |
| 6. PowerBi
 | |
| 7. Pulsar
 | |
| 8. Tableau
 | |
| 9. Business Glossary
 | |
| 
 | |
| You will need to upgrade the DataHub CLI to >= `v0.8.45` and re-run metadata ingestion. This will generate the new browse path format
 | |
| and overwrite the existing paths for entities that were extracted from these sources. 
 | |
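
Before re-running ingestion, it can help to confirm that the CLI version in use meets the `v0.8.45` minimum. The sketch below compares a version string against that minimum; the helper names are hypothetical, and the parser is an assumption that only looks at the leading `major.minor.patch` numbers, so pre-release suffixes are ignored.

```python
import re
from typing import Tuple

# Minimum CLI version that emits the new browse path format.
MIN_VERSION = (0, 8, 45)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like 'v0.8.45' or '0.8.45.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_new_browse_paths(version: str) -> bool:
    """Return True if the given CLI version is at least v0.8.45."""
    return parse_version(version) >= MIN_VERSION


print(supports_new_browse_paths("0.8.44"))
print(supports_new_browse_paths("v0.8.45"))
```

Pass in whatever version string your installation reports; anything below the minimum means the connector will still emit the legacy format.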
| 
 | |
| ### If you are producing custom Browse Paths
 | |
| 
 | |
| If you've decided to produce your own custom Browse Paths to organize your assets (e.g. via the Python Emitter SDK), you'll want to update the code that produces those paths
 | |
| so that it omits the final path component. For example, if you were previously emitting a browse path like this:
 | |
| 
 | |
| ```
 | |
| "my/custom/browse/path/suffix"
 | |
| ```
 | |
| 
 | |
| You can simply remove the final "suffix" piece:
 | |
| 
 | |
| ```
 | |
| "my/custom/browse/path"
 | |
| ```
 | |
| 
 | |
| Your users will be able to find the entity by traversing through these folders in the UI:
 | |
| 
 | |
| `my` > `custom` > `browse` > `path` > `Click Entity`.
 | |
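
If your custom paths are generated in code, the change can be as small as dropping the last component before emitting. The sketch below shows one way to do that; `strip_name_component` and `migrate_paths` are hypothetical helpers, not part of the DataHub SDK, and they assume every input path still carries the legacy name suffix. Applying them to a path that is already in the new format would remove a real folder.

```python
from typing import Iterable, List


def strip_name_component(path: str) -> str:
    """Drop the final component of a legacy browse path."""
    head, _, _ = path.rstrip("/").rpartition("/")
    return head


def migrate_paths(paths: Iterable[str]) -> List[str]:
    """Convert legacy paths to the new format, dropping empties and duplicates."""
    migrated: List[str] = []
    for path in paths:
        new_path = strip_name_component(path)
        # Entities that shared a folder collapse to the same new path.
        if new_path and new_path not in migrated:
            migrated.append(new_path)
    return migrated


print(strip_name_component("my/custom/browse/path/suffix"))
```

The resulting list can be passed as the `paths` argument of `BrowsePathsClass` in the same way as the earlier emitter example.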
| 
 | |
| 
 | |
| > Note that if you are using the Browse Path Transformer you *will* be impacted in the same way. It is recommended that you revisit the
 | |
| > paths that you are producing, and update them to the new format. 
 | |
| 
 | |
| ## Support
 | |
| 
 | |
| The Acryl team will be on standby to assist you in your migration. Please
 | |
| join the [#release-0_8_0](https://datahubspace.slack.com/archives/C0244FHMHJQ) channel and reach out to us if you run into
 | |
| trouble with the upgrade or have feedback on the process. We will work closely with you to make sure you can continue to operate
 | |
| DataHub smoothly.
 | 
