mirror of
https://github.com/datahub-project/datahub.git
synced 2025-06-27 05:03:31 +00:00
Updating DataHub
This file documents any backwards-incompatible changes in DataHub and assists people when migrating to a new version.
Next
Breaking Changes
- #6243 The apache-ranger authorizer is no longer a core part of DataHub GMS; it has been moved into a plugin. Refer to the updated documentation, Configuring Authorization with Apache Ranger, for configuring the `apache-ranger-plugin` in DataHub GMS.
- #6243 The apache-ranger authorizer as a plugin is not supported in DataHub Kubernetes deployments.
- #6243 Authentication and Authorization plugin configurations have been removed from application.yml. Refer to the documentation Migration Of Plugins From application.yml to migrate any existing custom plugins.
- The `datahub check graph-consistency` command has been removed. It was a beta API that we had considered, but we decided there are better solutions for this problem.
Potential Downtime
Deprecations
Other notable Changes
- #6611 Snowflake `schema_pattern` now accepts a pattern for the fully qualified schema name, in the format `<catalog_name>.<schema_name>`, when the config `match_fully_qualified_names: True` is set. The current default, `match_fully_qualified_names: False`, exists only to maintain backward compatibility. The config option `match_fully_qualified_names` will be deprecated in the future, and the default behavior will then assume `match_fully_qualified_names: True`.
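The difference between the two matching modes can be sketched in Python as follows. This is an illustration only, not DataHub's implementation: the helper `schema_allowed` is hypothetical, and matching with `re.match` against an allow list is an assumption about how the patterns are applied.

```python
import re
from typing import List


def schema_allowed(
    catalog: str,
    schema: str,
    allow_patterns: List[str],
    match_fully_qualified_names: bool,
) -> bool:
    """Hypothetical helper: does the schema match any allow pattern?"""
    # With the flag on, patterns are tested against "<catalog_name>.<schema_name>";
    # with it off, they are tested against the bare schema name only.
    candidate = f"{catalog}.{schema}" if match_fully_qualified_names else schema
    return any(re.match(pattern, candidate) for pattern in allow_patterns)


# A fully qualified pattern only matches when the flag is enabled.
print(schema_allowed("ANALYTICS", "PUBLIC", [r"ANALYTICS\.PUBLIC"], True))   # True
print(schema_allowed("ANALYTICS", "PUBLIC", [r"ANALYTICS\.PUBLIC"], False))  # False
```

If this assumption holds, existing patterns written against bare schema names may need a catalog prefix before switching the flag to `True`.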
0.9.3
Breaking Changes
- The beta `datahub check graph-consistency` command has been removed.
Potential Downtime
Deprecations
- PowerBI source: `workspace_id_pattern` is introduced in place of `workspace_id`. `workspace_id` is now deprecated and set for removal in a future version.
Other notable Changes
0.9.2
- LookML source will only emit views that are reachable from explores while scanning your git repo. Previous behavior can be achieved by setting `emit_reachable_views_only` to False.
- LookML source will always lowercase urns for lineage edges from views to upstream tables. There is no fallback to the previous behavior, because lower-casing was applied inconsistently before.
- The dbt config `node_type_pattern`, which was previously deprecated, has been removed. Use `entities_enabled` instead to control whether to emit metadata for sources, models, seeds, tests, etc.
- The dbt source will always lowercase urns for lineage edges to the underlying data platform.
- The DataHub Airflow lineage backend and plugin no longer support Airflow 1.x. You can still run DataHub ingestion in Airflow 1.x using the PythonVirtualenvOperator.
Breaking Changes
Potential Downtime
Deprecations
Other notable Changes
0.9.1
Breaking Changes
- We have promoted `bigquery-beta` to `bigquery`. If you are using `bigquery-beta`, change your recipes to use the type `bigquery`.
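As a rough sketch, a recipe that has already been loaded into a Python dictionary could be migrated like this. The function name `migrate_source_type` and the `project_id` value are hypothetical placeholders; the sketch assumes the source type lives under `source.type` in the recipe.

```python
import copy


def migrate_source_type(recipe: dict) -> dict:
    """Return a copy of the recipe with `bigquery-beta` renamed to `bigquery`."""
    migrated = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    source = migrated.get("source", {})
    if source.get("type") == "bigquery-beta":
        source["type"] = "bigquery"
    return migrated


old_recipe = {
    "source": {
        "type": "bigquery-beta",
        "config": {"project_id": "my-project"},  # placeholder value
    }
}
print(migrate_source_type(old_recipe)["source"]["type"])  # bigquery
```

Recipes that use any other source type pass through unchanged.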
Potential Downtime
Deprecations
Other notable Changes
0.9.0
Breaking Changes
- Java version 11 or greater is required.
- For any of the GraphQL search queries, the input no longer supports value but instead now accepts a list of values. These values represent an OR relationship where the field value must match any of the values.
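A client that builds search filters as dictionaries might adapt them along these lines. This is a sketch under assumptions: the key names `field`, `value`, and `values` are taken to be the filter input fields, and both helper functions are hypothetical.

```python
from typing import List


def upgrade_filter(criterion: dict) -> dict:
    """Convert a single-value filter into the list-based form."""
    upgraded = {k: v for k, v in criterion.items() if k != "value"}
    # Wrap the old single value in a list unless a list was already supplied.
    if "value" in criterion and "values" not in criterion:
        upgraded["values"] = [criterion["value"]]
    return upgraded


def matches(field_value: str, values: List[str]) -> bool:
    """OR semantics: the field must equal at least one of the listed values."""
    return field_value in values


new_filter = upgrade_filter({"field": "platform", "value": "snowflake"})
print(new_filter)  # {'field': 'platform', 'values': ['snowflake']}
print(matches("bigquery", ["snowflake", "bigquery"]))  # True
```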
Potential Downtime
Deprecations
Other notable Changes
v0.8.45
Breaking Changes
- The `getNativeUserInviteToken` and `createNativeUserInviteToken` GraphQL endpoints have been renamed to `getInviteToken` and `createInviteToken` respectively. Additionally, both now accept an optional `roleUrn` parameter. Both endpoints also now require the `MANAGE_POLICIES` privilege to execute, rather than the `MANAGE_USER_CREDENTIALS` privilege.
- One of the default policies shipped with DataHub (`urn:li:dataHubPolicy:7`, or `All Users - All Platform Privileges`) has been edited to no longer include `MANAGE_POLICIES`. Its name has consequently been changed to `All Users - All Platform Privileges (EXCEPT MANAGE POLICIES)`. This change was made to prevent all users from effectively acting as superusers by default.
Potential Downtime
Deprecations
Other notable Changes
v0.8.44
Breaking Changes
- Browse Paths have been upgraded to a new format to align more closely with the intention of the feature. Learn more about the changes, including steps on upgrading, here: https://datahubproject.io/docs/advanced/browse-paths-upgrade
- The dbt ingestion source's `disable_dbt_node_creation` and `load_schema` options have been removed. They were no longer necessary due to the recently added sibling entities functionality.
- The `snowflake` source now uses the newer, faster implementation (previously `snowflake-beta`). The config properties `provision_role` and `check_role_grants` are not supported. The older `snowflake` and `snowflake-usage` sources are available as `snowflake-legacy` and `snowflake-usage-legacy` respectively.
Potential Downtime
- [Helm] If you're using Helm, please ensure that your version of the `datahub-actions` container is bumped to `v0.0.7` or `head`. This version contains changes to support running ingestion in debug mode. Previous versions are not compatible with this release. Upgrading to helm chart version `0.2.103` will ensure that you have the compatible versions by default.
Deprecations
Other notable Changes
v0.8.42
Breaking Changes
- Python 3.6 is no longer supported for metadata ingestion.
- #5451 The `GMS_HOST` and `GMS_PORT` environment variables, deprecated in `v0.8.39`, have been removed. Use `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT` instead.
- #5478 The DataHub CLI `delete` command, when used with the `--hard` option, will delete soft-deleted entities that match the other filters given.
- #5471 Looker now populates `userEmail` in dashboard user usage stats. This version of the Looker connector will not work with older versions of datahub-gms if you have the `extract_usage_history` Looker config enabled.
- #5529 The `ANALYTICS_ENABLED` environment variable in datahub-gms is now deprecated. Use `DATAHUB_ANALYTICS_ENABLED` instead.
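Before upgrading, it may help to check a deployment's environment for the old variable names. The sketch below is not part of DataHub; `find_stale_variables` is a hypothetical helper that reports old names which are set without their replacements.

```python
import os
from typing import Dict, Mapping

# Old name -> replacement, as listed in the notes above.
RENAMED_VARIABLES = {
    "GMS_HOST": "DATAHUB_GMS_HOST",
    "GMS_PORT": "DATAHUB_GMS_PORT",
    "ANALYTICS_ENABLED": "DATAHUB_ANALYTICS_ENABLED",
}


def find_stale_variables(env: Mapping[str, str]) -> Dict[str, str]:
    """Return old names that are set while their replacement is missing."""
    return {
        old: new
        for old, new in RENAMED_VARIABLES.items()
        if old in env and new not in env
    }


# Check the current process environment and print what needs renaming.
for old, new in find_stale_variables(os.environ).items():
    print(f"{old} is set; set {new} instead")
```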
Potential Downtime
Deprecations
Other notable Changes
v0.8.41
Breaking Changes
- The `should_overwrite` flag in `csv-enricher` has been replaced with `write_semantics` to match the format used for other sources. See the documentation for more details.
- To close an authorization hole in creating tags, a Platform Privilege called `Create Tags` has been added for creating tags. It is assigned to the `datahub` root user, along with the default All Users policy. Notice: You may need to add this privilege (or `Manage Tags`) to existing users that need the ability to create tags on the platform.
- #5329 The profiling config parameters below are now supported in `BigQuery`:
  - `profiling.profile_if_updated_since_days` (default=1)
  - `profiling.profile_table_size_limit` (default=1GB)
  - `profiling.profile_table_row_limit` (default=50000)

  Set the above parameters to `null` if you want the older behaviour.
Potential Downtime
Deprecations
Other notable Changes
v0.8.40
Breaking Changes
- #5240 `lineage_client_project_id` in the `bigquery` source has been removed. Use `storage_project_id` instead.
Potential Downtime
Deprecations
Other notable Changes
v0.8.39
Breaking Changes
- Refactored the `health` field of the `Dataset` GraphQL Type to be of type list of HealthStatus (was type HealthStatus). See this PR for more details.
Potential Downtime
Deprecations
- #4875 LookML view file contents will no longer be populated in custom_properties. Instead, view definitions will always be available in the View Definitions tab.
- #5208 The `GMS_HOST` and `GMS_PORT` environment variables being set in various containers are deprecated in favour of `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT`.
- The `KAFKA_TOPIC_NAME` environment variable in datahub-mae-consumer and datahub-gms is now deprecated. Use `METADATA_AUDIT_EVENT_NAME` instead.
- The `KAFKA_MCE_TOPIC_NAME` environment variable in datahub-mce-consumer and datahub-gms is now deprecated. Use `METADATA_CHANGE_EVENT_NAME` instead.
- The `KAFKA_FMCE_TOPIC_NAME` environment variable in datahub-mce-consumer and datahub-gms is now deprecated. Use `FAILED_METADATA_CHANGE_EVENT_NAME` instead.
Other notable Changes
- #5132 Tables in the `snowflake` source are profiled only if they have been updated within the configured number of days (default: `1`). Update the config `profiling.profile_if_updated_since_days` as per your profiling schedule, or set it to `None` if you want the older behaviour.
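The decision this option controls can be sketched as below. This is an illustration, not the source's actual code: `should_profile` is a hypothetical helper, and treating the cutoff as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_profile(
    last_updated: datetime,
    now: datetime,
    profile_if_updated_since_days: Optional[float],
) -> bool:
    """Profile a table only if it changed within the configured window."""
    if profile_if_updated_since_days is None:
        # Older behaviour: profile every table regardless of update time.
        return True
    cutoff = now - timedelta(days=profile_if_updated_since_days)
    return last_updated >= cutoff


now = datetime(2022, 7, 1, tzinfo=timezone.utc)
print(should_profile(now - timedelta(hours=12), now, 1))   # True
print(should_profile(now - timedelta(days=3), now, 1))     # False
print(should_profile(now - timedelta(days=3), now, None))  # True
```

With a weekly profiling schedule, for example, a window of `7` days would keep tables updated since the previous run in scope.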
v0.8.38
Breaking Changes
Potential Downtime
Deprecations
Other notable Changes
- Create & Revoke Access Tokens via the UI
- Create and Manage new users via the UI
- Improvements to Business Glossary UI
- FIX - Do not require reindexing to migrate to using the UI business glossary
v0.8.36
Breaking Changes
- In this release we introduce a brand new Business Glossary experience. With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to restore your indices in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!
Potential Downtime
Deprecations
Other notable Changes
- #4961 Dropped profiling is not reported by default, as that caused a lot of spurious logging in some cases. Set `profiling.report_dropped_profiles` to `True` if you want the older behaviour.
v0.8.35
Breaking Changes
Potential Downtime
Deprecations
- #4875 LookML view file contents will no longer be populated in custom_properties. Instead, view definitions will always be available in the View Definitions tab.
Other notable Changes
v0.8.34
Breaking Changes
- #4644 Removed the `database` option from the `snowflake` source, which had been deprecated since `v0.8.5`.
- #4595 Renamed the confusing config `report_upstream_lineage` to `upstream_lineage_in_report` in the `snowflake` connector, which was added in `0.8.32`.
Potential Downtime
Deprecations
- #4644 The `host_port` option of the `snowflake` and `snowflake-usage` sources is deprecated, as the name was confusing. Use the `account_id` option instead.
Other notable Changes
- #4760 The `check_role_grants` option was added in `snowflake` to disable checking roles in `snowflake`, as some people were reporting long run times when checking roles.