
# Updating DataHub

<!--

## <version number>

### Breaking Changes

### Potential Downtime

### Deprecations

### Other Notable Changes

-->

This file documents any backwards-incompatible changes in DataHub and assists people when migrating to a new version.

## Next

### Breaking Changes

- #9934 and #10075 - Stateful ingestion is now enabled by default if a `pipeline_name` is set and either a datahub-rest sink or `datahub_api` is specified. It will still be disabled by default when any other sink type is used or if there is no pipeline name set.
- #10002 - The `DataHubGraph` client no longer makes a request to the backend during initialization. If you want to preserve the old behavior, call `graph.test_connection()` after constructing the client.
- #10026 - The dbt `use_compiled_code` option has been removed, because we now support capturing both source and compiled dbt SQL. This can be configured using `include_compiled_code`, which will be enabled by default in 0.13.1.
- #10055 - Assertion entities generated by dbt are now associated with the dbt dataset entity, and not the entity in the data warehouse.
- #10090 - For Redshift ingestion, `use_lineage_v2` is now enabled by default.
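
The new default for stateful ingestion (#9934 and #10075) can be read as a small decision rule. The sketch below only restates that rule in plain Python; the function name and its arguments are hypothetical and are not part of the DataHub codebase.

```python
from typing import Optional


def stateful_ingestion_default(
    pipeline_name: Optional[str],
    sink_type: str,
    has_datahub_api: bool = False,
) -> bool:
    """Return whether stateful ingestion would be on by default.

    Hypothetical helper: it restates the rule from the release note and
    does not call or mirror any DataHub code.
    """
    if not pipeline_name:
        # Without a pipeline name, stateful ingestion stays disabled by default.
        return False
    # Enabled when a datahub-rest sink or a `datahub_api` config is present.
    return sink_type == "datahub-rest" or has_datahub_api


if __name__ == "__main__":
    print(stateful_ingestion_default("nightly_snowflake", "datahub-rest"))
    print(stateful_ingestion_default("nightly_snowflake", "file"))
    print(stateful_ingestion_default(None, "datahub-rest"))
```

If a recipe falls into the "enabled" case and you do not want stateful ingestion, the setting would need to be turned off explicitly in the recipe.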

### Potential Downtime

### Deprecations

### Other Notable Changes

## 0.13.0

### Breaking Changes

- The MySQL version for quickstarts has been updated to 8.2, which may cause quickstart issues for existing instances.
- Neo4j has been updated to 5.x, which may require migration from 4.x.
- Build requires JDK17 (Runtime Java 11)
- Build requires Docker Compose > 2.20
- #9731 - The `acryl-datahub` CLI now requires Python 3.8+
- #9601 - The Unity Catalog (UC) ingestion source config `include_metastore` is now disabled by default. This change will affect the urns of all entities in the workspace.<br/>
  Entity Hierarchy with `include_metastore: true` (Old)

  ```
  - UC Metastore
    - Catalog
      - Schema
        - Table
  ```

  Entity Hierarchy with `include_metastore: false` (New)

  ```
  - Catalog
    - Schema
      - Table
  ```

  We recommend using `platform_instance` for differentiating across metastores.

  If stateful ingestion is enabled, running ingestion with the latest CLI version will perform all required cleanup. Otherwise, we recommend soft deleting all Databricks data via the DataHub CLI:
  `datahub delete --platform databricks --soft` and then reingesting with the latest CLI version.

- #9601 - The Unity Catalog (UC) ingestion source config `include_hive_metastore` is now enabled by default. This requires the config `warehouse_id` to be set. You can disable `include_hive_metastore` by setting it to `False` to avoid ingesting the legacy Hive metastore catalog in Databricks.
- #9904 - The default Redshift `table_lineage_mode` is now `MIXED`, instead of `STL_SCAN_BASED`. Improved lineage generation is also available by enabling `use_lineage_v2`. This v2 implementation will become the default in a future release.

### Potential Downtime

### Deprecations

- Spark 2.x (including previous JDK8 build requirements)

### Other Notable Changes

## 0.12.1

### Breaking Changes

- #9244: The `redshift-legacy` and `redshift-legacy-usage` sources, which have been deprecated for >6 months, have been removed. The new `redshift` source is a superset of the functionality provided by those legacy sources.
- The `database_alias` config is no longer supported in the following SQL sources: Redshift, MySQL, Oracle, Postgres, Trino, and Presto-on-hive. The config is automatically ignored if it is present in your recipe. It has been deprecated since v0.9.6.
- #9257: The Python SDK urn types are now autogenerated. The new classes are largely backwards compatible with the previous, manually written classes, but many older methods are now deprecated in favor of a more uniform interface. The only breaking change is that the signature of the direct constructor, e.g. `TagUrn("tag", ["tag_name"])`, is no longer supported, and the simpler `TagUrn("tag_name")` should be used instead.
  The canonical place to import the urn classes from is `datahub.metadata.urns.*`. Other import paths, like `datahub.utilities.urns.corpuser_urn.CorpuserUrn` are retained for backwards compatibility, but are considered deprecated.
- #9286: The `DataHubRestEmitter.emit` method no longer returns anything. It previously returned a tuple of timestamps.
- #8951: A Great Expectations-based profiler has been added for the Unity Catalog source.
  To use the old profiler, set `method: analyze` under the `profiling` section in your recipe.
  To use the new profiler, set `method: ge`. Profiling is disabled by default, so to enable it,
  one of these methods must be specified.

### Potential Downtime

### Deprecations

### Other Notable Changes

## 0.12.0

### Breaking Changes

- #8687 (datahub-helm #365 #353) - If Helm is used for installation and Neo4j is enabled, update the prerequisites Helm chart to version >=0.1.2 and adjust your value overrides in the `neo4j:` section according to the new structure.
- #9044 - GraphQL APIs for adding ownership now expect either an `ownershipTypeUrn` referencing a custom ownership type or a (deprecated) `type`. Adding an ownership without a concrete type was previously allowed; this is no longer the case. For simplicity, you can use the `type` parameter, which will be translated to a custom ownership type internally if one exists for the type being added.
- #9010 - In the Redshift source config, `incremental_lineage` is now disabled by default.
- #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
- #8942 - Removed `urn:li:corpuser:datahub` owner for the `Measure`, `Dimension` and `Temporal` tags emitted
  by Looker and LookML source connectors.
- #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
- #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
- #8943 - The Unity Catalog ingestion source has a new option `include_metastore`, which will cause all urns to be changed when disabled.
  This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future.
  If stateful ingestion is enabled, simply setting `include_metastore: false` will perform all required cleanup.
  Otherwise, we recommend soft deleting all databricks data via the DataHub CLI:
  `datahub delete --platform databricks --soft` and then reingesting with `include_metastore: false`.
- #8846 - Changed enum values in resource filters used by policies. `RESOURCE_TYPE` became `TYPE` and `RESOURCE_URN` became `URN`.
  Any existing policies using these filters (i.e. defined for particular `urns` or `types` such as `dataset`) need to be upgraded
  manually, for example by retrieving their respective `dataHubPolicyInfo` aspect and changing the part that uses the filter, i.e.

```yaml
   "resources": {
     "filter": {
       "criteria": [
         {
           "field": "RESOURCE_TYPE",
           "condition": "EQUALS",
           "values": [
             "dataset"
           ]
         }
       ]
     }
```

into

```yaml
   "resources": {
     "filter": {
       "criteria": [
         {
           "field": "TYPE",
           "condition": "EQUALS",
           "values": [
             "dataset"
           ]
         }
       ]
     }
```

This can be done, for example, with the `datahub put` command. Policies can also be removed and re-created via the UI.
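
The rename for #8846 can also be applied programmatically to an exported aspect. The following is a minimal sketch that assumes the `dataHubPolicyInfo` aspect has already been retrieved and parsed into a Python dict with the shape shown above; the function name is hypothetical, and writing the result back to DataHub is left out.

```python
import copy
from typing import Any, Dict

# Old enum value -> new enum value, as listed for #8846.
RENAMED_FILTER_FIELDS = {"RESOURCE_TYPE": "TYPE", "RESOURCE_URN": "URN"}


def upgrade_policy_filter(policy_info: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the aspect with renamed filter fields.

    Hypothetical helper: it only rewrites the `field` values under
    resources.filter.criteria and leaves everything else untouched.
    """
    upgraded = copy.deepcopy(policy_info)
    resources = upgraded.get("resources") or {}
    resource_filter = resources.get("filter") or {}
    for criterion in resource_filter.get("criteria") or []:
        field = criterion.get("field")
        if field in RENAMED_FILTER_FIELDS:
            criterion["field"] = RENAMED_FILTER_FIELDS[field]
    return upgraded


if __name__ == "__main__":
    old_aspect = {
        "resources": {
            "filter": {
                "criteria": [
                    {
                        "field": "RESOURCE_TYPE",
                        "condition": "EQUALS",
                        "values": ["dataset"],
                    }
                ]
            }
        }
    }
    print(upgrade_policy_filter(old_aspect))
```

The helper works on a deep copy so the original aspect stays available for comparison before anything is written back.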

- #9077 - The BigQuery ingestion source by default sets `match_fully_qualified_names: true`.
  This means that any `dataset_pattern` or `schema_pattern` specified will be matched on the fully
  qualified dataset name, i.e. `<project_name>.<dataset_name>`. We attempt to support the old
  pattern format by prepending `.*\\.` to dataset patterns lacking a period, so in most cases this
  should not cause any issues. However, if you have a complex dataset pattern, we recommend you
  manually convert it to the fully qualified format to avoid any potential issues.
- #9110 - The Unity Catalog source will now generate urns based on `env` properly. If you have
  been setting `env` in your recipe to something besides `PROD`, we will now generate urns
  with that new env variable, invalidating your existing urns.
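
To make the BigQuery pattern conversion in #9077 concrete, here is a sketch of the rule as literally described: patterns lacking a period get a prefix that matches any project name. The function name is hypothetical, and the prefix is written as the regex `.*\.`, on the assumption that the `.*\\.` shown in the note is the escaped form of the same expression. This is not the connector's actual code.

```python
import re


def to_fully_qualified_pattern(pattern: str) -> str:
    """Prefix a dataset-only pattern so it can match `<project>.<dataset>`.

    Hypothetical helper following the literal wording of the note:
    only patterns that contain no period at all are rewritten.
    """
    if "." not in pattern:
        return r".*\." + pattern
    return pattern


if __name__ == "__main__":
    converted = to_fully_qualified_pattern("analytics")
    print(converted)
    print(bool(re.match(converted, "my-project.analytics")))
    # This pattern already contains a period (from `.*`), so it is left as-is.
    print(to_fully_qualified_pattern("sales_.*"))
```

Under this literal reading, a pattern such as `sales_.*` is left unchanged because it already contains a period, which is one reason to convert complex patterns to the fully qualified format by hand.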

### Potential Downtime

### Deprecations

### Other Notable Changes

- Session token configuration has changed; all previously created session tokens will be invalid and users will be prompted to log in. Expiration time has also been shortened, which may result in more login prompts with the default settings.
  There should be no other interruption due to this change.

## 0.11.0

### Breaking Changes

### Potential Downtime

- #8611 Search improvements require reindexing indices. A `system-update` job will run which will set indices to read-only and create a backup/clone of each index. During the reindexing, new components will be prevented from starting up until the reindex completes. The logs of this job will indicate a % complete per index. Depending on index sizes and infrastructure, this process can take from 5 minutes to hours; as a rough estimate, allow 1 hour for every 2.3 million entities.

### Deprecations

- #8525: In the LDAP ingestor, `manager_pagination_enabled` has been changed to the more general `pagination_enabled`.
- MAE Events are no longer produced. MAE events have been deprecated for over a year.

### Other Notable Changes

- In this release we now enable you to create and delete pinned announcements on your DataHub homepage! If you have the “Manage Home Page Posts” platform privilege you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.
- The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.
- In addition to the ranking changes mentioned above, this release includes changes to the highlighting of search entities to understand why they match your query. You can also sort your results alphabetically or by last updated times, in addition to relevance. In this release, we suggest a correction if your query has a typo in it.
- #8300: The Clickhouse source now inherits from TwoTierSQLAlchemy. Previously the hierarchy was platform_instance -> container -> co
  container db (None) -> container schema; it is now platform_instance -> container database.
- #8300: Added the `uri_opts` argument, which allows passing any options to the Clickhouse client.
- #8659: BigQuery ingestion no longer creates DataPlatformInstance aspects by default.
  This will only affect users that were depending on this aspect for custom functionality,
  and can be enabled via the `include_data_platform_instance` config option.
- OpenAPI entity and aspect endpoints expanded to improve developer experience when using this API with additional aspects to be added in the near future.
- The CLI now supports recursive deletes.
- Batching of default aspects on initial ingestion (SQL)
- Improvements to multi-threading. Ingestion recipes, if previously reduced to 1 thread, can be restored to the 15 thread default.
- Gradle 7 upgrade moderately improves build speed
- DataHub Ingestion slim images reduced in size by 2GB+
- Glue Schema Registry fixed

## 0.10.5

### Breaking Changes

- #8201: Python SDK: In the DataFlow class, the `cluster` argument is deprecated in favor of `env`.
- #8263: Okta source config option `okta_profile_to_username_attr` default changed from `login` to `email`.
  This determines which Okta profile attribute is used for the corresponding DataHub user
  and thus may change what DataHub users are generated by the Okta source. In a follow-up, the `okta_profile_to_username_regex` default has been set to `.*`, which, taken together with the previous change, brings the defaults in line with OIDC.
- #8331: For all sql-based sources that support profiling, you can no longer specify
  `profile_table_level_only` together with `include_field_xyz` config options to ingest
  certain column-level metrics. Instead, set `profile_table_level_only` to `false` and
  individually enable / disable desired field metrics.
- #8451: The `bigquery-beta` and `snowflake-beta` source aliases have been dropped. Use `bigquery` and `snowflake` as the source type instead.
- #8472: Ingestion runs created with Pipeline.create will show up in the DataHub ingestion tab as CLI-based runs. To revert to the previous behavior of not showing these runs in DataHub, pass `no_default_report=True`.
- #8513: The `snowflake` connector will use the user's `email` attribute as-is in the urn. To revert to the previous behavior, disable `email_as_user_identifier` in the recipe.

### Potential Downtime

- The BrowsePathsV2 upgrade will now be handled by the `system-update` job in non-blocking mode. This process generates data needed for the new search
  and browse feature. This process must complete before the new search and browse UI is enabled; while the upgrade is running, entities will be missing from the UI.
  If you are not using the new search and browse UI, there will be no impact and the update will complete in the background.

### Deprecations

- #8198: In the Python SDK, the `PlatformKey` class has been renamed to `ContainerKey`.

### Other Notable Changes

0.10.5 introduces the new Unified Search & Browse experience, which is disabled by default. You can control whether or not you want to see just the new search filtering experience, the new search and browse experience together, or keep the existing search and browse experiences by toggling the two environment variable feature flags `SHOW_SEARCH_FILTERS_V2` and `SHOW_BROWSE_V2` in your GMS container.

**Upgrade Considerations:**

- With the release of Browse V2, we have created a job to run in GMS that will backfill your existing data with new `browsePathsV2` aspects. This job loops over entity types that need a `browsePathsV2` aspect (Dataset, Dashboard, Chart, DataJob, DataFlow, MLModel, MLModelGroup, MLFeatureTable, and MLFeature) and generates one for each entity. For entities that may have Container parents (Datasets and Dashboards), we will try to fetch their parent containers in order to generate this new aspect. For deployments with large amounts of data, consider whether running this upgrade job makes sense, as it may be a heavy operation and take some time to complete. If you wish to skip this job, simply set the `BACKFILL_BROWSE_PATHS_V2` environment variable flag to `false` in your GMS container. Without this backfill job, though, you will need to rely on the newest version of the ingestion CLI to create these `browsePathsV2` aspects when running ingestion; otherwise your browse sidebar will be out of sync.
- Since the new browse experience replaces the old, consider whether having the `SHOW_BROWSE_V2` environment variable feature flag on is the right decision for your organization. If you’re creating custom browse paths with the `browsePaths` aspect, you can continue to do the same with the new experience, however you will have to generate `browsePathsV2` aspects instead which are documented [here](https://datahubproject.io/docs/browsev2/browse-paths-v2/).

## 0.10.4

### Breaking Changes

### Potential Downtime

### Deprecations

- #8045: With the introduction of custom ownership types, the `Owner` aspect has been updated where the `type` field is deprecated in favor of a new field `typeUrn`. This latter field is an urn reference to the new OwnershipType entity. GraphQL endpoints have been updated to use the new field. For pre-existing ownership aspect records, DataHub now has logic to map the old field to the new field.

### Other Notable Changes

- #8191: Updates GMS's health check endpoint to account for its dependency on external components, which at this time means Elasticsearch. This means that DataHub operators can now use GMS health status more reliably.

## 0.10.3

### Breaking Changes

- #7900: The `catalog_pattern` and `schema_pattern` options of the Unity Catalog source now match against the fully qualified name of the catalog/schema instead of just the name. Unless you're using regex `^` in your patterns, this should not affect you.
- #7942: Renaming the `containerPath` aspect to `browsePathsV2`. This means any data with the aspect name `containerPath` will be invalid. We had not exposed this in the UI or used it anywhere, but it was a model we recently merged to open up other work. This should not affect many people if anyone at all unless you were manually creating `containerPath` data through ingestion on your instance.
- #8068: In the `datahub delete` CLI, if an `--entity-type` filter is not specified, we automatically delete across all entity types. The previous behavior was to use a default entity type of dataset.
- #8068: In the `datahub delete` CLI, the `--start-time` and `--end-time` parameters are not required for timeseries aspect hard deletes. To recover the previous behavior of deleting all data, use `--start-time min --end-time max`.
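
The note on #7900 says that only patterns anchored with `^` are likely to be affected by matching against fully qualified names. The sketch below illustrates that effect with Python's `re.search`; how the Unity Catalog source applies patterns internally is not stated here, so the matching function and the example names `main` and `sales` are assumptions for illustration only.

```python
import re


def matches(pattern: str, name: str) -> bool:
    """Return True if the regex is found anywhere in the name.

    Illustrative only: uses `re.search`, which is an assumption and may
    differ from how the ingestion source evaluates its patterns.
    """
    return re.search(pattern, name) is not None


if __name__ == "__main__":
    # An anchored pattern matches the bare schema name...
    print(matches(r"^sales", "sales"))
    # ...but not the fully qualified name, which starts with the catalog.
    print(matches(r"^sales", "main.sales"))
    # An unanchored pattern is found in both forms.
    print(matches(r"sales", "main.sales"))
    # An anchored pattern rewritten for the fully qualified form.
    print(matches(r"^main\.sales", "main.sales"))
```

Anchored patterns would therefore need to include the catalog portion of the name after upgrading.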

### Potential Downtime

### Deprecations

- The signature of `Source.get_workunits()` is changed from `Iterable[WorkUnit]` to the more restrictive `Iterable[MetadataWorkUnit]`.
- Legacy usage creation via the `UsageAggregation` aspect, `/usageStats?action=batchIngest` GMS endpoint, and `UsageStatsWorkUnit` metadata-ingestion class are all deprecated.

### Other Notable Changes

## 0.10.2

### Breaking Changes

- #7016 Adds an `add_database_name_to_urn` flag to the Oracle source, which ensures that Dataset urns have the DB name as a prefix to prevent collisions (e.g. {database}.{schema}.{table}). This is ONLY breaking if you set this flag to true; otherwise behavior remains the same.
- The Airflow plugin no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub-airflow-plugin[datahub-kafka]` for Kafka support.
- The Airflow lineage backend no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub[airflow,datahub-kafka]` for Kafka support.
- Java SDK PatchBuilders have been modified in a backwards incompatible way to align more with the Python SDK and support more use cases. Any application utilizing the Java SDK for patch building may be affected on upgrading this dependency.

### Deprecations

- The docker image and script for updating from Elasticsearch 6 to 7 are no longer being maintained and will be removed from the `/contrib` section of
  the repository. Please refer to older releases if needed.

## 0.10.0

### Breaking Changes

- #7103 This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the `kafka-setup` docker image have been updated to be in line with other DataHub components; for more info see our docs on [Configuring Kafka in DataHub](https://datahubproject.io/docs/how/kafka-config). They were previously suffixed with `_TOPIC`, whereas the correct suffix is now `_TOPIC_NAME`. This change should not affect any user who is using default Kafka names.
- #6906 The Redshift source has been reworked and now also includes usage capabilities. The old Redshift source was renamed to `redshift-legacy`. The `redshift-usage` source has also been renamed to `redshift-usage-legacy` and will be removed in the future.

### Potential Downtime

- #6894 Search improvements require reindexing indices. A `system-update` job will run which will set indices to read-only and create a backup/clone of each index. During the reindexing, new components will be prevented from starting up until the reindex completes. The logs of this job will indicate a % complete per index. Depending on index sizes and infrastructure, this process can take from 5 minutes to hours; as a rough estimate, allow 1 hour for every 2.3 million entities.

#### Helm Notes

Helm without `--atomic`: The default timeout for an upgrade command is 5 minutes. If the reindex takes longer (depending on data size) it will continue to run in the background even though helm will report a failure. Allow this job to finish and then re-run the helm upgrade command.

Helm with `--atomic`: In general, it is recommended to not use the `--atomic` setting for this particular upgrade since the system update job will be terminated before completion. If `--atomic` is preferred, then increase the timeout using the `--timeout` flag to account for the reindexing time (see note above for estimating this value).
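
For picking a `--timeout` value, the rough estimate above (1 hour per 2.3 million entities, with a floor of about 5 minutes) can be turned into a quick calculation. This is a back-of-the-envelope sketch: the function names and the `safety_factor` margin are assumptions, and actual reindex time depends on index sizes and infrastructure.

```python
import math

ENTITIES_PER_HOUR = 2_300_000  # rough estimate quoted in the note above
MINIMUM_MINUTES = 5  # lower end of the range quoted in the note above


def estimate_reindex_minutes(entity_count: int) -> int:
    """Estimate reindex duration in whole minutes (hypothetical helper)."""
    minutes = math.ceil(entity_count / ENTITIES_PER_HOUR * 60)
    return max(MINIMUM_MINUTES, minutes)


def helm_timeout_flag(entity_count: int, safety_factor: float = 2.0) -> str:
    """Build a `--timeout` value with headroom on top of the estimate.

    `safety_factor` is an arbitrary margin chosen for this sketch.
    """
    minutes = math.ceil(estimate_reindex_minutes(entity_count) * safety_factor)
    return f"--timeout {minutes}m"


if __name__ == "__main__":
    print(estimate_reindex_minutes(4_600_000))
    print(helm_timeout_flag(4_600_000))
```

The resulting flag can be appended to the `helm upgrade` command when `--atomic` is used.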

### Deprecations

## 0.9.6

### Breaking Changes

- #6742 The metadata file sink's output format no longer contains nested JSON strings for MCP aspects, but instead unpacks the stringified JSON into a real JSON object. The previous sink behavior can be recovered using the `legacy_nested_json_string` option. The file source is backwards compatible and supports both formats.
- #6901 The `env` and `database_alias` fields have been marked deprecated across all sources. We recommend using `platform_instance` where possible instead.

### Potential Downtime

### Deprecations

- #6851 - Sources bigquery-legacy and bigquery-usage-legacy have been removed

### Other Notable Changes

- If anyone faces issues with login please clear your cookies. Some security updates are part of this release. That may cause login issues until cookies are cleared.

## 0.9.4 / 0.9.5

### Breaking Changes

- #6243 The apache-ranger authorizer is no longer a core part of DataHub GMS and has been moved to a plugin. Please refer to the updated documentation [Configuring Authorization with Apache Ranger](./configuring-authorization-with-apache-ranger.md#configuring-your-datahub-deployment) for configuring `apache-ranger-plugin` in DataHub GMS.
- #6243 The apache-ranger authorizer as a plugin is not supported in DataHub Kubernetes deployments.
- #6243 Authentication and Authorization plugin configurations have been removed from [application.yml](../../metadata-service/configuration/src/main/resources/application.yml). Refer to the documentation [Migration Of Plugins From application.yml](../plugins.md#migration-of-plugins-from-applicationyml) for migrating any existing custom plugins.
- The `datahub check graph-consistency` command has been removed. It was a beta API that we had considered, but we decided there are better solutions for this.
- The `graphql_url` option of the `powerbi-report-server` source is deprecated, as the option is not used.
- #6789 BigQuery ingestion: If `enable_legacy_sharded_table_support` is set to False, sharded table names will be suffixed with \_yyyymmdd to make sure they don't clash with non-sharded tables. This means if stateful ingestion is enabled then old sharded tables will be recreated with a new id and attached tags/glossary terms/etc will need to be added again. _This behavior is not enabled by default yet, but will be enabled by default in a future release._

### Potential Downtime

### Deprecations

### Other Notable Changes

- #6611 - Snowflake `schema_pattern` now accepts patterns for the fully qualified schema name in the format `<catalog_name>.<schema_name>` when the config `match_fully_qualified_names: True` is set. The current default `match_fully_qualified_names: False` exists only to maintain backward compatibility. The config option `match_fully_qualified_names` will be deprecated in the future, and the default behavior will assume `match_fully_qualified_names: True`.
- #6636 - Sources `snowflake-legacy` and `snowflake-usage-legacy` have been removed.

## 0.9.3

### Breaking Changes

- The beta `datahub check graph-consistency` command has been removed.

### Potential Downtime

### Deprecations

- PowerBI source: `workspace_id_pattern` is introduced in place of `workspace_id`. `workspace_id` is now deprecated and set for removal in a future version.

### Other Notable Changes

## 0.9.2

- LookML source will only emit views that are reachable from explores while scanning your git repo. Previous behavior can be achieved by setting `emit_reachable_views_only` to False.
- LookML source will always lowercase urns for lineage edges from views to upstream tables. There is no fallback provided to previous behavior because it was inconsistent in application of lower-casing earlier.
- dbt config `node_type_pattern` which was previously deprecated has been removed. Use `entities_enabled` instead to control whether to emit metadata for sources, models, seeds, tests, etc.
- The dbt source will always lowercase urns for lineage edges to the underlying data platform.
- The DataHub Airflow lineage backend and plugin no longer support Airflow 1.x. You can still run DataHub ingestion in Airflow 1.x using the [PythonVirtualenvOperator](https://airflow.apache.org/docs/apache-airflow/1.10.15/_api/airflow/operators/python_operator/index.html?highlight=pythonvirtualenvoperator#airflow.operators.python_operator.PythonVirtualenvOperator).

### Breaking Changes

- #6570 The `snowflake` connector now populates created and last modified timestamps for Snowflake datasets and containers. This version of the Snowflake connector will not work with **datahub-gms** versions older than `v0.9.3`.

### Potential Downtime

### Deprecations

### Other Notable Changes

## 0.9.1

### Breaking Changes

- We have promoted `bigquery-beta` to `bigquery`. If you are using `bigquery-beta` then change your recipes to use the type `bigquery`.

### Potential Downtime

### Deprecations

### Other Notable Changes

## 0.9.0

### Breaking Changes

- Java version 11 or greater is required.
- For any of the GraphQL search queries, the filter input no longer supports a single `value` but instead accepts a list of `values`. These values represent an OR relationship, where the field value must match any of the values.
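
Client code that builds search filters may need a small adjustment for this change. The sketch below assumes a filter is represented as a dict with a `field` key and either a single `value` (old shape) or a list under `values` (new shape); the dict layout and the helper name are assumptions for illustration, not a definitive description of the GraphQL schema.

```python
from typing import Any, Dict


def upgrade_search_filter(old_filter: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a single-value filter into the list-of-values shape.

    Hypothetical helper: wraps `value` in a one-element `values` list and
    leaves filters that already use `values` unchanged.
    """
    new_filter = {key: val for key, val in old_filter.items() if key != "value"}
    if "value" in old_filter and "values" not in new_filter:
        new_filter["values"] = [old_filter["value"]]
    return new_filter


if __name__ == "__main__":
    print(upgrade_search_filter({"field": "platform", "value": "snowflake"}))
```

Because the values are OR-ed, several old single-value filters on the same field could be merged into one filter with multiple entries in `values`.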

### Potential Downtime

### Deprecations

### Other Notable Changes

## `v0.8.45`

### Breaking Changes

- The `getNativeUserInviteToken` and `createNativeUserInviteToken` GraphQL endpoints have been renamed to
  `getInviteToken` and `createInviteToken` respectively. Additionally, both now accept an optional `roleUrn` parameter.
  Both endpoints also now require the `MANAGE_POLICIES` privilege to execute, rather than `MANAGE_USER_CREDENTIALS`
  privilege.
- One of the default policies shipped with DataHub (`urn:li:dataHubPolicy:7`, or `All Users - All Platform Privileges`)
  has been edited to no longer include `MANAGE_POLICIES`. Its name has consequently been changed to
  `All Users - All Platform Privileges (EXCEPT MANAGE POLICIES)`. This change was made to prevent all users from
  effectively acting as superusers by default.

### Potential Downtime

### Deprecations

### Other Notable Changes

## `v0.8.44`

### Breaking Changes

- Browse Paths have been upgraded to a new format to align more closely with the intention of the feature.
  Learn more about the changes, including steps on upgrading, here: <https://datahubproject.io/docs/advanced/browse-paths-upgrade>
- The dbt ingestion source's `disable_dbt_node_creation` and `load_schema` options have been removed. They were no longer necessary due to the recently added sibling entities functionality.
- The `snowflake` source now uses a newer, faster implementation (previously `snowflake-beta`). The config properties `provision_role` and `check_role_grants` are not supported. The older `snowflake` and `snowflake-usage` sources are available as `snowflake-legacy` and `snowflake-usage-legacy` respectively.

### Potential Downtime

- [Helm] If you're using Helm, please ensure that your version of the `datahub-actions` container is bumped to `v0.0.7` or `head`.
  This version contains changes to support running ingestion in debug mode. Previous versions are not compatible with this release.
  Upgrading to helm chart version `0.2.103` will ensure that you have the compatible versions by default.

### Deprecations

### Other Notable Changes

## `v0.8.42`

### Breaking Changes

- Python 3.6 is no longer supported for metadata ingestion
- #5451 `GMS_HOST` and `GMS_PORT` environment variables deprecated in `v0.8.39` have been removed. Use `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT` instead.
- #5478 DataHub CLI `delete` command when used with `--hard` option will delete soft-deleted entities which match the other filters given.
- #5471 Looker now populates `userEmail` in dashboard user usage stats. This version of the Looker connector will not work with older versions of **datahub-gms** if you have the `extract_usage_history` Looker config enabled.
- #5529 - `ANALYTICS_ENABLED` environment variable in **datahub-gms** is now deprecated. Use `DATAHUB_ANALYTICS_ENABLED` instead.
- #5485 `--include-removed` option was removed from delete CLI

### Potential Downtime

### Deprecations

### Other Notable Changes

## `v0.8.41`

### Breaking Changes

- The `should_overwrite` flag in `csv-enricher` has been replaced with `write_semantics` to match the format used for other sources. See the [documentation](https://datahubproject.io/docs/generated/ingestion/sources/csv/) for more details
- To close an authorization hole in tag creation, a Platform Privilege called `Create Tags` has been added. It is assigned to the `datahub` root user, along
  with the default All Users policy. Note: You may need to add this privilege (or `Manage Tags`) to existing users that need the ability to create tags on the platform.
- #5329 The following profiling config parameters are now supported in `BigQuery`:

  - profiling.profile_if_updated_since_days (default=1)
  - profiling.profile_table_size_limit (default=1GB)
  - profiling.profile_table_row_limit (default=50000)

  Set the above parameters to `null` if you want the older behaviour.

### Potential Downtime

### Deprecations

### Other Notable Changes

## `v0.8.40`

### Breaking Changes

- #5240 `lineage_client_project_id` in `bigquery` source is removed. Use `storage_project_id` instead.

### Potential Downtime

### Deprecations

### Other Notable Changes

## `v0.8.39`

### Breaking Changes

- Refactored the `health` field of the `Dataset` GraphQL Type to be of type **list of HealthStatus** (was type **HealthStatus**). See [this PR](https://github.com/datahub-project/datahub/pull/5222/files) for more details.

### Potential Downtime

### Deprecations

- #4875 Lookml view file contents will no longer be populated in custom_properties, instead view definitions will be always available in the View Definitions tab.
- #5208 `GMS_HOST` and `GMS_PORT` environment variables being set in various containers are deprecated in favour of `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT`.
- `KAFKA_TOPIC_NAME` environment variable in **datahub-mae-consumer** and **datahub-gms** is now deprecated. Use `METADATA_AUDIT_EVENT_NAME` instead.
- `KAFKA_MCE_TOPIC_NAME` environment variable in **datahub-mce-consumer** and **datahub-gms** is now deprecated. Use `METADATA_CHANGE_EVENT_NAME` instead.
- `KAFKA_FMCE_TOPIC_NAME` environment variable in **datahub-mce-consumer** and **datahub-gms** is now deprecated. Use `FAILED_METADATA_CHANGE_EVENT_NAME` instead.
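
The renames listed above can be checked against a deployment's environment before upgrading. The following sketch assumes the environment is available as a plain mapping of names to values; the helper name is hypothetical. It does not account for which container each variable belongs to, so treat the output as a checklist rather than a ready-to-apply configuration.

```python
from typing import Dict, List, Mapping, Tuple

# Deprecated name -> replacement, as listed in the deprecations above.
RENAMED_ENV_VARS = {
    "GMS_HOST": "DATAHUB_GMS_HOST",
    "GMS_PORT": "DATAHUB_GMS_PORT",
    "KAFKA_TOPIC_NAME": "METADATA_AUDIT_EVENT_NAME",
    "KAFKA_MCE_TOPIC_NAME": "METADATA_CHANGE_EVENT_NAME",
    "KAFKA_FMCE_TOPIC_NAME": "FAILED_METADATA_CHANGE_EVENT_NAME",
}


def migrate_env(env: Mapping[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return a migrated copy of the environment plus deprecation notices."""
    migrated = dict(env)
    notices: List[str] = []
    for old_name, new_name in RENAMED_ENV_VARS.items():
        if old_name in migrated:
            value = migrated.pop(old_name)
            # If the new-style variable is already set, its value wins.
            migrated.setdefault(new_name, value)
            notices.append(f"{old_name} is deprecated; use {new_name}")
    return migrated, notices


if __name__ == "__main__":
    new_env, messages = migrate_env({"GMS_HOST": "gms", "GMS_PORT": "8080"})
    print(new_env)
    for message in messages:
        print(message)
```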

### Other Notable Changes

- #5132 Profile tables in `snowflake` source only if they have been updated since configured (default: `1`) number of day(s). Update the config `profiling.profile_if_updated_since_days` as per your profiling schedule or set it to `None` if you want older behaviour.

## `v0.8.38`

### Breaking Changes

### Potential Downtime

### Deprecations

### Other Notable Changes

- Create & Revoke Access Tokens via the UI
- Create and Manage new users via the UI
- Improvements to Business Glossary UI
- FIX - Do not require reindexing to migrate to using the UI business glossary

## `v0.8.36`

### Breaking Changes

- In this release we introduce a brand new Business Glossary experience. With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to [restore your indices](https://datahubproject.io/docs/how/restore-indices/) in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!

### Potential Downtime

### Deprecations

### Other Notable Changes

- #4961 Dropped profiles are no longer reported by default, because reporting them caused a lot of spurious logging in some cases. Set `profiling.report_dropped_profiles` to `True` to restore the older behaviour.

## `v0.8.35`

### Breaking Changes

### Potential Downtime

### Deprecations

- #4875 LookML view file contents will no longer be populated in `custom_properties`. Instead, view definitions will always be available in the View Definitions tab.

### Other Notable Changes

## `v0.8.34`

### Breaking Changes

- #4644 Removed the `database` option from the `snowflake` source, which had been deprecated since `v0.8.5`.
- #4595 Renamed the confusing config `report_upstream_lineage` to `upstream_lineage_in_report` in the `snowflake` connector. The option was added in `0.8.32`.
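
For recipes that still use the old key, the rename in #4595 amounts to moving one entry in the source config. A minimal sketch follows; `rename_config_key` is a hypothetical helper, not a DataHub API, and it assumes the recipe has already been loaded into a Python dictionary.

```python
import copy
from typing import Any, Dict


def rename_config_key(recipe: Dict[str, Any], old: str, new: str) -> Dict[str, Any]:
    """Return a copy of recipe with source.config[old] moved to source.config[new].

    Hypothetical helper for illustration; the input recipe is left unmodified.
    """
    updated = copy.deepcopy(recipe)
    config = updated.get("source", {}).get("config", {})
    # Only rename when the old key is present and the new key is not already set.
    if old in config and new not in config:
        config[new] = config.pop(old)
    return updated


old_recipe = {
    "source": {
        "type": "snowflake",
        "config": {"report_upstream_lineage": True},
    }
}
new_recipe = rename_config_key(
    old_recipe, "report_upstream_lineage", "upstream_lineage_in_report"
)
print(new_recipe["source"]["config"])
```

The helper copies the recipe rather than editing it in place, so the original can be kept for comparison until the upgraded recipe has been verified.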

### Potential Downtime

### Deprecations

- #4644 The `host_port` option of the `snowflake` and `snowflake-usage` sources is deprecated because the name was confusing. Use the `account_id` option instead.

### Other Notable Changes

- #4760 A `check_role_grants` option was added to the `snowflake` source so that role checks can be disabled, as some users reported long run times when checking roles.
-												doc: add page for handling deprecations, breaking changes etc. (#4590)


											
										
										
											2022-04-07 22:18:47 +05:30
+								# Updating DataHub
-												feat(ingest): add DataHubGraph.emit_all method (#10002)


											
										
										
											2024-03-11 16:36:18 -07:00
+								<!--
 								## <version number>
 								### Breaking Changes
 								### Potential Downtime
 								### Deprecations
 								### Other Notable Changes
 								-->
-												doc: add page for handling deprecations, breaking changes etc. (#4590)


											
										
										
											2022-04-07 22:18:47 +05:30
+								This file documents any backwards-incompatible changes in DataHub and assists people when migrating to a new version.
 								## Next
-												feat(ingest): dbt cloud integration (#6323)


											
										
										
											2022-11-21 14:14:33 -05:00
-												docs: Update updating-datahub.md (#9131)


											
										
										
											2023-10-30 20:50:42 +00:00
+								### Breaking Changes
-												fix(ingest): only auto-enable stateful ingestion if pipeline name is set (#10075)


											
										
										
											2024-03-18 13:59:01 -07:00
+								- #9934 and #10075 - Stateful ingestion is now enabled by default if a `pipeline_name` is set and either a datahub-rest sink or `datahub_api` is specified. It will still be disabled by default when any other sink type is used or if there is no pipeline name set.
-												feat(ingest): add DataHubGraph.emit_all method (#10002)


											
										
										
											2024-03-11 16:36:18 -07:00
+								- #10002 - The `DataHubGraph` client no longer makes a request to the backend during initialization. If you want to preserve the old behavior, call `graph.test_connection()` after constructing the client.
-												feat(dbt): show source and compiled code in the UI (#10028)


											
										
										
											2024-03-18 18:16:49 -07:00
+								- #10026 - The dbt `use_compiled_code` option has been removed, because we now support capturing both source and compiled dbt SQL. This can be configured using `include_compiled_code`, which will be default enabled in 0.13.1.
-												feat(ingest/dbt): point dbt assertions at dbt nodes (#10055)


											
										
										
											2024-03-18 18:13:01 -07:00
+								- #10055 - Assertion entities generated by dbt are now associated with the dbt dataset entity, and not the entity in the data warehouse.
-												feat(ingest): emit platform for query entities (#10103)


											
										
										
											2024-03-26 11:22:53 -07:00
+								- #10090 - For Redshift ingestion, `use_lineage_v2` is now enabled by default.
-												feat(ingest): add DataHubGraph.emit_all method (#10002)


											
										
										
											2024-03-11 16:36:18 -07:00
 								### Potential Downtime
 								### Deprecations
 								### Other Notable Changes
 								## 0.13.0
 								### Breaking Changes
-												feat(mysql): upgrade to version 8.2 for quickstart (#9241)

Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-11-22 13:54:12 -06:00
+								- Updating MySQL version for quickstarts to 8.2, may cause quickstart issues for existing instances.
-												feat(build): gradle 8, jdk17, neo4j 5 (#9458)


											
										
										
											2023-12-15 13:28:33 -06:00
+								- Neo4j 5.x, may require migration from 4.x
-												feat(docker): docker compose profiles updates (#9514)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2024-01-03 15:58:50 -06:00
+								- Build requires JDK17 (Runtime Java 11)
 								- Build requires Docker Compose > 2.20
-												chore(cli): drop support for python 3.7 (#9731)


											
										
										
											2024-01-29 10:50:47 -08:00
+								- #9731 - The `acryl-datahub` CLI now requires Python 3.8+
-												feat(ingest/databricks): ingest hive metastore by default, more docs (#9601)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2024-01-19 03:56:33 +05:30
+								- #9601 - The Unity Catalog(UC) ingestion source config `include_metastore` is now disabled by default. This change will affect the urns of all entities in the workspace.<br/>
-												chore(cli): drop support for python 3.7 (#9731)


											
										
										
											2024-01-29 10:50:47 -08:00
+								  Entity Hierarchy with `include_metastore: true` (Old)
-												feat(ingest/databricks): ingest hive metastore by default, more docs (#9601)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2024-01-19 03:56:33 +05:30
+								  ```
 								  - UC Metastore
 								    - Catalog
 								      - Schema
 								        - Table
 								  ```
-												chore(cli): drop support for python 3.7 (#9731)


											
										
										
											2024-01-29 10:50:47 -08:00
+								  Entity Hierarchy with `include_metastore: false` (New)
-												feat(ingest/databricks): ingest hive metastore by default, more docs (#9601)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2024-01-19 03:56:33 +05:30
+								  ```
 								  - Catalog
 								    - Schema
 								      - Table
 								  ```
-												chore(cli): drop support for python 3.7 (#9731)


											
										
										
											2024-01-29 10:50:47 -08:00
-												feat(ingest/databricks): ingest hive metastore by default, more docs (#9601)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2024-01-19 03:56:33 +05:30
+								  We recommend using `platform_instance` for differentiating across metastores.
 								  If stateful ingestion is enabled, running ingestion with latest cli version will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI:
-												chore(cli): drop support for python 3.7 (#9731)


											
										
										
											2024-01-29 10:50:47 -08:00
+								  `datahub delete --platform databricks --soft` and then reingesting with latest cli version.
-												feat(ingest/databricks): ingest hive metastore by default, more docs (#9601)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2024-01-19 03:56:33 +05:30
+								- #9601 - The Unity Catalog(UC) ingestion source config `include_hive_metastore` is now enabled by default. This requires config `warehouse_id` to be set. You can disable `include_hive_metastore` by setting it to `False` to avoid ingesting legacy hive metastore catalog in Databricks.
-												feat(ingest/redshift): redshift lineage v2 (#9904)


											
										
										
											2024-02-23 16:32:51 -08:00
+								- #9904 - The default Redshift `table_lineage_mode` is now MIXED, instead of `STL_SCAN_BASED`. Improved lineage generation is also available by enabling `use_lineaege_v2`. This v2 implementation will become the default in a future release.
-												docs(updating-datahub): update docs for v0.12.1 (#9441)


											
										
										
											2023-12-12 12:16:27 -06:00
 								### Potential Downtime
 								### Deprecations
-												feat(build): gradle 8, jdk17, neo4j 5 (#9458)


											
										
										
											2023-12-15 13:28:33 -06:00
+								- Spark 2.x (including previous JDK8 build requirements)
-												docs(updating-datahub): update docs for v0.12.1 (#9441)


											
										
										
											2023-12-12 12:16:27 -06:00
+								### Other Notable Changes
 								## 0.12.1
 								### Breaking Changes
-												fix(ingest): drop redshift-legacy and redshift-usage-legacy sources (#9244)


											
										
										
											2023-11-16 13:33:35 -05:00
+								- #9244: The `redshift-legacy` and `redshift-legacy-usage` sources, which have been deprecated for >6 months, have been removed. The new `redshift` source is a superset of the functionality provided by those legacy sources.
-												fix(ingest): drop deprecated database_alias from sql sources (#9299)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2023-11-29 02:19:49 +05:30
+								- `database_alias` config is no longer supported in SQL sources namely - Redshift, MySQL, Oracle, Postgres, Trino, Presto-on-hive. The config will automatically be ignored if it's present in your recipe. It has been deprecated since v0.9.6.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								- #9257: The Python SDK urn types are now autogenerated. The new classes are largely backwards compatible with the previous, manually written classes, but many older methods are now deprecated in favor of a more uniform interface. The only breaking change is that the signature for the director constructor e.g. `TagUrn("tag", ["tag_name"])` is no longer supported, and the simpler `TagUrn("tag_name")` should be used instead.
 								  The canonical place to import the urn classes from is `datahub.metadata.urns.*`. Other import paths, like `datahub.utilities.urns.corpuser_urn.CorpuserUrn` are retained for backwards compatibility, but are considered deprecated.
-												feat(ingest): clean up DataHubRestEmitter return type (#9286)

Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
											
										
										
											2023-11-30 21:00:43 -05:00
+								- #9286: The `DataHubRestEmitter.emit` method no longer returns anything. It previously returned a tuple of timestamps.
-												feat(ingest/unity): GE Profiling (#8951)


											
										
										
											2023-12-06 13:59:23 -05:00
+								- #8951: A great expectations based profiler has been added for the Unity Catalog source.
-												docs(updating-datahub): update docs for v0.12.1 (#9441)


											
										
										
											2023-12-12 12:16:27 -06:00
+								  To use the old profiler, set `method: analyze` under the `profiling` section in your recipe.
 								  To use the new profiler, set `method: ge`. Profiling is disabled by default, so to enable it,
 								  one of these methods must be specified.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs: Update updating-datahub.md (#9131)


											
										
										
											2023-10-30 20:50:42 +00:00
+								### Potential Downtime
 								### Deprecations
 								### Other Notable Changes
 								## 0.12.0
-												feat(ingestion/redshift): support auto_incremental_lineage (#9010)


											
										
										
											2023-10-25 15:26:06 +05:30
-												feat(ingest): dbt cloud integration (#6323)


											
										
										
											2022-11-21 14:14:33 -05:00
+								### Breaking Changes
-												feat(ingest/airflow): airflow plugin v2 (#8853)


											
										
										
											2023-10-04 06:53:15 -04:00
-												fix(metadata-io): in Neo4j service use proper algorithm to get lineage (#8687)

Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-11-10 17:58:38 +01:00
+								- #8687 (datahub-helm #365 #353) - If Helm is used for installation and Neo4j is enabled, update the prerequisites Helm chart to version >=0.1.2 and adjust your value overrides in the `neo4j:` section according to the new structure.
-												docs: Update updating-datahub.md (#9131)


											
										
										
											2023-10-30 20:50:42 +00:00
+								- #9044 - GraphQL APIs for adding ownership now expect either an `ownershipTypeUrn` referencing a customer ownership type or a (deprecated) `type`. Where before adding an ownership without a concrete type was allowed, this is no longer the case. For simplicity you can use the `type` parameter which will get translated to a custom ownership type internally if one exists for the type being added.
 								- #9010 - In Redshift source's config `incremental_lineage` is set default to off.
-												build(ingest): upgrade to sqlalchemy 1.4, drop 1.3 support (#8810)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2023-09-13 00:00:24 +05:30
+								- #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								- #8942 - Removed `urn:li:corpuser:datahub` owner for the `Measure`, `Dimension` and `Temporal` tags emitted
-												fix(ingest/looker): stop emitting tag owner (#8942)


											
										
										
											2023-10-12 02:06:19 +02:00
+								  by Looker and LookML source connectors.
-												feat(ingest/airflow): airflow plugin v2 (#8853)


											
										
										
											2023-10-04 06:53:15 -04:00
+								- #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
 								- #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
-												docs(ingest/bigquery): Add docs for breaking change: match_fully_qualified_names (#9094)


											
										
										
											2023-10-24 18:56:14 -04:00
+								- #8943 - The Unity Catalog ingestion source has a new option `include_metastore`, which will cause all urns to be changed when disabled.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								  This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future.
 								  If stateful ingestion is enabled, simply setting `include_metastore: false` will perform all required cleanup.
 								  Otherwise, we recommend soft deleting all databricks data via the DataHub CLI:
 								  `datahub delete --platform databricks --soft` and then reingesting with `include_metastore: false`.
-												docs(update): Added info on breaking change for policies (#9093)

Co-authored-by: Pedro Silva <pedro@acryl.io>
											
										
										
											2023-10-25 00:58:56 +02:00
+								- #8846 - Changed enum values in resource filters used by policies. `RESOURCE_TYPE` became `TYPE` and `RESOURCE_URN` became `URN`.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								  Any existing policies using these filters (i.e. defined for particular `urns` or `types` such as `dataset`) need to be upgraded
 								  manually, for example by retrieving their respective `dataHubPolicyInfo` aspect and changing part using filter i.e.
-												docs(update): Added info on breaking change for policies (#9093)

Co-authored-by: Pedro Silva <pedro@acryl.io>
											
										
										
											2023-10-25 00:58:56 +02:00
+								```yaml
 								   "resources": {
 								     "filter": {
 								       "criteria": [
 								         {
 								           "field": "RESOURCE_TYPE",
 								           "condition": "EQUALS",
 								           "values": [
 								             "dataset"
 								           ]
 								         }
 								       ]
 								     }
 								```
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs(update): Added info on breaking change for policies (#9093)

Co-authored-by: Pedro Silva <pedro@acryl.io>
											
										
										
											2023-10-25 00:58:56 +02:00
+								into
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs(update): Added info on breaking change for policies (#9093)

Co-authored-by: Pedro Silva <pedro@acryl.io>
											
										
										
											2023-10-25 00:58:56 +02:00
+								```yaml
 								   "resources": {
 								     "filter": {
 								       "criteria": [
 								         {
 								           "field": "TYPE",
 								           "condition": "EQUALS",
 								           "values": [
 								             "dataset"
 								           ]
 								         }
 								       ]
 								     }
 								```
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs(update): Added info on breaking change for policies (#9093)

Co-authored-by: Pedro Silva <pedro@acryl.io>
											
										
										
											2023-10-25 00:58:56 +02:00
+								for example, using `datahub put` command. Policies can be also removed and re-created via UI.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs(ingest/bigquery): Add docs for breaking change: match_fully_qualified_names (#9094)


											
										
										
											2023-10-24 18:56:14 -04:00
+								- #9077 - The BigQuery ingestion source by default sets `match_fully_qualified_names: true`.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								  This means that any `dataset_pattern` or `schema_pattern` specified will be matched on the fully
 								  qualified dataset name, i.e. `<project_name>.<dataset_name>`. We attempt to support the old
 								  pattern format by prepending `.*\\.` to dataset patterns lacking a period, so in most cases this
 								  should not cause any issues. However, if you have a complex dataset pattern, we recommend you
 								  manually convert it to the fully qualified format to avoid any potential issues.
-												feat(ingest/unity): Support specifying catalogs directly; pass env correctly (#9110)


											
										
										
											2023-11-16 12:41:12 -05:00
+								- #9110 - The Unity Catalog source will now generate urns based on `env` properly. If you have
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								  been setting `env` in your recipe to something besides `PROD`, we will now generate urns
 								  with that new env variable, invalidating your existing urns.
-												feat(ingest): unbundle airflow plugin emitter dependencies (#7493)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2023-03-07 12:07:42 -05:00
-												docs(release): Update updating-datahub.md for 0.10.5 release (#8557)


Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-08-03 00:00:12 -03:00
+								### Potential Downtime
-												docs(release): Update updating-datahub.md for 0.11.0 release (#8821)

Co-authored-by: Indy Prentice <indy@ip-192-168-3-10.us-west-2.compute.internal>
											
										
										
											2023-09-11 17:33:03 -03:00
+								### Deprecations
 								### Other Notable Changes
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												fix(frontend): update cookie module (#8862)


											
										
										
											2023-10-17 15:50:32 -05:00
+								- Session token configuration has changed, all previously created session tokens will be invalid and users will be prompted to log in. Expiration time has also been shortened which may result in more login prompts with the default settings.
 								  There should be no other interruption due to this change.
-												docs(release): Update updating-datahub.md for 0.11.0 release (#8821)

Co-authored-by: Indy Prentice <indy@ip-192-168-3-10.us-west-2.compute.internal>
											
										
										
											2023-09-11 17:33:03 -03:00
 								## 0.11.0
 								### Breaking Changes
 								### Potential Downtime
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs(release): Update updating-datahub.md for 0.11.0 release (#8821)

Co-authored-by: Indy Prentice <indy@ip-192-168-3-10.us-west-2.compute.internal>
											
										
										
											2023-09-11 17:33:03 -03:00
+								- #8611 Search improvements requires reindexing indices. A `system-update` job will run which will set indices to read-only and create a backup/clone of each index. During the reindexing new components will be prevented from start-up until the reindex completes. The logs of this job will indicate a % complete per index. Depending on index sizes and infrastructure this process can take 5 minutes to hours however as a rough estimate 1 hour for every 2.3 million entities.
-												docs(release): Update updating-datahub.md for 0.10.5 release (#8557)


Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-08-03 00:00:12 -03:00
+								### Deprecations
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												Feat(ingest/ldap)fix list index out of range error (#8525)

Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
											
										
										
											2023-08-09 21:13:27 +04:00
+								- #8525: In LDAP ingestor, the `manager_pagination_enabled` changed to general `pagination_enabled`
-												docs(release): Update updating-datahub.md for 0.11.0 release (#8821)

Co-authored-by: Indy Prentice <indy@ip-192-168-3-10.us-west-2.compute.internal>
											
										
										
											2023-09-11 17:33:03 -03:00
+								- MAE Events are no longer produced. MAE events have been deprecated for over a year.
-												docs(release): Update updating-datahub.md for 0.10.5 release (#8557)


Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-08-03 00:00:12 -03:00
-												docs(updating): add details on Unified Search & Browse experience (#8568)

Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
											
										
										
											2023-08-03 21:19:43 -05:00
+								### Other Notable Changes
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
-												docs(release): Update updating-datahub.md for 0.11.0 release (#8821)

Co-authored-by: Indy Prentice <indy@ip-192-168-3-10.us-west-2.compute.internal>
											
										
										
											2023-09-11 17:33:03 -03:00
+								- In this release we now enable you to create and delete pinned announcements on your DataHub homepage! If you have the “Manage Home Page Posts” platform privilege you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.
 								- The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.
 								- In addition to the ranking changes mentioned above, this release includes changes to the highlighting of search entities to understand why they match your query. You can also sort your results alphabetically or by last updated times, in addition to relevance. In this release, we suggest a correction if your query has a typo in it.
-												Fix(ingestion/clickhouse) move to two tier sqlalchemy (#8300)

Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
											
										
										
											2023-08-12 00:11:40 +04:00
+								- #8300: Clickhouse source now inherited from TwoTierSQLAlchemy. In old way we have platform_instance -> container -> co
 								  container db (None) -> container schema and now we have platform_instance -> container database.
 								- #8300: Added `uri_opts` argument; now we can add any options for clickhouse client.
-												fix(ingest/bigquery): Add config option to create DataPlatformInstance, default off (#8659)


											
										
										
											2023-08-24 05:16:06 -04:00
+								- #8659: BigQuery ingestion no longer creates DataPlatformInstance aspects by default.
 								  This will only affect users that were depending on this aspect for custom functionality,
 								  and can be enabled via the `include_data_platform_instance` config option.
-												docs(release): Update updating-datahub.md for 0.11.0 release (#8821)

Co-authored-by: Indy Prentice <indy@ip-192-168-3-10.us-west-2.compute.internal>
											
										
										
											2023-09-11 17:33:03 -03:00
+								- OpenAPI entity and aspect endpoints expanded to improve developer experience when using this API with additional aspects to be added in the near future.
 								- The CLI now supports recursive deletes.
 								- Batching of default aspects on initial ingestion (SQL)
 								- Improvements to multi-threading. Ingestion recipes, if previously reduced to 1 thread, can be restored to the 15 thread default.
 								- Gradle 7 upgrade moderately improves build speed
 								- DataHub Ingestion slim images reduced in size by 2GB+
 								- Glue Schema Registry fixed
-												docs(release): Update updating-datahub.md for 0.10.5 release (#8557)


Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-08-03 00:00:12 -03:00
 								## 0.10.5
 								### Breaking Changes
-												fix(sdk/dataflow): deprecate cluster and use env and platform_instance instead (#8201)

Co-authored-by: mohdsiddique <mohdsiddiquebagwan@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2023-06-16 02:36:28 +05:30
+								- #8201: Python SDK: In the DataFlow class, the `cluster` argument is deprecated in favor of `env`.
-												fix(ingest/okta): Set default of okta_profile_to_username_attr to email (#8263)


											
										
										
											2023-06-21 04:08:59 -04:00
+								- #8263: Okta source config option `okta_profile_to_username_attr` default changed from `login` to `email`.
 								  This determines which Okta profile attribute is used for the corresponding DataHub user
-												fix(ingest/okta): Set default of okta connector to match OIDC defaults (#8272)


											
										
										
											2023-06-21 19:15:31 +05:30
+								  and thus may change what DataHub users are generated by the Okta source. And in a follow up `okta_profile_to_username_regex` has been set to `.*` which taken together with previous change brings the defaults in line with OIDC.
-												fix(ingest/sql-common): Fix profile_table_level_only (#8331)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
											
										
										
											2023-07-07 19:05:50 -04:00
+								- #8331: For all sql-based sources that support profiling, you can no longer specify
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								  `profile_table_level_only` together with `include_field_xyz` config options to ingest
 								  certain column-level metrics. Instead, set `profile_table_level_only` to `false` and
 								  individually enable / disable desired field metrics.
-												chore(ingest): drop bigquery-beta and snowflake-beta aliases (#8451)


											
										
										
											2023-07-20 11:05:25 -07:00
+								- #8451: The `bigquery-beta` and `snowflake-beta` source aliases have been dropped. Use `bigquery` and `snowflake` as the source type instead.
-												feat(ingest): enable pipeline reporting by default (#8472)


											
										
										
											2023-07-25 01:46:27 -07:00
+								- #8472: Ingestion runs created with Pipeline.create will show up in the DataHub ingestion tab as CLI-based runs. To revert to the previous behavior of not showing these runs in DataHub, pass `no_default_report=True`.
-												feat(sdk): autogenerate urn types (#9257)


											
										
										
											2023-11-30 18:11:36 -05:00
+								- #8513: `snowflake` connector will use user's `email` attribute as is in urn. To revert to previous behavior disable `email_as_user_identifier` in recipe.
-												Update updating-datahub.md for v0.10.3 release (#8139)


											
										
										
											2023-05-26 13:01:08 -05:00
+								### Potential Downtime
-												docs(release): Update updating-datahub.md for 0.10.5 release (#8557)


Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
											
										
										
											2023-08-03 00:00:12 -03:00
+								- BrowsePathsV2 upgrade will now be handled by the `system-update` job in non-blocking mode. This process generates data needed for the new search
 								  and browse feature. This process must complete before enabling the new search and browse UI and while upgrading entities will be missing from the UI.
 								  If not using the new search and browse UI, there will be no impact and the update will complete in the background.
-												Update updating-datahub.md for v0.10.3 release (#8139)


											
										
										
											2023-05-26 13:01:08 -05:00
### Deprecations

- #8198: In the Python SDK, the `PlatformKey` class has been renamed to `ContainerKey`.

### Other Notable Changes

0.10.5 introduces the new Unified Search & Browse experience, which is disabled by default. You can choose to show just the new search filtering experience, show the new search and browse experiences together, or keep the existing search and browse experiences by toggling the two environment variable feature flags `SHOW_SEARCH_FILTERS_V2` and `SHOW_BROWSE_V2` in your GMS container.

**Upgrade Considerations:**

- With the release of Browse V2, we have created a job to run in GMS that will backfill your existing data with new `browsePathsV2` aspects. This job loops over entity types that need a `browsePathsV2` aspect (Dataset, Dashboard, Chart, DataJob, DataFlow, MLModel, MLModelGroup, MLFeatureTable, and MLFeature) and generates one for them. For entities that may have Container parents (Datasets and Dashboards) we will try to fetch their parent containers in order to generate this new aspect. For those deployments with large amounts of data, consider whether running this upgrade job makes sense, as it may be a heavy operation and take some time to complete. If you wish to skip this job, simply set the `BACKFILL_BROWSE_PATHS_V2` environment variable flag to `false` in your GMS container. Without this backfill job, though, you will need to rely on the newest ingestion CLI to create these `browsePathsV2` aspects when running ingestion; otherwise your browse sidebar will be out of sync.
- Since the new browse experience replaces the old, consider whether having the `SHOW_BROWSE_V2` environment variable feature flag on is the right decision for your organization. If you’re creating custom browse paths with the `browsePaths` aspect, you can continue to do the same with the new experience; however, you will have to generate `browsePathsV2` aspects instead, which are documented [here](https://datahubproject.io/docs/browsev2/browse-paths-v2/).

## 0.10.4

### Breaking Changes

### Potential Downtime

### Deprecations

- #8045: With the introduction of custom ownership types, the `Owner` aspect has been updated: the `type` field is deprecated in favor of a new field, `typeUrn`, which is an urn reference to the new OwnershipType entity. GraphQL endpoints have been updated to use the new field. For pre-existing ownership aspect records, DataHub now has logic to map the old field to the new field.

### Other notable Changes

- #8191: Updates GMS's health check endpoint to account for its dependency on external components, which at this time means Elasticsearch. This means that DataHub operators can now use GMS health status more reliably.

## 0.10.3

### Breaking Changes

- #7900: The `catalog_pattern` and `schema_pattern` options of the Unity Catalog source now match against the fully qualified name of the catalog/schema instead of just the name. Unless you're using regex `^` in your patterns, this should not affect you.
- #7942: The `containerPath` aspect has been renamed to `browsePathsV2`. This means any data with the aspect name `containerPath` will be invalid. We had not exposed this in the UI or used it anywhere, but it was a model we recently merged to open up other work. This should affect few people, if anyone, unless you were manually creating `containerPath` data through ingestion on your instance.
- #8068: In the `datahub delete` CLI, if an `--entity-type` filter is not specified, we automatically delete across all entity types. The previous behavior was to use a default entity type of dataset.
- #8068: In the `datahub delete` CLI, the `--start-time` and `--end-time` parameters are not required for timeseries aspect hard deletes. To recover the previous behavior of deleting all data, use `--start-time min --end-time max`.
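
To illustrate the `catalog_pattern` / `schema_pattern` change in #7900, here is a minimal sketch using Python's standard `re` module. The names `main` and `sales`, the dot-separated qualified form, and the helper `is_allowed` are all assumptions made for illustration; they are not taken from the connector itself.

```python
import re


def is_allowed(pattern: str, name: str) -> bool:
    """Hypothetical helper: True if the regex matches at the start of the name."""
    return re.match(pattern, name) is not None


# Assumed example names; the real qualified form depends on your workspace.
bare_name = "sales"
qualified_name = "main.sales"

# A pattern anchored on the bare schema name matched before the change...
old_pattern = r"^sales$"
# ...and needs the parent prefix once matching uses the qualified name.
new_pattern = r"^main\.sales$"

print(is_allowed(old_pattern, bare_name))       # True
print(is_allowed(old_pattern, qualified_name))  # False
print(is_allowed(new_pattern, qualified_name))  # True
```

If your recipes contain patterns anchored with `^`, a check along these lines against a few known names may help confirm they still match after upgrading.
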
### Potential Downtime

### Deprecations

- The signature of `Source.get_workunits()` is changed from `Iterable[WorkUnit]` to the more restrictive `Iterable[MetadataWorkUnit]`.
- Legacy usage creation via the `UsageAggregation` aspect, `/usageStats?action=batchIngest` GMS endpoint, and `UsageStatsWorkUnit` metadata-ingestion class are all deprecated.

### Other notable Changes

## 0.10.2

### Breaking Changes

- #7016 Added the `add_database_name_to_urn` flag to the Oracle source, which ensures that Dataset urns have the DB name as a prefix to prevent collisions (e.g. `{database}.{schema}.{table}`). This is ONLY breaking if you set this flag to true; otherwise behavior remains the same.
- The Airflow plugin no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub-airflow-plugin[datahub-kafka]` for Kafka support.
- The Airflow lineage backend no longer includes the DataHub Kafka emitter by default. Use `pip install acryl-datahub[airflow,datahub-kafka]` for Kafka support.
- Java SDK PatchBuilders have been modified in a backwards-incompatible way to align more with the Python SDK and support more use cases. Any application utilizing the Java SDK for patch building may be affected on upgrading this dependency.

### Deprecations

- The docker image and script for updating from Elasticsearch 6 to 7 are no longer being maintained and will be removed from the `/contrib` section of
  the repository. Please refer to older releases if needed.

## 0.10.0

### Breaking Changes

- #7103 This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the `kafka-setup` docker image have been updated to be in line with other DataHub components; for more info see our docs on [Configuring Kafka in DataHub](https://datahubproject.io/docs/how/kafka-config). They were previously suffixed with `_TOPIC`, whereas the correct suffix is now `_TOPIC_NAME`. This change should not affect any user who is using default Kafka names.
- #6906 The Redshift source has been reworked and now also includes usage capabilities. The old Redshift source was renamed to `redshift-legacy`. The `redshift-usage` source has also been renamed to `redshift-usage-legacy` and will be removed in the future.

### Potential Downtime

- #6894 Search improvements require reindexing indices. A `system-update` job will run which will set indices to read-only and create a backup/clone of each index. During the reindexing, new components will be prevented from starting up until the reindex completes. The logs of this job will indicate a % complete per index. Depending on index sizes and infrastructure, this process can take anywhere from 5 minutes to hours; as a rough estimate, allow 1 hour for every 2.3 million entities.

#### Helm Notes

Helm without `--atomic`: The default timeout for an upgrade command is 5 minutes. If the reindex takes longer (depending on data size) it will continue to run in the background even though helm will report a failure. Allow this job to finish and then re-run the helm upgrade command.

Helm with `--atomic`: In general, it is recommended to not use the `--atomic` setting for this particular upgrade since the system update job will be terminated before completion. If `--atomic` is preferred, then increase the timeout using the `--timeout` flag to account for the reindexing time (see note above for estimating this value).
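
As a worked example of the rough estimate above (1 hour per 2.3 million entities), the following sketch turns an entity count into an estimated reindex duration and a candidate value for Helm's `--timeout` flag. The function names and the 1.5x safety factor are assumptions chosen for illustration, and actual reindex time will vary with index sizes and infrastructure.

```python
import math

# Rough throughput quoted in the note above: 2.3 million entities per hour.
ENTITIES_PER_HOUR = 2_300_000


def estimate_reindex_hours(entity_count: int) -> float:
    """Estimate the reindex duration in hours from the entity count."""
    return entity_count / ENTITIES_PER_HOUR


def suggested_helm_timeout(entity_count: int, safety_factor: float = 1.5) -> str:
    """Return a duration string such as '90m' for use with `--timeout`.

    The safety factor is a hypothetical margin, not a documented value.
    The result never drops below Helm's default of 5 minutes.
    """
    minutes = math.ceil(estimate_reindex_hours(entity_count) * 60 * safety_factor)
    return f"{max(minutes, 5)}m"


print(estimate_reindex_hours(4_600_000))   # 2.0
print(suggested_helm_timeout(2_300_000))   # 90m
```
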
### Deprecations

## 0.9.6

### Breaking Changes

- #6742 The metadata file sink's output format no longer contains nested JSON strings for MCP aspects, but instead unpacks the stringified JSON into a real JSON object. The previous sink behavior can be recovered using the `legacy_nested_json_string` option. The file source is backwards compatible and supports both formats.
- #6901 The `env` and `database_alias` fields have been marked deprecated across all sources. We recommend using `platform_instance` where possible instead.
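
The file sink change in #6742 can be pictured with a small sketch that converts a record with a stringified aspect into one with a real JSON object. The field names used here (`value`, `contentType`, `json`) and the helper `unpack_legacy_aspect` are assumptions for illustration only; check a file produced by your own sink for the exact layout.

```python
import json


def unpack_legacy_aspect(record: dict) -> dict:
    """Hypothetical helper: replace a stringified aspect with a parsed object.

    Records whose aspect is not a JSON string are returned unchanged.
    """
    aspect = record.get("aspect", {})
    if isinstance(aspect.get("value"), str):
        unpacked = dict(record)
        unpacked["aspect"] = {"json": json.loads(aspect["value"])}
        return unpacked
    return record


# Assumed legacy shape: the aspect payload is a JSON document inside a string.
legacy_record = {
    "aspectName": "status",
    "aspect": {"value": '{"removed": false}', "contentType": "application/json"},
}

new_record = unpack_legacy_aspect(legacy_record)
print(new_record["aspect"])  # {'json': {'removed': False}}
```
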
### Potential Downtime

### Deprecations

- #6851 - Sources `bigquery-legacy` and `bigquery-usage-legacy` have been removed.

### Other notable Changes

- If you face issues with login, please clear your cookies. This release includes some security updates that may cause login issues until cookies are cleared.

## 0.9.4 / 0.9.5

### Breaking Changes

- #6243 The apache-ranger authorizer is no longer a core part of DataHub GMS and has been moved to a plugin. Please refer to the updated documentation [Configuring Authorization with Apache Ranger](./configuring-authorization-with-apache-ranger.md#configuring-your-datahub-deployment) for configuring `apache-ranger-plugin` in DataHub GMS.
- #6243 The apache-ranger authorizer as a plugin is not supported in DataHub Kubernetes deployments.
- #6243 Authentication and Authorization plugin configurations have been removed from [application.yml](../../metadata-service/configuration/src/main/resources/application.yml). Refer to the documentation [Migration Of Plugins From application.yml](../plugins.md#migration-of-plugins-from-applicationyml) for migrating any existing custom plugins.
- The `datahub check graph-consistency` command has been removed. It was a beta API that we had considered, but we decided there are better solutions for this.
- The `graphql_url` option of the `powerbi-report-server` source is deprecated, as the option is not used.
- #6789 BigQuery ingestion: If `enable_legacy_sharded_table_support` is set to False, sharded table names will be suffixed with \_yyyymmdd to make sure they don't clash with non-sharded tables. This means that if stateful ingestion is enabled, old sharded tables will be recreated with a new id, and attached tags/glossary terms/etc. will need to be added again. _This behavior is not enabled by default yet, but will be enabled by default in a future release._

### Potential Downtime

### Deprecations

### Other notable Changes

- #6611 - Snowflake `schema_pattern` now accepts patterns for the fully qualified schema name in the format `<catalog_name>.<schema_name>` when the config `match_fully_qualified_names: True` is set. The current default `match_fully_qualified_names: False` exists only to maintain backward compatibility. The config option `match_fully_qualified_names` will be deprecated in the future, and the default behavior will assume `match_fully_qualified_names: True`.
- #6636 - Sources `snowflake-legacy` and `snowflake-usage-legacy` have been removed.

## 0.9.3

### Breaking Changes

- The beta `datahub check graph-consistency` command has been removed.

### Potential Downtime

### Deprecations

- PowerBI source: `workspace_id_pattern` is introduced in place of `workspace_id`. `workspace_id` is now deprecated and set for removal in a future version.

### Other notable Changes

## 0.9.2

- LookML source will only emit views that are reachable from explores while scanning your git repo. Previous behavior can be achieved by setting `emit_reachable_views_only` to False.
- LookML source will always lowercase urns for lineage edges from views to upstream tables. There is no fallback to the previous behavior because lowercasing was previously applied inconsistently.
- dbt config `node_type_pattern`, which was previously deprecated, has been removed. Use `entities_enabled` instead to control whether to emit metadata for sources, models, seeds, tests, etc.
- The dbt source will always lowercase urns for lineage edges to the underlying data platform.
- The DataHub Airflow lineage backend and plugin no longer support Airflow 1.x. You can still run DataHub ingestion in Airflow 1.x using the [PythonVirtualenvOperator](https://airflow.apache.org/docs/apache-airflow/1.10.15/_api/airflow/operators/python_operator/index.html?highlight=pythonvirtualenvoperator#airflow.operators.python_operator.PythonVirtualenvOperator).

### Breaking Changes

- #6570 The `snowflake` connector now populates created and last modified timestamps for snowflake datasets and containers. This version of the snowflake connector will not work with **datahub-gms** versions older than `v0.9.3`.

### Potential Downtime

### Deprecations

### Other notable Changes

## 0.9.1

### Breaking Changes

- We have promoted `bigquery-beta` to `bigquery`. If you are using `bigquery-beta` then change your recipes to use the type `bigquery`.

### Potential Downtime

### Deprecations

### Other notable Changes

## 0.9.0

### Breaking Changes

- Java version 11 or greater is required.
- For any of the GraphQL search queries, the input no longer supports `value` but instead now accepts a list of `values`. These values represent an OR relationship where the field value must match any of the values.
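
For the GraphQL search input change above, a client that builds filter inputs as dictionaries could be migrated with a helper along these lines. This is a sketch under assumptions: the key names `field`, `value`, and `values` and the helper `upgrade_filter` are illustrative, so check the GraphQL schema of your DataHub version for the exact input type.

```python
def upgrade_filter(criterion: dict) -> dict:
    """Hypothetical helper: wrap a single `value` into a `values` list.

    Criteria without a `value` key are copied through unchanged.
    """
    if "value" not in criterion:
        return dict(criterion)
    upgraded = {key: val for key, val in criterion.items() if key != "value"}
    upgraded["values"] = [criterion["value"]]
    return upgraded


old_filter = {"field": "platform", "value": "snowflake"}
print(upgrade_filter(old_filter))
# {'field': 'platform', 'values': ['snowflake']}
```

Because the values are ORed together, several single-value criteria on the same field can be merged into one criterion whose list holds all of the alternatives.
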
### Potential Downtime

### Deprecations

### Other notable Changes

## `v0.8.45`

### Breaking Changes

- The `getNativeUserInviteToken` and `createNativeUserInviteToken` GraphQL endpoints have been renamed to
  `getInviteToken` and `createInviteToken` respectively. Additionally, both now accept an optional `roleUrn` parameter.
  Both endpoints also now require the `MANAGE_POLICIES` privilege to execute, rather than the `MANAGE_USER_CREDENTIALS`
  privilege.
- One of the default policies shipped with DataHub (`urn:li:dataHubPolicy:7`, or `All Users - All Platform Privileges`)
  has been edited to no longer include `MANAGE_POLICIES`. Its name has consequently been changed to
  `All Users - All Platform Privileges (EXCEPT MANAGE POLICIES)`. This change was made to prevent all users from
  effectively acting as superusers by default.

### Potential Downtime

### Deprecations

### Other notable Changes

## `v0.8.44`

### Breaking Changes

- Browse Paths have been upgraded to a new format to align more closely with the intention of the feature.
  Learn more about the changes, including steps on upgrading, here: <https://datahubproject.io/docs/advanced/browse-paths-upgrade>
- The dbt ingestion source's `disable_dbt_node_creation` and `load_schema` options have been removed. They were no longer necessary due to the recently added sibling entities functionality.
- The `snowflake` source now uses a newer, faster implementation (previously `snowflake-beta`). The config properties `provision_role` and `check_role_grants` are not supported. The older `snowflake` and `snowflake-usage` sources are available as `snowflake-legacy` and `snowflake-usage-legacy` respectively.

### Potential Downtime

- [Helm] If you're using Helm, please ensure that your version of the `datahub-actions` container is bumped to `v0.0.7` or `head`.
  This version contains changes to support running ingestion in debug mode. Previous versions are not compatible with this release.
  Upgrading to helm chart version `0.2.103` will ensure that you have the compatible versions by default.

+								### Deprecations
 								### Other notable Changes
 								## `v0.8.42`
-												fix(docs,quickstart): release related changes for 0.8.40 (#5299)


											
										
										
											2022-06-30 17:21:12 +05:30
+								### Breaking Changes
-												feat(ingest): dbt cloud integration (#6323)


											
										
										
											2022-11-21 14:14:33 -05:00
-												chore(ingest): drop python 3.6 support (#5521)


											
										
										
											2022-08-10 22:00:31 +00:00
+								- Python 3.6 is no longer supported for metadata ingestion
-												feat(cli): delete - hard delete deletes soft deleted entities (#5478)


											
										
										
											2022-07-26 06:47:02 +05:30
+								- #5451 `GMS_HOST` and `GMS_PORT` environment variables deprecated in `v0.8.39` have been removed. Use `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT` instead.
 								- #5478 DataHub CLI `delete` command when used with `--hard` option will delete soft-deleted entities which match the other filters given.
-												feat(ingest): dbt cloud integration (#6323)


											
										
										
											2022-11-21 14:14:33 -05:00
+								- #5471 Looker now populates `userEmail` in dashboard user usage stats. This version of looker connnector will not work with older version of **datahub-gms** if you have `extract_usage_history` looker config enabled.
-												fix(analytics-tab) - fix analytics tab config variable for gms (#5529)


											
										
										
											2022-08-02 14:35:04 +03:00
+								- #5529 - `ANALYTICS_ENABLED` environment variable in **datahub-gms** is now deprecated. Use `DATAHUB_ANALYTICS_ENABLED` instead.
- #5485 The `--include-removed` option was removed from the `delete` CLI command.
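
The environment variable changes above can be checked before an upgrade. The following is a minimal sketch, not part of DataHub: the helper name `check_env` and the message wording are hypothetical, and only the variable names come from the entries above.

```python
# Old -> new names, taken from the #5451 and #5529 entries above.
REMOVED = {"GMS_HOST": "DATAHUB_GMS_HOST", "GMS_PORT": "DATAHUB_GMS_PORT"}
DEPRECATED = {"ANALYTICS_ENABLED": "DATAHUB_ANALYTICS_ENABLED"}


def check_env(env):
    """Return (errors, warnings) for old variable names found in the mapping `env`."""
    errors = [
        f"{old} was removed; set {new} instead"
        for old, new in REMOVED.items()
        if old in env
    ]
    warnings = [
        f"{old} is deprecated; set {new} instead"
        for old, new in DEPRECATED.items()
        if old in env
    ]
    return errors, warnings
```

Passing `os.environ` (or the parsed contents of a container env file) to `check_env` would list the settings that need renaming.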

### Potential Downtime

### Deprecations

### Other notable Changes

## `v0.8.41`

### Breaking Changes

- The `should_overwrite` flag in `csv-enricher` has been replaced with `write_semantics` to match the format used for other sources. See the [documentation](https://datahubproject.io/docs/generated/ingestion/sources/csv/) for more details.
- Closed an authorization hole in tag creation by adding a Platform Privilege called `Create Tags`. It is assigned to the `datahub` root user and to the default All Users policy. Note: you may need to add this privilege (or `Manage Tags`) to existing users who need to create tags on the platform.
- #5329 The following profiling config parameters are now supported in `BigQuery`:
  - `profiling.profile_if_updated_since_days` (default=1)
  - `profiling.profile_table_size_limit` (default=1GB)
  - `profiling.profile_table_row_limit` (default=50000)
  Set these parameters to `null` if you want the older behaviour.
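
For recipes that are built or patched programmatically, restoring the older BigQuery profiling behaviour might look like the sketch below. It assumes the recipe is a plain dict with the profiling options nested under `source.config.profiling`. The helper name `disable_profiling_limits` is hypothetical, and Python's `None` stands in for `null` in a YAML recipe.

```python
import copy

# Parameter names from the #5329 entry above.
BIGQUERY_PROFILING_LIMITS = (
    "profile_if_updated_since_days",
    "profile_table_size_limit",
    "profile_table_row_limit",
)


def disable_profiling_limits(recipe):
    """Return a copy of `recipe` with the three profiling limits set to None."""
    updated = copy.deepcopy(recipe)
    # Assumed layout: source -> config -> profiling; missing levels are created.
    config = updated.setdefault("source", {}).setdefault("config", {})
    profiling = config.setdefault("profiling", {})
    for name in BIGQUERY_PROFILING_LIMITS:
        profiling[name] = None
    return updated
```

The input recipe is left untouched, so the original and patched versions can be compared before running ingestion.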

### Potential Downtime

### Deprecations

### Other notable Changes

## `v0.8.40`

### Breaking Changes

- #5240 `lineage_client_project_id` in the `bigquery` source has been removed. Use `storage_project_id` instead.

### Potential Downtime

### Deprecations

### Other notable Changes

## `v0.8.39`

### Breaking Changes

- Refactored the `health` field of the `Dataset` GraphQL Type to be of type **list of HealthStatus** (was type **HealthStatus**). See [this PR](https://github.com/datahub-project/datahub/pull/5222/files) for more details.
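
Client code that reads `health` from a parsed GraphQL response may see either shape while servers are being upgraded. The sketch below uses a hypothetical helper named `as_health_list` to normalize both shapes to a list. It assumes the response has already been decoded from JSON into Python objects.

```python
def as_health_list(health):
    """Normalize a Dataset `health` value to a list, whichever shape was returned."""
    if health is None:
        return []
    if isinstance(health, list):
        # Newer servers return a list of HealthStatus objects.
        return health
    # Older servers returned a single HealthStatus object.
    return [health]
```

Callers can then iterate over the result without checking which server version produced the response.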

### Potential Downtime

### Deprecations

- #4875 LookML view file contents will no longer be populated in `custom_properties`; instead, view definitions will always be available in the View Definitions tab.
- #5208 The `GMS_HOST` and `GMS_PORT` environment variables set in various containers are deprecated in favour of `DATAHUB_GMS_HOST` and `DATAHUB_GMS_PORT`.
- The `KAFKA_TOPIC_NAME` environment variable in **datahub-mae-consumer** and **datahub-gms** is now deprecated. Use `METADATA_AUDIT_EVENT_NAME` instead.
- The `KAFKA_MCE_TOPIC_NAME` environment variable in **datahub-mce-consumer** and **datahub-gms** is now deprecated. Use `METADATA_CHANGE_EVENT_NAME` instead.
- The `KAFKA_FMCE_TOPIC_NAME` environment variable in **datahub-mce-consumer** and **datahub-gms** is now deprecated. Use `FAILED_METADATA_CHANGE_EVENT_NAME` instead.
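
Deployments that keep these settings in `KEY=value` env files could apply the renames mechanically. The sketch below is an illustration only: `rename_env_lines` is a hypothetical helper, it assumes simple `KEY=value` lines, and it takes the old and new names from the entries above.

```python
# Old -> new names from the deprecation entries above.
DEPRECATED_ENV_VARS = {
    "GMS_HOST": "DATAHUB_GMS_HOST",
    "GMS_PORT": "DATAHUB_GMS_PORT",
    "KAFKA_TOPIC_NAME": "METADATA_AUDIT_EVENT_NAME",
    "KAFKA_MCE_TOPIC_NAME": "METADATA_CHANGE_EVENT_NAME",
    "KAFKA_FMCE_TOPIC_NAME": "FAILED_METADATA_CHANGE_EVENT_NAME",
}


def rename_env_lines(lines):
    """Rename deprecated keys in `KEY=value` lines; leave every other line unchanged."""
    renamed = []
    for line in lines:
        key, sep, value = line.partition("=")
        new_key = DEPRECATED_ENV_VARS.get(key.strip())
        # Comments, blank lines, and unknown keys pass through as-is.
        renamed.append(f"{new_key}{sep}{value}" if sep and new_key else line)
    return renamed
```

Values are copied verbatim, so topic names and host settings keep whatever was configured under the old key.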

### Other notable Changes

- #5132 The `snowflake` source now profiles tables only if they have been updated within the configured number of days (default: `1`). Update the config `profiling.profile_if_updated_since_days` to match your profiling schedule, or set it to `None` if you want the older behaviour.

## `v0.8.38`

### Breaking Changes

### Potential Downtime

### Deprecations

### Other notable Changes

- Create & Revoke Access Tokens via the UI
- Create and Manage new users via the UI
- Improvements to Business Glossary UI
- FIX - Do not require reindexing to migrate to using the UI business glossary

## `v0.8.36`

### Breaking Changes

- In this release we introduce a brand new Business Glossary experience. With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to [restore your indices](https://datahubproject.io/docs/how/restore-indices/) in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!

### Potential Downtime

### Deprecations

### Other notable Changes

- #4961 Dropped profiles are no longer reported by default, because reporting them caused a lot of spurious logging in some cases. Set `profiling.report_dropped_profiles` to `True` if you want the older behaviour.

## `v0.8.35`

### Breaking Changes

### Potential Downtime

### Deprecations

- #4875 LookML view file contents will no longer be populated in `custom_properties`; instead, view definitions will always be available in the View Definitions tab.

### Other notable Changes

## `v0.8.34`

### Breaking Changes

- #4644 Removed the `database` option from the `snowflake` source; it had been deprecated since `v0.8.5`.
- #4595 Renamed the confusing config `report_upstream_lineage`, which was added in `0.8.32`, to `upstream_lineage_in_report` in the `snowflake` connector.
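
A `snowflake` source config held as a dict could be checked against these two changes with something like the sketch below. The helper `migrate_snowflake_config` is hypothetical. It assumes the rename carries the value over unchanged, and it only reports the removed `database` option rather than choosing a replacement.

```python
def migrate_snowflake_config(config):
    """Return (new_config, notes) for a `snowflake` source config dict."""
    new_config = dict(config)
    notes = []
    if "report_upstream_lineage" in new_config:
        # Rename from #4595; assumes the value keeps its meaning.
        new_config["upstream_lineage_in_report"] = new_config.pop("report_upstream_lineage")
        notes.append("renamed report_upstream_lineage to upstream_lineage_in_report")
    if "database" in new_config:
        # Removed in #4644; left in place so a person can decide how to replace it.
        notes.append("database is no longer supported and must be removed")
    return new_config, notes
```

The returned notes can be printed as an upgrade checklist, and the input dict is not modified.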

### Potential Downtime

### Deprecations

- #4644 The `host_port` option of the `snowflake` and `snowflake-usage` sources is deprecated because the name was confusing. Use the `account_id` option instead.

### Other notable Changes

- #4760 The `check_role_grants` option was added to the `snowflake` source so that role checks can be disabled, as some users reported long run times when checking roles.