1. **SystemUpdate**: Performs any tasks required to update to a new version of DataHub, for example applying new configurations to the search & graph indexes, ingesting default settings, and more. Once completed, it emits a message to the DataHub Upgrade History Kafka topic (`DataHubUpgradeHistory_v1`), which signals to other pods that DataHub is ready to start. Example invocations for this and the other jobs appear after this list.
Note that this _must_ be executed any time the DataHub version is incremented, before starting or restarting other system containers. Dependent services will wait until the Kafka message corresponding to the code version they are running has been emitted.
A unique "version id" is generated based on a combination of the a) embedded git tag corresponding to the version of DataHub running and b) an optional revision number, provided via the `DATAHUB_REVISION` environment variable. Helm uses
the latter to ensure that the system upgrade job is executed every single time a deployment of DataHub is performed, even if the container version has not changed.
Important: This job runs as a pre-install hook via the DataHub Helm Charts, i.e. before deploying new version tags for each container.
2. **SystemUpdateBlocking**: Performs any _blocking_ tasks required to update to a new version of DataHub, as a subset of **SystemUpdate**.
3. **SystemUpdateNonBlocking**: Performs any _nonblocking_ tasks required to update to a new version of DataHub, as a subset of **SystemUpdate**.
4. **RestoreIndices**: Restores the search and graph indices by fetching the latest version of each aspect and restating MetadataChangeLog events for those aspects. Arguments include:
5. **RestoreBackup**: Restores the primary storage - the SQL document DB - from an available backup of the local database. Requires that a backup reader and a backup file are provided. Note that this does not also restore the secondary indexes (the graph or search storage). To do so, run the **RestoreIndices** upgrade job afterwards.
Arguments include:
- _BACKUP_READER_ (Required): The backup reader to use when reading and restoring the database. The only backup reader currently supported is `LOCAL_PARQUET`, which requires a parquet-formatted backup file path to be specified via the `BACKUP_FILE_PATH` argument.
- _BACKUP_FILE_PATH_ (Required): The path of the backup file. If you are running in a container, this needs to be the location where the backup file has been mounted into the container.
6. **EvaluateTests**: Executes all Metadata Tests in batches. Running this job can slow down DataHub, and in some cases it requires full scans of the document DB. Generally, it's recommended to configure this job to run once per day (the Helm CronJob default).
Arguments include:
- _batchSize_ (Optional): The number of assets to test at a time. Defaults to 1000.
- _batchDelayMs_ (Optional): The number of milliseconds of delay between evaluated asset batches. Used for rate limiting. Defaults to 250.
7. (Legacy) **NoCodeDataMigration**: Performs a series of pre-flight qualification checks and then migrates data from the legacy `metadata_aspect` table
to the `metadata_aspect_v2` table. Arguments include:
- _batchSize_ (Optional): The number of rows to migrate at a time. Defaults to 1000.
- _batchDelayMs_ (Optional): The number of milliseconds of delay between migrated batches. Used for rate limiting. Defaults to 250.
- _dbType_ (Optional): The target DB type. Valid values are `MYSQL`, `MARIA`, `POSTGRES`. Defaults to `MYSQL`.
If you are using newer versions of DataHub (v1.0.0 or above), this upgrade job will not be relevant.
8. (Legacy) **NoCodeDataMigrationCleanup**: Cleanses the graph index, search index, and key-value store of legacy DataHub data (the `metadata_aspect` table) once
the No Code Data Migration has completed successfully. Takes no arguments.
If you are using newer versions of DataHub (v1.0.0 or above), this upgrade job will not be relevant.
For example, the following runs the legacy **NoCodeDataMigration** job with a batch size of 500 and a 1000 ms delay between batches:

```shell
docker pull acryldata/datahub-upgrade:head && docker run --env-file env/docker.env acryldata/datahub-upgrade:head -u NoCodeDataMigration -a batchSize=500 -a batchDelayMs=1000
```
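The other upgrade jobs are selected the same way via the `-u` flag (this sketch assumes the job names above are accepted verbatim as upgrade IDs). For example, a minimal sketch of triggering **SystemUpdate** manually with the same environment file; in Kubernetes deployments the Helm chart normally runs this for you as the pre-install hook described above:

```shell
# Run the SystemUpdate job manually (image tag and env file are illustrative).
docker pull acryldata/datahub-upgrade:head && \
  docker run --env-file env/docker.env acryldata/datahub-upgrade:head -u SystemUpdate
```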
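To restore from a local parquet backup and then re-sync the secondary indexes, something along these lines should work. The host and container paths are purely illustrative; the key point is that `BACKUP_FILE_PATH` must refer to the location where the backup is mounted inside the container, as noted above:

```shell
# Restore the SQL document store from a mounted parquet backup (paths are illustrative).
docker run --env-file env/docker.env \
  -v /path/on/host/backup.parquet:/tmp/backup.parquet \
  acryldata/datahub-upgrade:head \
  -u RestoreBackup -a BACKUP_READER=LOCAL_PARQUET -a BACKUP_FILE_PATH=/tmp/backup.parquet

# Then rebuild the secondary (search & graph) indexes from the restored aspects.
docker run --env-file env/docker.env acryldata/datahub-upgrade:head -u RestoreIndices
```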
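For **EvaluateTests**, the optional batching arguments can be tuned with the same `-a key=value` syntax; the values below are illustrative:

```shell
# Evaluate Metadata Tests in smaller batches with a longer delay between batches.
docker run --env-file env/docker.env acryldata/datahub-upgrade:head -u EvaluateTests -a batchSize=500 -a batchDelayMs=1000
```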