8.5 KiB
title | slug |
---|---|
Upgrade 0.11 to 0.12 | /deployment/upgrade/versions/011-to-012 |
Upgrade from 0.11 to 0.12
Upgrading from 0.11 to 0.12 can be done directly on your instances. This page will list few general details you should take into consideration when running the upgrade.
Highlights
Database Connection Environment Variables
On 0.11, the Environment Variables to connect to Database used were
- MYSQL_USER
- MYSQL_USER_PASSWORD
- MYSQL_HOST
- MYSQL_PORT
- MYSQL_DATABASE.
These environment variables are changed in 0.12.0 Release
- DB_USER
- DB_USER_PASSWORD
- DB_HOST
- DB_PORT
- OM_DATABASE.
This will effect to all the bare metal and docker instances which configures a custom database depending on the above environment variable values.
This change is however not affected for Kubernetes deployments.
Data Profiler and Data Quality Tests
On 0.11, the Profiler Workflow handled two things:
- Computing metrics on the data
- Running the configured Data Quality Tests
There has been a major overhaul where not only the UI greatly improved, now showing all historical data, but on the internals as well. Main topics to consider:
- Tests now run with the Test Suite workflow and cannot be configured in the Profiler Workflow
- Any past test data will be cleaned up during the upgrade to 0.12.0, as the internal data storage has been improved
- The Profiler Ingestion Pipelines will be cleaned up during the upgrade to 0.12.0 as well
- You will see broken profiler DAGs in airflow -- you can simply delete these DAGs
dbt Tests Integration
From 0.12, OpenMetadata supports ingesting the tests and the test results from your dbt project.
- Along with
manifest.json
andcatalog.json
files, We now provide an option to ingest therun_results.json
file generated from the dbt run and ingest the test results from it. - The field to enter the
run_results.json
file path is an optional field in the case of local and http dbt configs. The test results will be ingested from this file if it is provided. - For others the file will be picked from their respective sources if it is available.
Profiler Workflow Updates
On top of the information above, the fqnFilterPattern
has been converted into the same patterns we use for ingestion,
databaseFilterPattern
, schemaFilterPattern
and tableFilterPattern
.
In the processor
you can now configure:
profileSample
to specify the % of the table to run the profiling oncolumnConfig.profileQuery
as a query to use to sample the data of the tablecolumnConfig.excludeColumns
andcolumnConfig.includeColumns
to mark which columns to skip.- In
columnConfig.includeColumns
we can also specify a list ofmetrics
to run from our supported metrics.
Profiler Multithreading for Snowflake users
In OpenMetadata 0.12 we have migrated the metrics computation to multithreading. This migration reduced metrics computation time by 70%.
Snowflake users may experience a circular import error. This is a known issue with snowflake-connector-python
. If you experience such error we recommend to either 1) run the ingestion workflow in Python 3.8 environment or 2) if you can't manage your environement set ThreadCount
to 1. You can find more information on the profiler setting here
Airflow Version
The Airflow version from the Ingestion container image has been upgraded to 2.3.3
.
Note that this means that now this is the version that will be used to run the Airflow metadata extraction. This impacted for example when ingesting status from Airflow 2.1.4 (issue[https://github.com/open-metadata/OpenMetadata/issues/7228]).
Moreover, the authentication mechanism that Airflow exposes for the custom plugins has changed. This required us to fully update how we were handling the managed APIs, both on the plugin side and the OpenMetadata API (which is the one sending the authentication).
To continue working with your own Airflow linked to the OpenMetadata UI for ingestion management, we recommend migrating to Airflow 2.3.3.
If you are using your own Airflow to prepare the ingestion from the UI, which is stuck in version 2.1.4, and you cannot upgrade that, but you want to use OM 0.12, reach out to us.
Note
Upgrading airflow from 2.1.4 to 2.3 requires a few steps. If you are using your airflow instance only to run OpenMetadata workflow we recommend you to simply drop the airflow database. You can simply connect to your database engine where your airflow database exist, make a back of your database, drop it and recreate it.
DROP DATABASE airflow_db;
CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%' WITH GRANT OPTION;
If you would like to keep the existing data in your airflow instance or simply cannot drop your airflow database you will need to follow the steps below:
- Make a backup of your database (this will come handing if the migration fails and you need to perform any kind of restore)
- Upgrade your airflow instance from 2.1.x to 2.2.x
- Before upgrading to the 2.3.x version perform the steps describe on the airflow documentation page [here] to make sure character set/collation uses the correct encoding -- this has been changing across MySQL versions.
- Once the above has been performed you should be able to upgrade to airflow 2.3.x
Connector Improvements
-
Oracle: In
0.11.x
and previous releases, we were using the Cx_Oracle driver to extract the metadata from oracledb. The drawback of using this driver was it required Oracle Client libraries to be installed in the host machine in order to run the ingestion. With the0.12
release, we will be using the python-oracledb driver which is a upgraded version ofCx_Oracle
.python-oracledb
withThin
mode does not need Oracle Client libraries. -
Azure SQL & MSSQL: Azure SQL & MSSQL with pyodbc scheme requires ODBC driver to be installed, with
0.12
release we are shipping theODBC Driver 18 for SQL Server
out of the box in our ingestion docker image.
Service Connection Updates
- DynamoDB
- Removed:
database
- Removed:
- Deltalake:
- Removed:
connectionOptions
andsupportsProfiler
- Removed:
- Looker
- Renamed
username
toclientId
andpassword
toclientSecret
to align on the internals required for the metadata extraction. - Removed:
env
- Renamed
- Oracle
- Removed:
databaseSchema
andoracleServiceName
from the root. - Added:
oracleConnectionType
which will either containoracleServiceName
ordatabaseSchema
. This will reduce confusion on setting up the connection.
- Removed:
- Athena
- Removed:
hostPort
- Removed:
- Databricks
- Removed:
username
andpassword
- Removed:
- dbt Config
- Added:
dbtRunResultsFilePath
anddbtRunResultsHttpPath
where path of therun_results.json
file can be passed to get the test results data from dbt.
- Added:
In 0.12.1
- DeltaLake:
- Updated the structure of the connection to better reflect the possible options.
- Removed
metastoreHostPort
andmetastoreFilePath
, which are now embedded insidemetastoreConnection
- Added
metastoreDb
as an option to be passed insidemetastoreConnection
- Removed
- You can find more information about the current structure here
- Updated the structure of the connection to better reflect the possible options.
Ingestion from CLI
We have stopped updating the service connection parameters when running the ingestion workflow from the CLI. The connection parameter will be retrieved from the server if the service already exists. Therefore, the connection parameters of a service will only be possible to be updated from the OpenMetadata UI.
Bots configuration
In the 0.12.1 version, AIRFLOW_AUTH_PROVIDER
and OM_AUTH_AIRFLOW_{AUTH_PROVIDER}
parameters are not needed to configure
how the ingestion is performed from Airflow when our OpenMetadata server is secured. This can be achieved directly from UI
through the Bots configuration in the settings page. For more information, visit the section of each SSO configuration in
the Enable Security
chapter.
Note that the ingestion-bot
bot is created (or updated if it already exists) as a system bot that cannot be deleted, and
the credentials used for this bot, if they did not exist before, will be the ones present in the OpenMetadata configuration.
Otherwise, a JWT Token will be generated to be the default authentication mechanism of the ingestion-bot
.