title: Upgrade 1.0 to 1.1
slug: /deployment/upgrade/versions/100-to-110
collate: false

Upgrade from 1.0 to 1.1

Upgrading from 1.0 to 1.1 can be done directly on your instances. This page lists a few general details you should take into consideration when running the upgrade.

Deprecation Notice

  • The 1.1 release will be the last one with support for Python 3.7, since it is already EOL. OpenMetadata 1.2 will support Python versions 3.8 to 3.10.
  • In 1.2 we will completely remove bots configured with SSO; only JWT will be available from then on. Please upgrade your bots if you haven't done so. Note that the UI already prevents creating bots with SSO.
  • 1.1 is the last release that will allow ingesting Impala from the Hive connector. In the next release we will only support the Impala scheme from the Impala connector.

Breaking Changes for 1.1 Stable Release

OpenMetadata Helm Chart Values

With 1.1.0 we are moving away from global.* Helm values in the OpenMetadata Helm charts to openmetadata.config.*. This change was introduced because Helm reserves global chart values across all Helm charts, which conflicted with using the OpenMetadata Helm charts alongside other Helm charts in organizations that share common Helm values YAML files.

For example, with 1.0.X application releases, the Helm values would look like this:

global:
  ...
  authorizer:
    className: "org.openmetadata.service.security.DefaultAuthorizer"
    containerRequestFilter: "org.openmetadata.service.security.JwtFilter"
    initialAdmins:
      - "user1"
    principalDomain: "open-metadata.org"
  authentication:
    provider: "google"
    publicKeys:
      - "https://www.googleapis.com/oauth2/v3/certs"
      - "http://openmetadata:8585/api/v1/system/config/jwks"
    authority: "https://accounts.google.com"
    clientId: "{client id}"
    callbackUrl: "http://localhost:8585/callback"
  ...

With OpenMetadata application version 1.1.0 and above, this config needs to be updated to:

openmetadata:
  config:
    authorizer:
      className: "org.openmetadata.service.security.DefaultAuthorizer"
      containerRequestFilter: "org.openmetadata.service.security.JwtFilter"
      initialAdmins:
        - "user1"
      principalDomain: "open-metadata.org"
    authentication:
      provider: "google"
      publicKeys:
        - "https://www.googleapis.com/oauth2/v3/certs"
        - "http://openmetadata:8585/api/v1/system/config/jwks"
      authority: "https://accounts.google.com"
      clientId: "{client id}"
      callbackUrl: "http://localhost:8585/callback"

A quick and easy way to update the config is to use the yq utility to manipulate YAML files.

yq -i -e '{"openmetadata": {"config": .global}}' openmetadata.values.yml

The above command moves the global.* values under openmetadata.config.* in the YAML config. Please note that it is only recommended for users whose custom Helm values file is used exclusively for the OpenMetadata Helm charts.
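
As written, the yq expression appears to emit a document that contains only the former global block, which is presumably why it is recommended only for files dedicated to OpenMetadata. For shared values files, a minimal Python sketch of the same move that keeps other top-level keys is shown below. It works on already-parsed values (a plain dict, such as the output of a YAML loader); the function name move_global_to_config and the replicaCount key are illustrative assumptions, not part of the charts.

```python
from copy import deepcopy


def move_global_to_config(values: dict) -> dict:
    """Return a copy of the Helm values with `global` moved under `openmetadata.config`."""
    migrated = deepcopy(values)
    global_block = migrated.pop("global", None)
    if global_block is None:
        return migrated  # nothing to migrate

    config = migrated.setdefault("openmetadata", {}).setdefault("config", {})
    # Keys that already exist under openmetadata.config take precedence
    # over the old global values.
    for key, value in global_block.items():
        config.setdefault(key, value)
    return migrated


# Illustrative input: `replicaCount` stands in for any unrelated top-level key.
old_values = {
    "global": {
        "authorizer": {"initialAdmins": ["user1"]},
        "authentication": {"provider": "google"},
    },
    "replicaCount": 1,
}
new_values = move_global_to_config(old_values)
```

The input dict is left untouched, so the result can be compared against the original before writing it back to disk.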

For more information, visit the official helm docs for global chart values.

Elasticsearch and OpenSearch

We now support Elasticsearch versions up to 7.16. This means we need to handle the internals slightly differently for Elasticsearch and OpenSearch, so we added the following key to the server configuration:

elasticsearch:
  searchType: ${SEARCH_TYPE:- "elasticsearch"} # or opensearch

If you use Elasticsearch, there is nothing to do. If you use OpenSearch, you will need to set the new searchType parameter (for example through the SEARCH_TYPE environment variable) to opensearch.

Pipeline Service Client Configuration

If reusing an old YAML configuration file, make sure to add the following inside pipelineServiceClientConfiguration:

pipelineServiceClientConfiguration:
  # ...
  # Secrets Manager Loader: specify to the Ingestion Framework how to load the SM credentials from its env
  # Supported: noop, airflow, env
  secretsManagerLoader: ${PIPELINE_SERVICE_CLIENT_SECRETS_MANAGER_LOADER:-"noop"}
  healthCheckInterval: ${PIPELINE_SERVICE_CLIENT_HEALTH_CHECK_INTERVAL:-300}

Secrets Manager YAML config

If you are using the Secrets Manager and running ingestions via the CLI or Airflow, your workflow config previously looked as follows:

workflowConfig:
  openMetadataServerConfig:
    secretsManagerProvider: <Provider>
    secretsManagerCredentials:
      awsAccessKeyId: <aws access key id>
      awsSecretAccessKey: <aws secret access key>
      awsRegion: <aws region>
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>

We are removing the secretsManagerCredentials key entirely, so you'll need to configure the following instead:

workflowConfig:
  openMetadataServerConfig:
    secretsManagerProvider: aws
    secretsManagerLoader: airflow  # if running on Airflow, otherwise `env`
    hostPort: <OpenMetadata host and port>
    authProvider: <OpenMetadata auth provider>

You can find further details on this configuration here.
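
The change above can be sketched as a small transformation on a parsed workflow config. This is a minimal sketch, assuming the config has already been loaded into a dict; the function name migrate_secrets_manager is hypothetical, and the accepted loader values are the ones listed in the pipelineServiceClientConfiguration comment (noop, airflow, env).

```python
from copy import deepcopy

# Loader values taken from the server configuration comment in this page.
SUPPORTED_LOADERS = {"noop", "airflow", "env"}


def migrate_secrets_manager(workflow: dict, loader: str) -> dict:
    """Drop `secretsManagerCredentials` and set `secretsManagerLoader` instead."""
    if loader not in SUPPORTED_LOADERS:
        raise ValueError(f"unsupported secretsManagerLoader: {loader!r}")

    migrated = deepcopy(workflow)
    server = migrated["workflowConfig"]["openMetadataServerConfig"]
    server.pop("secretsManagerCredentials", None)  # removed in 1.1
    server["secretsManagerLoader"] = loader
    return migrated


# Placeholder values mirror the YAML example; they are not real credentials.
old_workflow = {
    "workflowConfig": {
        "openMetadataServerConfig": {
            "secretsManagerProvider": "aws",
            "secretsManagerCredentials": {
                "awsAccessKeyId": "<aws access key id>",
                "awsSecretAccessKey": "<aws secret access key>",
                "awsRegion": "<aws region>",
            },
            "hostPort": "<OpenMetadata host and port>",
            "authProvider": "<OpenMetadata auth provider>",
        }
    }
}
new_workflow = migrate_secrets_manager(old_workflow, loader="env")
```

Pass loader="airflow" when the workflow runs on Airflow, and "env" otherwise, as described in the YAML comment above.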

Service Connection Changes

MySQL and Postgres Connection

Adding IAM role support for authentication requires a slight change to the JSON Schemas of both connections:

From

...
  serviceConnection:
    config: Mysql # or Postgres
    password: Password

To

If we want to use the basic authentication:

...
  serviceConnection:
    config: Mysql # or Postgres
    authType:
      password: Password

Or if we want to use the IAM auth:

...
  serviceConnection:
    config: Mysql # or Postgres
    authType:
      awsConfig:
        awsAccessKeyId: ...
        awsSecretAccessKey: ...
        awsRegion: ...
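
For existing basic-auth configs, the change amounts to nesting the password under authType. A minimal sketch, assuming you pass in the parsed mapping that currently holds the password key (the fragments above elide the surrounding fields); the function name wrap_password_in_auth_type and the hostPort value are hypothetical.

```python
def wrap_password_in_auth_type(connection: dict) -> dict:
    """Move a top-level `password` into `authType.password`.

    Configs that already define `authType` are returned unchanged.
    """
    migrated = dict(connection)
    if "password" in migrated and "authType" not in migrated:
        migrated["authType"] = {"password": migrated.pop("password")}
    return migrated


# `hostPort` is only an illustrative sibling key.
old_connection = {"hostPort": "localhost:3306", "password": "Password"}
new_connection = wrap_password_in_auth_type(old_connection)
```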

Looker Connection

The Looker connection now supports GitHub and BitBucket as repositories for LookML models.

From

...
  serviceConnection:
    config:
      type: Looker
      clientId: ...
      clientSecret: ...
      hostPort: ...
      githubCredentials:
        repositoryOwner: ...
        repositoryName: ...
        token: ...

To

...
  serviceConnection:
    config:
      type: Looker
      clientId: ...
      clientSecret: ...
      hostPort: ...
      gitCredentials:
        type: GitHub # or BitBucket
        repositoryOwner: ...
        repositoryName: ...
        token: ...
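
Since the old key only covered GitHub, existing configs can be converted by renaming the key and adding the type discriminator. A minimal sketch on a parsed config dict; the function name migrate_looker_credentials is hypothetical.

```python
def migrate_looker_credentials(config: dict) -> dict:
    """Rename `githubCredentials` to `gitCredentials` and tag it as GitHub."""
    migrated = dict(config)
    old_credentials = migrated.pop("githubCredentials", None)
    if old_credentials is not None:
        # The pre-1.1 key was GitHub-only, so the type is known.
        migrated["gitCredentials"] = {"type": "GitHub", **old_credentials}
    return migrated


# Placeholder values stand in for the elided fields in the YAML above.
old_config = {
    "type": "Looker",
    "githubCredentials": {
        "repositoryOwner": "<owner>",
        "repositoryName": "<name>",
        "token": "<token>",
    },
}
new_config = migrate_looker_credentials(old_config)
```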

From GCS to GCP

We are renaming gcsConfig to gcpConfig to properly reflect its role as a generic Google Cloud configuration. This impacts BigQuery, Datalake and any other source where you pass the GCP credentials directly to connect.

From

...
  credentials:
    gcsConfig:
...

To

...
  credentials:
    gcpConfig:
...
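
Because the key can sit at different depths depending on the source, one way to apply the rename is a recursive walk over the parsed config. This is a minimal sketch; the function name rename_key and the sample nesting are assumptions for illustration.

```python
def rename_key(node, old="gcsConfig", new="gcpConfig"):
    """Return a copy of `node` with every `old` key renamed to `new`, at any depth."""
    if isinstance(node, dict):
        return {
            (new if key == old else key): rename_key(value, old, new)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_key(item, old, new) for item in node]
    return node  # scalars are returned as-is


# Illustrative nesting; real configs carry more fields around `credentials`.
old_source = {"serviceConnection": {"config": {"credentials": {"gcsConfig": {"projectId": "demo"}}}}}
new_source = rename_key(old_source)
```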

Data Quality

From

source:
  type: TestSuite
  serviceName: MyAwesomeTestSuite
  sourceConfig:
    config:
      type: TestSuite
    
processor:
  type: "orm-test-runner"
  config:
    testSuites:
      - name: test_suite_one
        description: this is a test testSuite to confirm test suite workflow works as expected
        testCases:
          - name: a_column_test
            description: A test case
            testDefinitionName: columnValuesToBeBetween
            entityLink: "<#E::table::local_redshift.dev.dbt_jaffle.customers::columns::number_of_orders>"     
            parameterValues:
              - name: minValue
                value: 2
              - name: maxValue
                value: 20

To

source:
  type: TestSuite
  serviceName: <your_service_name>
  sourceConfig:
    config:
      type: TestSuite
      entityFullyQualifiedName: <entityFqn>

processor:
  type: "orm-test-runner"
  config:
    forceUpdate: <false|true>
    testCases:
      - name: <testCaseName>
        testDefinitionName: columnValueLengthsToBeBetween
        columnName: <columnName>
        parameterValues:
          - name: minLength
            value: 10
          - name: maxLength
            value: 25
      - name: <testCaseName>
        testDefinitionName: tableRowCountToEqual
        parameterValues:
          - name: value
            value: 10
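
In the new layout, test cases are tied to a table through entityFullyQualifiedName and, for column tests, columnName, rather than through an entityLink. The sketch below shows one way to regroup old test cases per table under that reading. It assumes entity links follow the `<#E::table::<fqn>::columns::<column>>` shape seen in the old example, and it drops the description fields because the new example does not show them; the function names are hypothetical.

```python
from collections import defaultdict


def parse_entity_link(link: str):
    """Split an entity link into (table FQN, column name or None)."""
    parts = link.strip().lstrip("<").rstrip(">").split("::")
    # Expected: ["#E", "table", fqn] or ["#E", "table", fqn, "columns", column]
    if len(parts) < 3 or parts[0] != "#E" or parts[1] != "table":
        raise ValueError(f"unsupported entity link: {link!r}")
    column = parts[4] if len(parts) >= 5 and parts[3] == "columns" else None
    return parts[2], column


def group_test_cases_by_table(old_processor_config: dict) -> dict:
    """Map each table FQN to its test cases in the 1.1 shape."""
    grouped = defaultdict(list)
    for suite in old_processor_config.get("testSuites", []):
        for case in suite.get("testCases", []):
            table_fqn, column = parse_entity_link(case["entityLink"])
            new_case = {
                "name": case["name"],
                "testDefinitionName": case["testDefinitionName"],
            }
            if column is not None:
                new_case["columnName"] = column
            new_case["parameterValues"] = case.get("parameterValues", [])
            grouped[table_fqn].append(new_case)
    return dict(grouped)


old_config = {
    "testSuites": [{
        "name": "test_suite_one",
        "testCases": [{
            "name": "a_column_test",
            "testDefinitionName": "columnValuesToBeBetween",
            "entityLink": "<#E::table::local_redshift.dev.dbt_jaffle.customers::columns::number_of_orders>",
            "parameterValues": [{"name": "minValue", "value": 2}, {"name": "maxValue", "value": 20}],
        }],
    }]
}
by_table = group_test_cases_by_table(old_config)
```

Each key of the result would become the entityFullyQualifiedName of one workflow, with the associated list as its testCases.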

Entity Changes

  • Pipeline Entity: the pipelineUrl and taskUrl fields of the pipeline entity have been renamed to sourceUrl.
  • Chart Entity: the chartUrl field of the chart entity has been renamed to sourceUrl.
  • Dashboard Entity: the dashboardUrl field of the dashboard entity has been renamed to sourceUrl.
  • Table Entity: a sourceUrl field has been added to the table entity. It refers to the URL of the data source portal, if one exists. For instance, in the case of BigQuery, the sourceUrl field stores the URL of the table details page in the GCP BigQuery portal.
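
If you have scripts that build or read entity payloads with the old field names, the renames can be applied in one pass. A minimal sketch, assuming payloads are plain dicts and that taskUrl lives on each entry of a tasks list (an assumption about the payload shape); the function name rename_to_source_url is hypothetical.

```python
# Old field names listed above; all of them map to `sourceUrl` in 1.1.
RENAMED_TO_SOURCE_URL = {"pipelineUrl", "taskUrl", "chartUrl", "dashboardUrl"}


def rename_to_source_url(node):
    """Return a copy of `node` with the old URL fields renamed to `sourceUrl`."""
    if isinstance(node, dict):
        return {
            ("sourceUrl" if key in RENAMED_TO_SOURCE_URL else key): rename_to_source_url(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rename_to_source_url(item) for item in node]
    return node


# Illustrative pipeline payload with one task.
old_pipeline = {
    "name": "etl",
    "pipelineUrl": "/dags/etl",
    "tasks": [{"name": "load", "taskUrl": "/dags/etl/load"}],
}
new_pipeline = rename_to_source_url(old_pipeline)
```

A mapping that carries two of the old fields at the same level would collapse into a single sourceUrl, so such payloads need manual handling.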

Other changes

  • Glue now supports custom database names via databaseName.
  • Snowflake supports the clientSessionKeepAlive parameter to keep the session open for long processes.
  • Databricks now supports the useUnityCatalog parameter to extract the metadata from Unity Catalog instead of the Hive metastore.
  • Kafka and Redpanda now have the saslMechanism based on enum values ["PLAIN", "GSSAPI", "SCRAM-SHA-256", "SCRAM-SHA-512", "OAUTHBEARER"].
  • The OpenMetadata Server Docker image now installs the OpenMetadata libraries under the /opt/openmetadata directory.
  • Bumped the Elasticsearch version for Docker and the Kubernetes OpenMetadata Dependencies Helm chart to 7.16.3.

Data Quality Migration

With version 1.1.0 we are migrating existing test cases defined in a test suite to the corresponding table. Because this restructuring removes the existing pipelines from the test suites, you might need to recreate the pipelines for them. More details about the new data quality can be found here.

As a user you will need to redeploy data quality workflows. You can go to Quality > By Tables to view the tables with test cases that need a workflow to be set up.