---
title: Run the Iceberg Connector Externally
slug: /connectors/database/iceberg/yaml
---
{% connectorDetailsHeader
name="Iceberg"
stage="BETA"
platform="OpenMetadata"
availableFeatures=["Metadata", "Owners"]
unavailableFeatures=["Query Usage", "Data Profiler", "Data Quality", "Lineage", "Column-level Lineage", "dbt", "Tags", "Stored Procedures", "Sample Data"]
/ %}
In this section, we provide guides and references to use the Iceberg connector.
Configure and schedule Iceberg metadata workflows externally with a YAML configuration:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)
- [Enable Security](#securing-rest-catalog-connection-with-ssl-in-openmetadata)

{% partial file="/v1.8/connectors/external-ingestion-deployment.md" /%}
## Requirements
The requirements depend on the Catalog and the file system used. In short, the credentials must be able to read both the Catalog and the Iceberg metadata files.
### Glue Catalog
Must have the `glue:GetDatabases` and `glue:GetTables` permissions to read the Catalog.
Must also have the `s3:GetObject` permission for the location of the Iceberg tables.
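For reference, here is a minimal sketch of these Glue and S3 permissions written as an IAM managed policy in a CloudFormation YAML template. The resource name `IcebergGlueReadPolicy` and the bucket `my-iceberg-bucket` are hypothetical placeholders. `Resource: "*"` on the Glue actions is deliberately broad, so narrow it to specific Glue ARNs if your security policy requires it. The policy still has to be attached to the IAM user or role whose credentials the connector uses.

```yaml
# Hypothetical CloudFormation template granting exactly the permissions listed above.
Resources:
  IcebergGlueReadPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Read the Glue Catalog (databases and tables)
          - Effect: Allow
            Action:
              - glue:GetDatabases
              - glue:GetTables
            Resource: "*"
          # Read the Iceberg files stored in S3 (placeholder bucket name)
          - Effect: Allow
            Action: s3:GetObject
            Resource: "arn:aws:s3:::my-iceberg-bucket/*"
```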
### DynamoDB Catalog
Must have `dynamodb:DescribeTable` and `dynamodb:GetItem` permissions on the Iceberg Catalog table.
Must also have the `s3:GetObject` permission for the location of the Iceberg tables.
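The same idea applies to the DynamoDB Catalog. The sketch below expresses the permissions above as a CloudFormation managed policy. It assumes the Catalog table is named `catalog_table` (the name used in the sample configuration further down) and lives in the stack's own account and region. The bucket name is a hypothetical placeholder.

```yaml
# Hypothetical CloudFormation template for a DynamoDB-backed Iceberg Catalog.
Resources:
  IcebergDynamoDbReadPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Read the DynamoDB table that stores the Iceberg Catalog
          - Effect: Allow
            Action:
              - dynamodb:DescribeTable
              - dynamodb:GetItem
            Resource: !Sub "arn:aws:dynamodb:${AWS::Region}:${AWS::AccountId}:table/catalog_table"
          # Read the Iceberg files stored in S3 (placeholder bucket name)
          - Effect: Allow
            Action: s3:GetObject
            Resource: "arn:aws:s3:::my-iceberg-bucket/*"
```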
### Hive / REST Catalog
The requirements depend on where and how the Hive / REST Catalog is set up and where the Iceberg files are stored.
### Python Requirements
{% partial file="/v1.8/connectors/python-requirements.md" /%}
To run the Iceberg ingestion, you will need to install:
```bash
pip3 install "openmetadata-ingestion[iceberg]"
```
## Metadata Ingestion
All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/icebergConnection.json)
you can find the structure to create a connection to Iceberg.
In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration able to connect to the source,
process the Entities if needed, and reach the OpenMetadata server.
The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json).
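As a rough orientation, here is a minimal skeleton showing how the top-level sections of the workflow file fit together. It is an outline, not a working configuration. The service and catalog names are placeholders, and `connection` must be filled with the Glue, DynamoDB, Hive or REST settings described in the samples. The `sink` and `workflowConfig` values are typical placeholders, assuming a local OpenMetadata server and a JWT token for an ingestion bot.

```yaml
source:
  type: iceberg
  serviceName: my_iceberg_service   # placeholder service name
  serviceConnection:
    config:
      type: Iceberg
      catalog:
        name: my_catalog            # placeholder catalog name
        connection: {}              # replace with the catalog-specific settings
  sourceConfig:
    config:
      type: DatabaseMetadata
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api   # assumed local OpenMetadata server
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"         # placeholder bot token
```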
## 1. Define the YAML Config
### This is a sample config for Iceberg using a Glue Catalog:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
* **name**: Enter the catalog name of choice.
{% /codeInfo %}
{% codeInfo srNumber=2 %}
* **awsAccessKeyId**: Enter your secure access key ID for your AWS connection.
* **awsSecretAccessKey**: Enter the Secret Access Key (the secret paired with the access key ID above).
* **awsSessionToken (optional)**: Enter the Session Access Token (used with short-lived credentials).
* **awsRegion**: Specify the AWS region used.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
* **databaseName (optional)**: Enter the database name of choice. If not set, it defaults to 'default'.
{% /codeInfo %}
{% codeInfo srNumber=4 %}
* **ownershipProperty (optional)**: Name of the table property used to look up the table owner. Defaults to 'owner'.
{% /codeInfo %}
{% partial file="/v1.8/connectors/yaml/database/source-config-def.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config-def.md" /%}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml {% isCodeBlock=true %}
source:
  type: iceberg
  serviceName: glue_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
```
```yaml {% srNumber=1 %}
        name: my_glue
```
```yaml
        connection:
```
```yaml {% srNumber=2 %}
          awsConfig:
            awsAccessKeyId: access key id
            awsSecretAccessKey: access secret key
            awsRegion: aws region name
```
```yaml {% srNumber=3 %}
        databaseName: my_database_name
```
```yaml {% srNumber=4 %}
      ownershipProperty: custom_owner_property
```
{% partial file="/v1.8/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
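The Glue sample above omits the optional `awsSessionToken`. When the connector runs with short-lived (temporary) AWS credentials, the token sits next to the other `awsConfig` keys. The following is a sketch with placeholder values, and the `sourceConfig`, `sink` and `workflowConfig` sections stay as in the sample.

```yaml
source:
  type: iceberg
  serviceName: glue_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
        name: my_glue
        connection:
          awsConfig:
            awsAccessKeyId: access key id
            awsSecretAccessKey: access secret key
            # Only needed when using short-lived credentials
            awsSessionToken: session token
            awsRegion: aws region name
```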
### This is a sample config for Iceberg using a DynamoDB Catalog:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
* **name**: Enter the catalog name of choice.
{% /codeInfo %}
{% codeInfo srNumber=2 %}
* **tableName**: Enter the name of the table where the Iceberg Catalog is stored.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
* **awsAccessKeyId**: Enter your secure access key ID for your AWS connection.
* **awsSecretAccessKey**: Enter the Secret Access Key (the secret paired with the access key ID above).
* **awsSessionToken (optional)**: Enter the Session Access Token (used with short-lived credentials).
* **awsRegion**: Specify the AWS region used.
{% /codeInfo %}
{% codeInfo srNumber=4 %}
* **databaseName (optional)**: Enter the database name of choice. If not set, it defaults to 'default'.
{% /codeInfo %}
{% codeInfo srNumber=5 %}
* **ownershipProperty (optional)**: Name of the table property used to look up the table owner. Defaults to 'owner'.
{% /codeInfo %}
{% partial file="/v1.8/connectors/yaml/database/source-config-def.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config-def.md" /%}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml {% isCodeBlock=true %}
source:
  type: iceberg
  serviceName: dynamodb_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
```
```yaml {% srNumber=1 %}
        name: my_dynamo
```
```yaml
        connection:
```
```yaml {% srNumber=2 %}
          tableName: catalog_table
```
```yaml {% srNumber=3 %}
          awsConfig:
            awsAccessKeyId: access key id
            awsSecretAccessKey: access secret key
            awsRegion: aws region name
```
```yaml {% srNumber=4 %}
        databaseName: my_database_name
```
```yaml {% srNumber=5 %}
      ownershipProperty: custom_owner_property
```
{% partial file="/v1.8/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
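Because `databaseName` and `ownershipProperty` are optional, a minimal DynamoDB Catalog connection only needs the catalog name, the Catalog table and the AWS settings. The following sketch uses the same placeholder values as the sample. The omitted fields fall back to the defaults described above, and the `sourceConfig`, `sink` and `workflowConfig` sections stay unchanged.

```yaml
source:
  type: iceberg
  serviceName: dynamodb_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
        name: my_dynamo
        connection:
          tableName: catalog_table
          awsConfig:
            awsAccessKeyId: access key id
            awsSecretAccessKey: access secret key
            awsRegion: aws region name
        # databaseName omitted: the database is named 'default'
      # ownershipProperty omitted: the 'owner' property is used
```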
### This is a sample config for Iceberg using a Hive Catalog:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
* **name**: Enter the catalog name of choice.
{% /codeInfo %}
{% codeInfo srNumber=2 %}
* **uri**: Enter the URI of the Hive Metastore. Example: 'thrift://localhost:9083'.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
* **fileSystem (Optional)**: Enter the configuration for the file system used to store the Iceberg files.
  * **Local**: No configuration needed.
  * **S3 (or S3-compatible)**:
    * **awsAccessKeyId**: Enter your secure access key ID for your AWS connection.
    * **awsSecretAccessKey**: Enter the Secret Access Key (the secret paired with the access key ID above).
    * **awsSessionToken (optional)**: Enter the Session Access Token (used with short-lived credentials).
    * **awsRegion**: Specify the AWS region used.
    * **endPointURL**: Endpoint URL to use with AWS or an S3-compatible service.
  * **Azure**:
    * **clientId**: Client ID of the data storage account.
    * **clientSecret**: Client Secret of the account.
    * **tenantId**: Tenant ID under which the data storage account falls.
    * **accountName**: Name of the data storage account.
{% /codeInfo %}
{% codeInfo srNumber=4 %}
* **databaseName (optional)**: Enter the database name of choice. If not set, it defaults to 'default'.
{% /codeInfo %}
{% codeInfo srNumber=5 %}
* **ownershipProperty (optional)**: Name of the table property used to look up the table owner. Defaults to 'owner'.
{% /codeInfo %}
{% partial file="/v1.8/connectors/yaml/database/source-config-def.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config-def.md" /%}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml {% isCodeBlock=true %}
source:
  type: iceberg
  serviceName: hive_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
```
```yaml {% srNumber=1 %}
        name: my_hive
```
```yaml {% isCodeBlock=true %}
        connection:
```
```yaml {% srNumber=2 %}
          uri: thrift://localhost:9083
```
```yaml {% srNumber=3 %}
          fileSystem:
            # S3 Compatible
            awsAccessKeyId: access key id
            awsSecretAccessKey: access secret key
            awsRegion: aws region name
            # Azure
            # clientId: client_id
            # clientSecret: client_secret
            # tenantId: tenant_id
            # accountName: account_name
```
```yaml {% srNumber=4 %}
        databaseName: my_database_name
```
```yaml {% srNumber=5 %}
      ownershipProperty: custom_owner_property
```
{% partial file="/v1.8/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
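The Hive sample does not show the `endPointURL` field listed for S3-compatible storage. The sketch below is for Iceberg files kept in a self-hosted S3-compatible object store. It follows the same `fileSystem` key layout as the sample above, and the endpoint `http://localhost:9000` and region are hypothetical placeholders. The fragment takes the place of the `connection` block under `catalog`, so indent it accordingly.

```yaml
connection:
  uri: thrift://localhost:9083
  fileSystem:
    # S3-compatible object store; all values are placeholders
    awsAccessKeyId: access key id
    awsSecretAccessKey: access secret key
    awsRegion: us-east-1
    # Hypothetical endpoint of the S3-compatible service
    endPointURL: http://localhost:9000
```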
### This is a sample config for Iceberg using a REST Catalog:
{% codePreview %}
{% codeInfoContainer %}
#### Source Configuration - Service Connection
{% codeInfo srNumber=1 %}
* **name**: Enter the catalog name of choice.
{% /codeInfo %}
{% codeInfo srNumber=2 %}
* **uri**: Enter the URI of the REST Catalog. Example: 'http://localhost:8181'.
{% /codeInfo %}
{% codeInfo srNumber=3 %}
* **credential (Optional)**: OAuth2 credential to use for the authentication flow.
  * **clientId**: OAuth2 Client ID.
  * **clientSecret**: OAuth2 Client Secret.
{% /codeInfo %}
{% codeInfo srNumber=4 %}
* **token (Optional)**: Enter the Bearer token for the 'Authorization' header.
{% /codeInfo %}
{% codeInfo srNumber=5 %}
* **ssl (Optional)**: SSL configuration.
  * **caCertPath**: CA certificate path.
  * **clientCertPath**: Client certificate path.
  * **privateKeyPath**: Private key path.
{% /codeInfo %}
{% codeInfo srNumber=6 %}
* **sigv4 (Optional)**: Used when signing requests with the AWS SigV4 protocol.
  * **signingRegion**: AWS region used when signing a request.
  * **signingName**: Name used to sign the request.
{% /codeInfo %}
{% codeInfo srNumber=7 %}
* **fileSystem (Optional)**: Enter the configuration for the file system used to store the Iceberg files.
  * **Local**: No configuration needed.
  * **S3 (or S3-compatible)**:
    * **awsAccessKeyId**: Enter your secure access key ID for your AWS connection.
    * **awsSecretAccessKey**: Enter the Secret Access Key (the secret paired with the access key ID above).
    * **awsSessionToken (optional)**: Enter the Session Access Token (used with short-lived credentials).
    * **awsRegion**: Specify the AWS region used.
    * **endPointURL**: Endpoint URL to use with AWS or an S3-compatible service.
  * **Azure**:
    * **clientId**: Client ID of the data storage account.
    * **clientSecret**: Client Secret of the account.
    * **tenantId**: Tenant ID under which the data storage account falls.
    * **accountName**: Name of the data storage account.
{% /codeInfo %}
{% codeInfo srNumber=8 %}
* **databaseName (optional)**: Enter the database name of choice. If not set, it defaults to 'default'.
{% /codeInfo %}
{% codeInfo srNumber=9 %}
* **warehouseLocation (optional)**: Custom warehouse location, if needed. Most Catalogs provide a working default warehouse location.
{% /codeInfo %}
{% codeInfo srNumber=10 %}
* **ownershipProperty (optional)**: Name of the table property used to look up the table owner. Defaults to 'owner'.
{% /codeInfo %}
{% partial file="/v1.8/connectors/yaml/database/source-config-def.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink-def.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config-def.md" /%}
{% /codeInfoContainer %}
{% codeBlock fileName="filename.yaml" %}
```yaml {% isCodeBlock=true %}
source:
  type: iceberg
  serviceName: rest_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
```
```yaml {% srNumber=1 %}
        name: my_rest
```
```yaml {% isCodeBlock=true %}
        connection:
```
```yaml {% srNumber=2 %}
          uri: http://localhost:8181
```
```yaml {% srNumber=3 %}
          credential:
            clientId: client_id
            clientSecret: client_secret
```
```yaml {% srNumber=4 %}
          token: my_bearer_token
```
```yaml {% srNumber=5 %}
          ssl:
            caCertPath: ./ca_cert.pem
            clientCertPath: ./client_cert.crt
            privateKeyPath: ./private.key
```
```yaml {% srNumber=6 %}
          sigv4:
            signingRegion: us-east-2
            signingName: signing_name
```
```yaml {% srNumber=7 %}
          fileSystem:
            # S3 compatible
            awsAccessKeyId: access key id
            awsSecretAccessKey: access secret key
            awsRegion: aws region name
            # Azure
            # clientId: client_id
            # clientSecret: client_secret
            # tenantId: tenant_id
            # accountName: account_name
```
```yaml {% srNumber=8 %}
        databaseName: my_database_name
```
```yaml {% srNumber=9 %}
        warehouseLocation: warehouse_location
```
```yaml {% srNumber=10 %}
      ownershipProperty: custom_owner_property
```
{% partial file="/v1.8/connectors/yaml/database/source-config.md" /%}
{% partial file="/v1.8/connectors/yaml/ingestion-sink.md" /%}
{% partial file="/v1.8/connectors/yaml/workflow-config.md" /%}
{% /codeBlock %}
{% /codePreview %}
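Most fields in the REST sample are optional, so a REST Catalog that authenticates with a bearer token may need little more than its URI and the token. The sketch below makes three assumptions: the catalog is reachable at the sample URI, it accepts bearer tokens, and the Iceberg files do not require explicit `fileSystem` credentials. The `sourceConfig`, `sink` and `workflowConfig` sections stay as in the sample.

```yaml
source:
  type: iceberg
  serviceName: rest_test
  serviceConnection:
    config:
      type: Iceberg
      catalog:
        name: my_rest
        connection:
          uri: http://localhost:8181
          # Bearer token sent in the 'Authorization' header
          token: my_bearer_token
```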
{% partial file="/v1.8/connectors/yaml/ingestion-cli.md" /%}
## Securing Rest Catalog Connection with SSL in OpenMetadata
When using SSL to establish secure connections between OpenMetadata and the REST Catalog, you can set `caCertPath` to the CA certificate used for SSL validation. If both the client and the server require mutual authentication, set all three parameters: `caCertPath`, `clientCertPath`, and `privateKeyPath`. In this case, `clientCertPath` points to the client's SSL certificate, `privateKeyPath` to the private key associated with that certificate, and `caCertPath` to the CA certificate used to validate the server's certificate.
```yaml
ssl:
  caCertPath: ./ca_cert.pem
  clientCertPath: ./client_cert.crt
  privateKeyPath: ./private.key
```
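When only the server's certificate needs to be validated (no mutual authentication), a sketch of the `ssl` block sets just the CA certificate. The path is a placeholder.

```yaml
ssl:
  # Validate the REST Catalog's server certificate against this CA;
  # no client certificate or private key is configured
  caCertPath: ./ca_cert.pem
```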