2.5 KiB

Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

Source Concept DataHub Concept Notes
iceberg Data Platform
Table Dataset An Iceberg table is registered inside a catalog using a name, where the catalog is responsible for creating, dropping and renaming tables. Catalogs manage a collection of tables that are usually grouped into namespaces. The name of a table is mapped to a Dataset name. If a Platform Instance is configured, it will be used as a prefix: <platform_instance>.my.namespace.table.
Table property User (a.k.a CorpUser) The value of a table property can be used as the name of a CorpUser owner. This table property name can be configured with the source option user_ownership_property.
Table property CorpGroup The value of a table property can be used as the name of a CorpGroup owner. This table property name can be configured with the source option group_ownership_property.
Table parent folders (excluding warehouse catalog location) Container Available in a future release
Table schema SchemaField Maps to the fields defined within the Iceberg table schema definition.

Troubleshooting

Exceptions while increasing processing_threads

Each processing thread will open several files/sockets to download manifest files from blob storage. If you experience exceptions appearing when increasing processing_threads configuration parameter, try to increase limit of open files (i.e. using ulimit in Linux).

DataHub Iceberg REST Catalog

DataHub also implements the Iceberg REST Catalog. See here for more details.