Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
iceberg | Data Platform | |
Table | Dataset | An Iceberg table is registered inside a catalog using a name, where the catalog is responsible for creating, dropping and renaming tables. Catalogs manage a collection of tables that are usually grouped into namespaces. The name of a table is mapped to a Dataset name. If a Platform Instance is configured, it will be used as a prefix: <platform_instance>.my.namespace.table. |
Table property | User (a.k.a. CorpUser) | The value of a table property can be used as the name of a CorpUser owner. The name of this table property is configured with the source option user_ownership_property. |
Table property | CorpGroup | The value of a table property can be used as the name of a CorpGroup owner. The name of this table property is configured with the source option group_ownership_property. |
Table parent folders (excluding warehouse catalog location) | Container | Available in a future release |
Table schema | SchemaField | Maps to the fields defined within the Iceberg table schema definition. |
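The naming rule in the Table row can be illustrated with a minimal sketch. The helper dataset_name below is hypothetical and is not part of DataHub; it only assumes what the table states, namely that namespace levels and the table name are joined with dots and that a configured Platform Instance is added as a prefix.

```python
from typing import Optional, Sequence


def dataset_name(
    namespace: Sequence[str],
    table: str,
    platform_instance: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a dotted Dataset name.

    Joins the namespace levels and the table name with dots and, when a
    platform instance is given, puts it in front as a prefix.
    """
    parts = [*namespace, table]
    if platform_instance:
        parts.insert(0, platform_instance)
    return ".".join(parts)


# Without a platform instance
print(dataset_name(["my", "namespace"], "table"))
# With a platform instance (the name "my_instance" is illustrative)
print(dataset_name(["my", "namespace"], "table", "my_instance"))
```

The first call prints my.namespace.table and the second prints my_instance.my.namespace.table.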
Troubleshooting
Exceptions while increasing processing_threads
Each processing thread opens several files and sockets to download manifest files from blob storage. If exceptions start to appear after you increase the processing_threads configuration parameter, try raising the limit on open files (e.g. using ulimit on Linux).
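As an alternative to running ulimit in the shell, the open-file limit can be inspected and raised from Python with the standard resource module (Unix only). This is a sketch under stated assumptions: the helper raise_open_file_limit and the target value 4096 are illustrative, not DataHub settings, and the change applies only to the current process and its children.

```python
import resource  # Unix-only standard library module


def raise_open_file_limit(target: int) -> int:
    """Raise the soft open-file limit toward `target` (illustrative helper).

    The soft limit is never lowered and never set above the hard limit.
    Returns the soft limit in effect after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to do.
        return soft
    # Cap the request at the hard limit unless the hard limit is unlimited.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft <= soft:
        # The current limit is already at least as high as requested.
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    # 4096 is an arbitrary example value, not a recommendation.
    print("soft open-file limit:", raise_open_file_limit(4096))
```

Raising the hard limit itself usually requires elevated privileges, so the sketch only adjusts the soft limit within the existing hard limit.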
DataHub Iceberg REST Catalog
DataHub also implements the Iceberg REST Catalog. See the DataHub Iceberg REST Catalog documentation for more details.