mirror of
https://github.com/datahub-project/datahub.git
synced 2025-09-02 22:03:11 +00:00
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
iceberg | Data Platform | |
Table | Dataset | An Iceberg table is registered inside a catalog using a name, where the catalog is responsible for creating, dropping and renaming tables. Catalogs manage a collection of tables that are usually grouped into namespaces. The name of a table is mapped to a Dataset name. If a Platform Instance is configured, it will be used as a prefix: <platform_instance>.my.namespace.table. |
Table property | User (a.k.a. CorpUser) | The value of a table property can be used as the name of a CorpUser owner. This table property name can be configured with the source option user_ownership_property. |
Table property | CorpGroup | The value of a table property can be used as the name of a CorpGroup owner. This table property name can be configured with the source option group_ownership_property. |
Table parent folders (excluding warehouse catalog location) | Container | Available in a future release |
Table schema | SchemaField | Maps to the fields defined within the Iceberg table schema definition. |
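
The Dataset naming rule in the table above can be illustrated with a minimal sketch. The helper `dataset_name` and its parameters are hypothetical names chosen for this example; this is not the ingestion source's actual code, only a restatement of the prefixing rule under the assumption that namespace levels and the table name are joined with dots.

```python
from typing import Optional, Sequence


def dataset_name(
    namespace: Sequence[str],
    table: str,
    platform_instance: Optional[str] = None,
) -> str:
    """Hypothetical helper: build a dotted Dataset name from an Iceberg table identifier."""
    # Namespace levels come first, followed by the table name.
    parts = [*namespace, table]
    # When a Platform Instance is configured, it is used as a prefix.
    if platform_instance:
        parts.insert(0, platform_instance)
    return ".".join(parts)


print(dataset_name(["my", "namespace"], "table"))
print(dataset_name(["my", "namespace"], "table", platform_instance="prod"))
```

Here `prod` stands in for whatever Platform Instance value is configured; without one, the name is just the namespace and table.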
Troubleshooting
Exceptions while increasing processing_threads
Each processing thread opens several files and sockets to download manifest files from blob storage. If exceptions appear after you increase the processing_threads configuration parameter, try raising the limit on open files (e.g. using ulimit on Linux).
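
As an alternative to running ulimit in the shell, the open-file limit can be inspected and raised from Python with the standard-library `resource` module (Unix only). The sketch below is not part of DataHub; the function name `raise_open_file_limit` and the target value are assumptions for illustration, and the soft limit can only be raised up to the hard limit without elevated privileges.

```python
import resource


def raise_open_file_limit(target: int) -> tuple:
    """Hypothetical helper: raise the soft open-file limit toward `target`.

    Returns the (soft, hard) limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited:
        # Nothing to raise: the soft limit is already unlimited.
        return soft, hard
    # The soft limit may not exceed the hard limit.
    new_soft = target if hard == unlimited else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems enforce additional caps; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Example (assumed value): call before starting ingestion in the same process.
# raise_open_file_limit(4096)
```

The change applies only to the current process and its children, so it would need to run in the same process that performs the ingestion.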