Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
| Source Concept | DataHub Concept | Notes |
|---|---|---|
| iceberg | Data Platform | |
| Table | Dataset | An Iceberg table is registered inside a catalog using a name, where the catalog is responsible for creating, dropping and renaming tables. Catalogs manage a collection of tables that are usually grouped into namespaces. The name of a table is mapped to a Dataset name. If a Platform Instance is configured, it will be used as a prefix: <platform_instance>.my.namespace.table. |
| Table property | User (a.k.a. CorpUser) | The value of a table property can be used as the name of a CorpUser owner. The name of this table property is configured with the source option user_ownership_property. |
| Table property | CorpGroup | The value of a table property can be used as the name of a CorpGroup owner. This table property name can be configured with the source option group_ownership_property. |
| Table parent folders (excluding warehouse catalog location) | Container | Available in a future release |
| Table schema | SchemaField | Maps to the fields defined within the Iceberg table schema definition. |
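
The naming and ownership rows above can be illustrated with a small Python sketch. This is not the source's actual implementation: the helper names `dataset_name` and `owners_from_properties` are hypothetical, and the table property keys `owner` and `team` are assumed values of user_ownership_property and group_ownership_property.

```python
from typing import Dict, List, Optional, Tuple


def dataset_name(
    namespace: Tuple[str, ...],
    table: str,
    platform_instance: Optional[str] = None,
) -> str:
    """Join namespace and table name, prefixed by the platform instance if set."""
    parts = [*namespace, table]
    if platform_instance:
        parts.insert(0, platform_instance)
    return ".".join(parts)


def owners_from_properties(
    properties: Dict[str, str],
    user_ownership_property: str,
    group_ownership_property: str,
) -> List[Tuple[str, str]]:
    """Return (owner type, owner name) pairs found in the table properties."""
    owners: List[Tuple[str, str]] = []
    user = properties.get(user_ownership_property)
    if user:
        owners.append(("CorpUser", user))
    group = properties.get(group_ownership_property)
    if group:
        owners.append(("CorpGroup", group))
    return owners


# Hypothetical table properties, for illustration only.
props = {"owner": "alice", "team": "data-eng"}
print(dataset_name(("my", "namespace"), "table", platform_instance="prod"))
print(owners_from_properties(props, "owner", "team"))
```

A table whose properties do not contain the configured key simply yields no owner of that type in this sketch.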
Troubleshooting
Exceptions while increasing processing_threads
Each processing thread opens several files and sockets to download manifest files from blob storage. If exceptions
start to appear after you increase the processing_threads configuration parameter, try raising the limit on open
files (e.g. using ulimit on Linux).
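
As a rough sketch, the open-file limit can also be inspected and raised from Python with the standard `resource` module (Unix only). The helper name `raise_open_file_limit` and the value 4096 are assumptions for illustration. The change applies only to the current process and its children, so it helps only if it runs in the same process as the ingestion; otherwise, set the limit in the shell that launches it.

```python
import resource


def raise_open_file_limit(wanted: int) -> int:
    """Try to raise the soft open-file limit to `wanted`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot raise the soft limit above the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some systems enforce a lower kernel-level cap; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current soft limit: {soft}, hard limit: {hard}")
print(f"soft limit after request: {raise_open_file_limit(4096)}")
```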