
How to onboard a new data source?
The metadata-ingestion module provides ETL scripts for onboarding several kinds of metadata sources, including Hive, Kafka, LDAP, MySQL, and generic RDBMS. Each script feeds metadata to the GMS, and a new data source can be onboarded by following the same three stages.
1. Extract
The extract step is tightly coupled to the specific data source, so the data accessor must accurately reflect the metadata held in the underlying data platform.
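
As a rough illustration of an extract step, the sketch below reads table and column metadata from a relational catalog. It uses an in-memory SQLite database from the Python standard library purely as a stand-in for a real source; the function name `extract_table_metadata` and the shape of the returned dictionary are assumptions made for this example, not part of DataHub.

```python
import sqlite3


def extract_table_metadata(conn):
    """Return {table_name: [column dicts]} read from the SQLite catalog.

    Hypothetical helper: a real accessor would query the catalog of the
    platform being onboarded (Hive metastore, information_schema, etc.).
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [
            {"name": col[1], "type": col[2], "nullable": not col[3]}
            for col in columns
        ]
    return metadata


# Stand-in data source with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER NOT NULL, email TEXT)")
extracted = extract_table_metadata(conn)
```

Reading directly from the platform's own catalog, rather than from a hand-maintained description, is what keeps the extracted metadata faithful to the source.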
2. Transform
In the transform stage, wrap the extracted metadata in a valid MetadataChangeEvent (MCE) that conforms to the defined aspects and snapshots.
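
A minimal sketch of the transform step might look like the following. The dictionary layout is a simplified illustration of an event that carries a snapshot with a list of aspects; the field names inside the aspect, the helper `build_mce`, and the sample table are assumptions for this example, and the authoritative structure is whatever the MetadataChangeEvent schema in the repository defines.

```python
def build_mce(platform, table_name, columns, env="PROD"):
    """Wrap extracted column metadata in an MCE-shaped dictionary.

    Hypothetical helper with a simplified payload; check the real
    MetadataChangeEvent schema before relying on any field name here.
    """
    urn = f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table_name},{env})"
    # One illustrative aspect describing the dataset's fields.
    schema_aspect = {
        "schemaName": table_name,
        "fields": [
            {"fieldPath": col["name"], "nativeDataType": col["type"]}
            for col in columns
        ],
    }
    return {
        "auditHeader": None,
        "proposedSnapshot": {"urn": urn, "aspects": [schema_aspect]},
        "proposedDelta": None,
    }


# Sample input in the shape an extract step could produce.
sample_columns = [
    {"name": "id", "type": "INTEGER"},
    {"name": "email", "type": "TEXT"},
]
mce = build_mce("mysql", "shop.users", sample_columns)
```

Keeping the transform a pure function of the extracted metadata makes it easy to test without a live data source or a running Kafka cluster.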
3. Load
The load step uses a Kafka producer to publish the events, enabling pub-sub, event-based ingestion. Schema validation is applied at this stage to check metadata quality.
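
The load step can be sketched as validate-then-publish. To keep the example runnable without a broker, it uses a hypothetical `InMemoryProducer` stub in place of a real Kafka producer, and a simple required-keys check in place of real schema validation (which would normally be done against the event's registered schema). The topic name and the helper names are likewise assumptions for illustration.

```python
import json

# Assumed top-level fields for this sketch, not the authoritative schema.
REQUIRED_KEYS = {"auditHeader", "proposedSnapshot", "proposedDelta"}


class InMemoryProducer:
    """Stub standing in for a Kafka producer; records what would be sent."""

    def __init__(self):
        self.sent = []

    def produce(self, topic, value):
        self.sent.append((topic, value))

    def flush(self):
        pass  # A real producer would block until queued messages are delivered.


def validate(event):
    """Reject events that lack required fields or a snapshot URN."""
    missing = REQUIRED_KEYS - event.keys()
    if missing:
        raise ValueError(f"event is missing fields: {sorted(missing)}")
    snapshot = event["proposedSnapshot"] or {}
    if not snapshot.get("urn"):
        raise ValueError("snapshot has no urn")


def load(events, producer, topic="MetadataChangeEvent"):
    """Validate each event, then publish it to the given topic."""
    for event in events:
        validate(event)
        producer.produce(topic, value=json.dumps(event).encode("utf-8"))
    producer.flush()


producer = InMemoryProducer()
good_event = {
    "auditHeader": None,
    "proposedSnapshot": {"urn": "urn:li:dataset:(urn:li:dataPlatform:mysql,shop.users,PROD)", "aspects": []},
    "proposedDelta": None,
}
load([good_event], producer)
```

Validating before publishing means a malformed event fails in the ETL script, close to the source that produced it, instead of surfacing later in a downstream consumer.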