MXE Processing Jobs
DataHub uses Kafka as the pub-sub message queue in the backend. There are two Kafka topics used by DataHub: MetadataChangeEvent and MetadataAuditEvent.

- MetadataChangeEvent (MCE): emitted by any data platform or crawler when there is a change in the metadata.
- MetadataAuditEvent (MAE): emitted by DataHub GMS to notify that a metadata change has been registered.
To consume these two topics, DataHub uses two Spring jobs:

- MCE Consumer Job: writes to DataHub GMS
- MAE Consumer Job: writes to Elasticsearch & Neo4j
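The topic-to-sink routing described above can be sketched as follows. This is an illustrative model only, not DataHub's actual consumer code: the class name and the string labels for the sinks are assumptions for the example, while the topic names come from the text.

```java
import java.util.Map;

public class MxeRouting {
    // The two Kafka topics used by DataHub, as described above.
    static final String MCE_TOPIC = "MetadataChangeEvent";
    static final String MAE_TOPIC = "MetadataAuditEvent";

    // Each Spring consumer job reads one topic and writes to its sink(s):
    // the MCE Consumer Job writes to DataHub GMS, and the MAE Consumer Job
    // writes to Elasticsearch & Neo4j.
    static final Map<String, String> TOPIC_TO_SINK = Map.of(
        MCE_TOPIC, "DataHub GMS",
        MAE_TOPIC, "Elasticsearch & Neo4j"
    );

    // Look up which sink a message on the given topic ends up in.
    static String route(String topic) {
        String sink = TOPIC_TO_SINK.get(topic);
        if (sink == null) {
            throw new IllegalArgumentException("Unknown topic: " + topic);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(route(MCE_TOPIC)); // DataHub GMS
        System.out.println(route(MAE_TOPIC)); // Elasticsearch & Neo4j
    }
}
```

In the real deployment each job is a Kafka consumer running in its own Spring application; the map here only summarizes which job is responsible for which downstream store.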