Integration Details
The DataHub Pulsar source plugin extracts `topic` and `schema` metadata from an Apache Pulsar instance and ingests it into DataHub. The plugin talks to the Pulsar instance through the Pulsar admin REST API. It uses that API to:
- Get the list of existing tenants
- Get the list of namespaces associated with each tenant
- Get the list of topics associated with each namespace
  - persistent topics
  - persistent partitioned topics
  - non-persistent topics
  - non-persistent partitioned topics
- Get the latest schema associated with each topic
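The traversal above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: the helper names (`http_get_json`, `list_topics`, `latest_schema_url`) are hypothetical, the endpoint paths follow the Pulsar admin REST API v2 layout as I understand it, and authentication, TLS, and error handling are left out.

```python
import json
import urllib.request
from typing import Any, Callable, List

# The four topic listings the text enumerates, relative to /admin/v2.
# "{ns}" is filled with a "tenant/namespace" string.
TOPIC_LIST_PATHS = (
    "persistent/{ns}",
    "persistent/{ns}/partitioned",
    "non-persistent/{ns}",
    "non-persistent/{ns}/partitioned",
)


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode the JSON body (no auth; illustrative only)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def list_topics(base_url: str,
                get_json: Callable[[str], Any] = http_get_json) -> List[str]:
    """Walk tenants -> namespaces -> topics and return all topic names."""
    admin = base_url.rstrip("/") + "/admin/v2"
    topics: List[str] = []
    for tenant in get_json(f"{admin}/tenants"):
        # Assumption: namespaces come back as "tenant/namespace" strings.
        for ns in get_json(f"{admin}/namespaces/{tenant}"):
            for path in TOPIC_LIST_PATHS:
                topics.extend(get_json(f"{admin}/{path.format(ns=ns)}"))
    return topics


def latest_schema_url(base_url: str, topic: str) -> str:
    """Build the URL for a topic's latest schema.

    `topic` is a full name such as "persistent://tenant/namespace/name".
    """
    _, rest = topic.split("://", 1)
    return f"{base_url.rstrip('/')}/admin/v2/schemas/{rest}/schema"
```

Passing `get_json` as a parameter keeps the traversal testable without a running broker. A topic without a registered schema would typically produce an HTTP error on the schema URL, which a real caller would need to catch.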
The data is extracted on a `tenant` and `namespace` basis. Topics, with their corresponding schema if one is available, are ingested into DataHub as Datasets. Some additional values, such as the schema description, `schema_version`, `schema_type` and `partitioned`, are included as `DatasetProperties`.
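As a rough illustration of the resulting shape, the sketch below turns one topic and its optional schema info into a plain dictionary. It deliberately does not use the DataHub SDK: the function name `topic_to_dataset` and the dictionary layout are hypothetical, using the full topic name as the dataset name is an assumption, and the schema description is left out because the text does not say which field it is read from. Only the `urn:li:dataset:(...)` URN pattern and the property names come from DataHub conventions and the text above.

```python
from typing import Any, Dict, Optional


def topic_to_dataset(topic: str,
                     partitioned: bool,
                     schema: Optional[Dict[str, Any]] = None,
                     env: str = "PROD") -> Dict[str, Any]:
    """Describe a Pulsar topic as a DataHub-style dataset record (sketch)."""
    # Standard DataHub dataset URN pattern; the dataset name used here
    # (the full topic name) is an assumption for illustration.
    urn = f"urn:li:dataset:(urn:li:dataPlatform:pulsar,{topic},{env})"

    # Property names taken from the text; values are stored as strings.
    props: Dict[str, str] = {"partitioned": str(partitioned).lower()}
    if schema is not None:
        # Assumed keys of the schema info returned by the admin API.
        props["schema_version"] = str(schema.get("version", ""))
        props["schema_type"] = str(schema.get("type", ""))

    return {"urn": urn, "subType": "topic", "customProperties": props}
```

For example, `topic_to_dataset("persistent://public/default/orders", False, {"version": 2, "type": "AVRO"})` yields a record whose custom properties carry `schema_version` `"2"` and `schema_type` `"AVRO"`; a topic without a schema gets only the `partitioned` property.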
Concept Mapping
This ingestion source maps the following Source System Concepts to DataHub Concepts:
Source Concept | DataHub Concept | Notes |
---|---|---|
`pulsar` | Data Platform | |
Pulsar Topic | Dataset | subType: topic |
Pulsar Schema | SchemaField | Maps to the fields defined within the Avro or JSON schema definition. |
Metadata Ingestion Quickstart
For context on getting started with ingestion, check out our metadata ingestion guide.