## Integration Details

<!-- Plain-language description of what this integration is meant to do. -->
<!-- Include details about where metadata is extracted from (ie. logs, source API, manifest, etc.) -->

The DataHub Pulsar source plugin extracts `topic` and `schema` metadata from an Apache Pulsar instance and ingests it into DataHub. The plugin uses the [Pulsar admin REST API](https://pulsar.apache.org/admin-rest-api/#) to interact with the Pulsar instance. The following APIs are used to:
- [Get the list of existing tenants](https://pulsar.apache.org/admin-rest-api/#tag/tenants)
- [Get the list of namespaces associated with each tenant](https://pulsar.apache.org/admin-rest-api/#tag/namespaces)
- [Get the list of topics associated with each namespace](https://pulsar.apache.org/admin-rest-api/#tag/persistent-topic)
  - persistent topics
  - persistent partitioned topics
  - non-persistent topics
  - non-persistent partitioned topics
- [Get the latest schema associated with each topic](https://pulsar.apache.org/admin-rest-api/#tag/schemas)

Metadata is extracted per `tenant` and `namespace`. Topics, together with their schema when one is available, are ingested into DataHub as [Dataset](docs/generated/metamodel/entities/dataset.md) entities. Additional values such as `schema description`, `schema_version`, `schema_type`, and `partitioned` are included as `DatasetProperties`.
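
The traversal described above (tenants, then namespaces, then the four kinds of topics, then the latest schema) can be sketched roughly as follows. This is an illustrative walk over the Pulsar admin REST API and not the plugin's actual code: `walk_pulsar`, `make_http_fetch`, and the injected `fetch` callable are hypothetical names, the paths assume the `/admin/v2` layout of the admin API, and authentication as well as de-duplication of individual `-partition-N` topics are left out.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Iterator, Optional, Tuple

# Topic-listing path templates, one per topic kind listed above.
# Assumption: the Pulsar admin REST API v2 path layout.
TOPIC_LIST_PATHS = (
    "/admin/v2/persistent/{ns}",
    "/admin/v2/persistent/{ns}/partitioned",
    "/admin/v2/non-persistent/{ns}",
    "/admin/v2/non-persistent/{ns}/partitioned",
)


def make_http_fetch(base_url: str) -> Callable[[str], Any]:
    """Return a fetch(path) helper that GETs JSON; a 404 is mapped to None."""

    def fetch(path: str) -> Any:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            if err.code == 404:  # e.g. a topic without a schema
                return None
            raise

    return fetch


def walk_pulsar(
    fetch: Callable[[str], Any],
) -> Iterator[Tuple[str, bool, Optional[dict]]]:
    """Yield (topic, partitioned, latest_schema_or_None) for every topic found."""
    for tenant in fetch("/admin/v2/tenants") or []:
        # Namespaces are returned as "tenant/namespace" strings.
        for ns in fetch(f"/admin/v2/namespaces/{tenant}") or []:
            for template in TOPIC_LIST_PATHS:
                partitioned = template.endswith("/partitioned")
                for topic in fetch(template.format(ns=ns)) or []:
                    # "persistent://t/ns/name" -> "t/ns/name" for the schema path.
                    short_name = topic.split("://", 1)[1]
                    schema = fetch(f"/admin/v2/schemas/{short_name}/schema")
                    yield topic, partitioned, schema
```

Passing `fetch` in as a parameter keeps the walk independent of the HTTP client, so it can be exercised against a plain dictionary of canned responses; against a live broker one would call `walk_pulsar(make_http_fetch("http://localhost:8080"))`, where the URL is only a placeholder.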
### Concept Mapping

<!-- This should be a manual mapping of concepts from the source to the DataHub Metadata Model -->
<!-- Authors should provide as much context as possible about how this mapping was generated, including assumptions made, known shortcuts, & any other caveats -->

This ingestion source maps the following Source System Concepts to DataHub Concepts:
<!-- Remove all unnecessary/irrelevant DataHub Concepts -->

| Source Concept | DataHub Concept                                                    | Notes                                                                     |
| -------------- | ------------------------------------------------------------------ | ------------------------------------------------------------------------- |
| `pulsar`       | [Data Platform](docs/generated/metamodel/entities/dataPlatform.md) |                                                                           |
| Pulsar Topic   | [Dataset](docs/generated/metamodel/entities/dataset.md)            | _subType_: `topic`                                                        |
| Pulsar Schema  | [SchemaField](docs/generated/metamodel/entities/schemaField.md)    | Maps to the fields defined within the `Avro` or `JSON` schema definition. |
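
As a rough illustration of the mapping above, the snippet below shows how a Pulsar topic might turn into a DataHub dataset URN on the `pulsar` platform, plus a few custom properties. It is a sketch under assumptions rather than the plugin's implementation: `dataset_urn` and `custom_properties` are hypothetical helpers, using the fully qualified topic name as the dataset name is an assumption, and the `version` and `type` keys are assumed from the shape of the admin API's schema response.

```python
from typing import Dict, Optional


def dataset_urn(topic: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN on the `pulsar` data platform.

    Assumption: the fully qualified topic name is used as the dataset name.
    """
    return f"urn:li:dataset:(urn:li:dataPlatform:pulsar,{topic},{env})"


def custom_properties(schema: Optional[dict], partitioned: bool) -> Dict[str, str]:
    """Collect string-valued properties to attach to the dataset."""
    props = {"partitioned": str(partitioned).lower()}
    if schema:
        # Assumed keys of the schema response returned by the admin API.
        props["schema_version"] = str(schema.get("version", ""))
        props["schema_type"] = str(schema.get("type", ""))
    return props


urn = dataset_urn("persistent://public/default/orders")
props = custom_properties({"version": 3, "type": "AVRO"}, partitioned=False)
```

Topics without a schema still become datasets in this sketch; they simply carry the `partitioned` property alone.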
## Metadata Ingestion Quickstart

For context on getting started with ingestion, check out our [metadata ingestion guide](../../../../metadata-ingestion/README.md).