# Kafka Connect
For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).
## Setup
To install this plugin, run `pip install 'acryl-datahub[kafka-connect]'`.
## Capabilities
This plugin extracts the following:

- Each Kafka Connect connector as an individual `DataFlowSnapshotClass` entity
- Individual `DataJobSnapshotClass` entities, named `{connector_name}:{source_dataset}`
- Lineage information from the source database to the Kafka topic

Current limitations:

- Works only for:
  - JDBC and Debezium source connectors
  - BigQuery sink connector

| Capability | Status | Details |
| -----------| ------ | ---- |
| Platform Instance | ✔️ | [link](../../docs/platform-instances.md) |
## Quickstart recipe
Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).
```yml
source:
  type: "kafka-connect"
  config:
    # Coordinates
    connect_uri: "http://localhost:8083"
    cluster_name: "connect-cluster"
    provided_configs:
      - provider: env
        path_key: MYSQL_CONNECTION_URL
        value: jdbc:mysql://test_mysql:3306/librarydb
    # Optional mapping of platform types to instance ids
    platform_instance_map: # optional
      mysql: test_mysql # optional

    # Credentials
    username: admin
    password: password

sink:
  # sink configs
```
## Config details
Note that a `.` is used to denote nested fields in the YAML recipe.
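
As a minimal sketch of that dot notation, `connector_patterns.allow` and `connector_patterns.deny` would be written as nested keys under `connector_patterns`. The connector-name patterns here are hypothetical and only illustrate the shape of the recipe:

```yml
source:
  type: "kafka-connect"
  config:
    connect_uri: "http://localhost:8083"
    # `connector_patterns.allow` and `connector_patterns.deny` expand to nested keys
    connector_patterns:
      allow:
        - "mysql_.*" # hypothetical pattern: connectors to include
      deny:
        - ".*_staging" # hypothetical pattern: connectors to exclude
```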
| Field | Required | Default | Description |
| -------------------------- | -------- | -------------------------- | ------------------------------------------------------- |
| `connect_uri` | ✅ | `"http://localhost:8083/"` | URI to connect to. |
| `username` | | | Kafka Connect username. |
| `password` | | | Kafka Connect password. |
| `cluster_name` | | `"connect-cluster"` | Cluster to ingest from. |
| `provided_configs` | | | List of provided configurations, each with a `provider`, `path_key`, and `value`. |
| `construct_lineage_workunits` | | `True` | Whether to create the input and output Dataset entities. |
| `connector_patterns.deny` | | | List of regex patterns for connectors to exclude from ingestion. |
| `connector_patterns.allow` | | | List of regex patterns for connectors to include in ingestion. |
| `connector_patterns.ignoreCase` | | `True` | Whether to ignore case during pattern matching. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `platform_instance_map` | | | Platform instance mapping to use when constructing URNs, e.g. `platform_instance_map: { "hive": "warehouse" }`. |
## Compatibility
Coming soon!
## Questions
If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!