# Kafka Connect

For context on getting started with ingestion, check out our [metadata ingestion guide](../README.md).

## Setup

To install this plugin, run `pip install 'acryl-datahub[kafka-connect]'`.

## Capabilities

This plugin extracts the following:

- Each Kafka Connect connector as an individual `DataFlowSnapshotClass` entity
- Individual `DataJobSnapshotClass` entities, named using the `{connector_name}:{source_dataset}` pattern
- Lineage information from the source database to the Kafka topic

Current limitations:

- Works only for:
  - JDBC and Debezium source connectors
  - BigQuery sink connector

| Capability        | Status | Details                                  |
| ----------------- | ------ | ---------------------------------------- |
| Platform Instance | ✔️     | [link](../../docs/platform-instances.md) |

## Quickstart recipe

Check out the following recipe to get started with ingestion! See [below](#config-details) for full configuration options.

For general pointers on writing and running a recipe, see our [main recipe guide](../README.md#recipes).

```yml
source:
  type: "kafka-connect"
  config:
    # Coordinates
    connect_uri: "http://localhost:8083"
    cluster_name: "connect-cluster"
    # Optional: values substituted for ${provider:path_key} placeholders in connector configs
    provided_configs:
      - provider: env
        path_key: MYSQL_CONNECTION_URL
        value: jdbc:mysql://test_mysql:3306/librarydb
    # Optional mapping of platform types to instance ids
    platform_instance_map: # optional
      mysql: test_mysql    # optional

    # Credentials
    username: admin
    password: password

sink:
  # sink configs
```
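
The `provided_configs` block in the recipe above is only needed when connector configs reference externalized values through a Kafka Connect config provider. In that case the plugin may see the raw placeholder rather than the resolved value. The sketch below rests on two hypothetical assumptions. First, the Connect worker has a config provider registered under the name `env`. Second, a JDBC source connector sets `connection.url` to `${env:MYSQL_CONNECTION_URL}`. The recipe supplies the real URL so that the plugin can derive lineage from it.

```yml
# Hypothetical JDBC source connector config as registered in Kafka Connect
# (shown as a comment; connectors are usually submitted as JSON):
#   "connection.url": "${env:MYSQL_CONNECTION_URL}"
source:
  type: "kafka-connect"
  config:
    connect_uri: "http://localhost:8083"
    provided_configs:
      # Each entry replaces "${<provider>:<path_key>}" in connector config
      # values with <value>, here "${env:MYSQL_CONNECTION_URL}".
      - provider: env
        path_key: MYSQL_CONNECTION_URL
        value: jdbc:mysql://test_mysql:3306/librarydb
```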
					
						
## Config details

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field                           | Required | Default                    | Description                                                                                                                    |
| ------------------------------- | -------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `connect_uri`                   | ✅       | `"http://localhost:8083/"` | URI of the Kafka Connect REST API to connect to.                                                                               |
| `username`                      |          |                            | Kafka Connect username.                                                                                                        |
| `password`                      |          |                            | Kafka Connect password.                                                                                                        |
| `cluster_name`                  |          | `"connect-cluster"`        | Cluster to ingest from.                                                                                                        |
| `provided_configs`              |          |                            | List of `provider`, `path_key` and `value` entries; each `${provider:path_key}` placeholder in connector configs is replaced with `value`. |
| `construct_lineage_workunits`   |          | `True`                     | Whether to create the input and output Dataset entities.                                                                       |
| `connector_patterns.allow`      |          |                            | List of regex patterns for connectors to include in ingestion.                                                                 |
| `connector_patterns.deny`       |          |                            | List of regex patterns for connectors to exclude from ingestion.                                                               |
| `connector_patterns.ignoreCase` |          | `True`                     | Whether to ignore case when matching connector patterns.                                                                       |
| `env`                           |          | `"PROD"`                   | Environment to use in namespace when constructing URNs.                                                                        |
| `platform_instance_map`         |          |                            | Platform instance mapping to use when constructing URNs, e.g. `platform_instance_map: { "hive": "warehouse" }`.                |

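
As an illustration of the `connector_patterns` fields, the fragment below narrows ingestion to a subset of connectors. It is a sketch that assumes a hypothetical naming convention: production MySQL connectors are named `mysql-...` and test copies end in `-test`. As with DataHub's other allow/deny filters, a `deny` match takes precedence over an `allow` match.

```yml
source:
  type: "kafka-connect"
  config:
    connect_uri: "http://localhost:8083"
    connector_patterns:
      # Include only connectors whose names start with "mysql-" (hypothetical convention)
      allow:
        - "mysql-.*"
      # Exclude test connectors, even if they also match an allow pattern
      deny:
        - ".*-test$"
      # Already the default; shown explicitly for clarity
      ignoreCase: true
```

With this configuration, a connector named `mysql-orders` would be ingested. Connectors named `mysql-orders-test` and `postgres-orders` would be skipped.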
					
						
## Compatibility

Coming soon!

## Questions

If you've got any questions on configuring this source, feel free to ping us on [our Slack](https://slack.datahubproject.io/)!