---
title: Run Kafka Connector using the CLI
slug: /connectors/messaging/kafka/cli
---

# Run Kafka using the metadata CLI

In this section, we provide guides and references to use the Kafka connector.

Configure and schedule Kafka metadata and profiler workflows using the metadata CLI:

- [Requirements](#requirements)
- [Metadata Ingestion](#metadata-ingestion)

## Requirements

{%inlineCallout icon="description" bold="OpenMetadata 0.12 or later" href="/deployment"%}
To deploy OpenMetadata, check the Deployment guides.
{%/inlineCallout%}

To run the ingestion via the UI instead, you'll need to use the OpenMetadata Ingestion Container, which comes shipped with
custom Airflow plugins to handle the workflow deployment.

### Python Requirements

To run the Kafka ingestion, you will need to install:

```bash
pip3 install "openmetadata-ingestion[kafka]"
```
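
If you are starting from a clean machine, a minimal setup might look like the following sketch (the virtual environment is optional and its name is just an example):

```bash
# optional: isolate the connector dependencies in a virtual environment
python3 -m venv venv
source venv/bin/activate

# install the Kafka plugin for the ingestion framework
pip3 install "openmetadata-ingestion[kafka]"

# sanity-check that the metadata CLI is now available
metadata --help
```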

## Metadata Ingestion

All connectors are defined as JSON Schemas.
[Here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/entity/services/connections/messaging/kafkaConnection.json)
you can find the structure to create a connection to Kafka.

In order to create and run a Metadata Ingestion workflow, we will follow
the steps to create a YAML configuration that can connect to the source,
process the Entities if needed, and reach the OpenMetadata server.

The workflow is modeled around the following
[JSON Schema](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/workflow.json).

### 1. Define the YAML Config

This is a sample config for Kafka:

{% codePreview %}

{% codeInfoContainer %}

#### Source Configuration - Service Connection

{% codeInfo srNumber=1 %}

**bootstrapServers**: Kafka bootstrap servers.

Add them as comma-separated values, e.g.: `host1:9092,host2:9092`.

{% /codeInfo %}

{% codeInfo srNumber=2 %}

**schemaRegistryURL**: Confluent Kafka Schema Registry URL. URI format.

{% /codeInfo %}

{% codeInfo srNumber=3 %}

**consumerConfig**: Confluent Kafka Consumer Config.
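
For example, to connect to a SASL-secured cluster you can pass standard Kafka client properties here. This is only a sketch with placeholder values; the exact keys depend on how your brokers are secured:

```yaml
consumerConfig:
  security.protocol: SASL_SSL
  sasl.mechanism: PLAIN
  sasl.username: <username>
  sasl.password: <password>
```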

{% /codeInfo %}

{% codeInfo srNumber=4 %}

**schemaRegistryConfig**: Confluent Kafka Schema Registry Config.

**Note:** To ingest the topic schema, `schemaRegistryURL` must be passed.
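
If your Schema Registry requires basic authentication, the client config might look like this sketch (placeholder credentials):

```yaml
schemaRegistryConfig:
  basic.auth.user.info: <username>:<password>
```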

{% /codeInfo %}

#### Source Configuration - Source Config

{% codeInfo srNumber=5 %}

The sourceConfig is defined [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/messagingServiceMetadataPipeline.json):

**generateSampleData:** Option to turn on/off generating sample data during metadata extraction.

**topicFilterPattern:** Note that the `topicFilterPattern` supports regex as include or exclude.

{% /codeInfo %}

#### Sink Configuration

{% codeInfo srNumber=6 %}

To send the metadata to OpenMetadata, it needs to be specified as `type: metadata-rest`.

{% /codeInfo %}

#### Workflow Configuration

{% codeInfo srNumber=7 %}

The main property here is the `openMetadataServerConfig`, where you can define the host and security provider of your OpenMetadata installation.

For a simple, local installation using our Docker containers, this looks like:

{% /codeInfo %}

{% /codeInfoContainer %}

{% codeBlock fileName="filename.yaml" %}

```yaml
source:
  type: kafka
  serviceName: local_kafka
  serviceConnection:
    config:
      type: Kafka
```
```yaml {% srNumber=1 %}
      bootstrapServers: localhost:9092
```
```yaml {% srNumber=2 %}
      schemaRegistryURL: http://localhost:8081  # Needs to be a URI
```
```yaml {% srNumber=3 %}
      consumerConfig: {}
```
```yaml {% srNumber=4 %}
      schemaRegistryConfig: {}
```
```yaml {% srNumber=5 %}
  sourceConfig:
    config:
      type: MessagingMetadata
      topicFilterPattern:
        excludes:
          - _confluent.*
        # includes:
        #   - topic1
      # generateSampleData: true
```
```yaml {% srNumber=6 %}
sink:
  type: metadata-rest
  config: {}
```
```yaml {% srNumber=7 %}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
```

{% /codeBlock %}

{% /codePreview %}

### Workflow Configs for Security Provider

We support different security providers. You can find their definitions [here](https://github.com/open-metadata/OpenMetadata/tree/main/openmetadata-spec/src/main/resources/json/schema/security/client).

## OpenMetadata JWT Auth

- JWT tokens allow your clients to authenticate against the OpenMetadata server. You can find more details on enabling JWT tokens [here](/deployment/security/enable-jwt-tokens).

```yaml
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "{bot_jwt_token}"
```

- If you run into issues with your JWT configuration, refer to the [JWT troubleshooting](/deployment/security/jwt-troubleshooting) section. For information on configuring the ingestion with other security providers in your bots, see [this doc](/deployment/security/workflow-config-auth).

### 2. Run with the CLI

First, we will need to save the YAML file. Afterward, with all the requirements installed, we can run:

```bash
metadata ingest -c <path-to-yaml>
```

Note that this recipe is the same from connector to connector: by updating the YAML configuration,
you will be able to extract metadata from different sources.