---
title: Configuring Kafka
hide_title: true
---
# How to configure Kafka?
With the exception of `KAFKA_BOOTSTRAP_SERVER` and `KAFKA_SCHEMAREGISTRY_URL`, Kafka is configured via Spring Boot, specifically with `KafkaProperties`. See Integration Properties prefixed with `spring.kafka`.
Below is an example of how SASL/GSSAPI properties can be configured via environment variables:

```shell
export KAFKA_BOOTSTRAP_SERVER=broker:29092
export KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
export SPRING_KAFKA_PROPERTIES_SASL_KERBEROS_SERVICE_NAME=kafka
export SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL=SASL_PLAINTEXT
export SPRING_KAFKA_PROPERTIES_SASL_JAAS_CONFIG=com.sun.security.auth.module.Krb5LoginModule required principal='principal@REALM' useKeyTab=true storeKey=true keyTab='/keytab';
```
These properties can be specified via `application.properties` or `application.yml` files, as command line switches, or as environment variables. See Spring's Externalized Configuration for how this works.
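For example, the following forms all set the same Kafka client property (the `sasl.mechanism` value shown is an illustrative assumption, not a required setting):

```shell
# As an environment variable; Spring's relaxed binding maps this to
# the spring.kafka.properties.sasl.mechanism property:
export SPRING_KAFKA_PROPERTIES_SASL_MECHANISM=GSSAPI

# Equivalent command line switch:
#   java -jar app.jar --spring.kafka.properties.sasl.mechanism=GSSAPI

# Equivalent application.properties entry:
#   spring.kafka.properties.sasl.mechanism=GSSAPI
```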
See Kafka Connect Security for more ways to connect.
DataHub components that connect to Kafka are currently:

- `mce-consumer-job`
- `mae-consumer-job`
- `gms`
- Various ingestion example apps
## Configuring Topic Names
By default, ingestion relies on the `MetadataChangeEvent_v4`, `MetadataAuditEvent_v4`, and `FailedMetadataChangeEvent` Kafka topics for metadata events.

We've included environment variables to customize the name of each of these topics, if your company or organization has naming rules for your topics.
### datahub-gms

- `METADATA_CHANGE_EVENT_NAME`: The name of the metadata change event topic.
- `METADATA_AUDIT_EVENT_NAME`: The name of the metadata audit event topic.
- `FAILED_METADATA_CHANGE_EVENT_NAME`: The name of the failed metadata change event topic.
### datahub-mce-consumer

- `KAFKA_MCE_TOPIC_NAME`: The name of the metadata change event topic.
- `KAFKA_FMCE_TOPIC_NAME`: The name of the failed metadata change event topic.
### datahub-mae-consumer

- `KAFKA_TOPIC_NAME`: The name of the metadata audit event topic.
Please ensure that these environment variables are set consistently throughout your ecosystem; DataHub runs several applications that communicate with Kafka (see above).
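As a sketch of what consistent settings look like (the `MyOrg.*` topic names are hypothetical examples of an organization's naming convention, not defaults):

```shell
# datahub-gms:
export METADATA_CHANGE_EVENT_NAME=MyOrg.MetadataChangeEvent
export METADATA_AUDIT_EVENT_NAME=MyOrg.MetadataAuditEvent
export FAILED_METADATA_CHANGE_EVENT_NAME=MyOrg.FailedMetadataChangeEvent

# datahub-mce-consumer (must name the same topics as datahub-gms):
export KAFKA_MCE_TOPIC_NAME=MyOrg.MetadataChangeEvent
export KAFKA_FMCE_TOPIC_NAME=MyOrg.FailedMetadataChangeEvent

# datahub-mae-consumer (must match METADATA_AUDIT_EVENT_NAME above):
export KAFKA_TOPIC_NAME=MyOrg.MetadataAuditEvent
```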
## Configuring Consumer Group Id
Kafka consumers in Spring are configured using Kafka listeners. By default, the consumer group id is the same as the listener id.

We've included an environment variable to customize the consumer group id, if your company or organization has specific naming rules.
### datahub-mce-consumer and datahub-mae-consumer

- `KAFKA_CONSUMER_GROUP_ID`: The name of the Kafka consumer's group id.
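For instance (the group id value is a hypothetical example, not a required name):

```shell
# Set per deployed consumer application, e.g. for datahub-mce-consumer:
export KAFKA_CONSUMER_GROUP_ID=my-apps-mce-consumer
```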
## How to apply configuration?
- For quickstart, add these environment variables to the corresponding application's `docker.env`.
- For Helm charts, add these environment variables as `extraEnvs` to the corresponding application's chart. For example:

```yaml
extraEnvs:
  - name: METADATA_CHANGE_EVENT_NAME
    value: "MetadataChangeEvent"
  - name: METADATA_AUDIT_EVENT_NAME
    value: "MetadataAuditEvent"
  - name: FAILED_METADATA_CHANGE_EVENT_NAME
    value: "FailedMetadataChangeEvent"
  - name: KAFKA_CONSUMER_GROUP_ID
    value: "my-apps-mae-consumer"
```
## SSL
We use the Spring Boot framework to start our apps, including setting up Kafka. You can use environment variables to set system properties, including Kafka properties, and from there set your SSL configuration for Kafka.

If Schema Registry is configured to use security (SSL), then you also need to set this config.
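A sketch of what such a setup might look like (the paths and passwords are placeholders, and the `kafkastore.`-prefixed Schema Registry client properties are an assumption based on the `kafkastore.ssl.truststore.password` property mentioned in the note below):

```shell
# Standard Kafka client SSL properties, mapped via Spring's relaxed binding
# to spring.kafka.properties.*:
export SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL=SSL
export SPRING_KAFKA_PROPERTIES_SSL_TRUSTSTORE_LOCATION=/path/to/truststore.jks
export SPRING_KAFKA_PROPERTIES_SSL_TRUSTSTORE_PASSWORD=changeit
export SPRING_KAFKA_PROPERTIES_SSL_KEYSTORE_LOCATION=/path/to/keystore.jks
export SPRING_KAFKA_PROPERTIES_SSL_KEYSTORE_PASSWORD=changeit

# If Schema Registry also uses SSL:
export SPRING_KAFKA_PROPERTIES_KAFKASTORE_SSL_TRUSTSTORE_LOCATION=/path/to/truststore.jks
export SPRING_KAFKA_PROPERTIES_KAFKASTORE_SSL_TRUSTSTORE_PASSWORD=changeit
```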
**Note**

In the logs you might see something like:

```
The configuration 'kafkastore.ssl.truststore.password' was supplied but isn't a known config.
```

These WARN messages can be safely ignored. Each DataHub service is passed the full set of configuration but may not require all of it; the warning simply indicates that the service received a configuration property that is not relevant to it.