John Plaisted 6ece2d6469
Start adding java ETL examples, starting with kafka etl. (#1805)

We've had a few requests to start providing Java examples rather than Python due to type safety.

I've also started to add these to metadata-ingestion-examples to make it clearer these are *examples*. They can be used directly or as a basis for other things.

As we port to Java we'll move examples to contrib.
2020-09-11 13:04:21 -07:00

# Kafka ETL
A small application that reads the existing Kafka topics from ZooKeeper, retrieves each topic's schema from the schema
registry, and then fires an MCE (MetadataChangeEvent) for each schema.
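
The flow described above can be sketched roughly as follows. This is an illustration only, not the application's actual code: `KafkaEtlFlowSketch`, `run`, the lookup function, and the emitter are hypothetical stand-ins for the real ZooKeeper, schema registry, and MCE producer clients, and the `<topic>-value` subject name assumes the registry's default naming strategy. The sketch also assumes that a topic without a schema is skipped rather than aborting the whole run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class KafkaEtlFlowSketch {

    // For each topic: look up its schema, then emit one event per schema found.
    // Returns the number of events fired.
    static int run(List<String> topics,
                   Function<String, Optional<String>> schemaLookup, // stand-in for the schema registry
                   BiConsumer<String, String> emitMce) {            // stand-in for the MCE producer
        int fired = 0;
        for (String topic : topics) {
            // Assumption: value schemas are registered under "<topic>-value".
            Optional<String> schema = schemaLookup.apply(topic + "-value");
            if (schema.isEmpty()) {
                System.err.println("No schema found for topic " + topic + ", skipping");
                continue;
            }
            emitMce.accept(topic, schema.get());
            fired++;
        }
        return fired;
    }

    public static void main(String[] args) {
        // In-memory stand-ins so the sketch runs without any services.
        Map<String, String> registry = Map.of("orders-value", "{\"type\":\"string\"}");
        List<String> emitted = new ArrayList<>();

        int fired = run(
            List.of("orders", "brand-new-topic"),
            subject -> Optional.ofNullable(registry.get(subject)),
            (topic, schema) -> emitted.add(topic));

        System.out.println("Fired " + fired + " event(s) for " + emitted);
    }
}
```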
## Running the Application
First, ensure that the services this application depends on (schema registry, ZooKeeper, mce-consumer-job, GMS, etc.)
are all running.
This application can be run via Gradle:
```
./gradlew :metadata-ingestion-examples:kafka-etl:bootRun
```
Or by building and running the jar:
```
./gradlew :metadata-ingestion-examples:kafka-etl:build
java -jar metadata-ingestion-examples/kafka-etl/build/libs/kafka-etl.jar
```
### Environment Variables
See the files under `src/main/java/com/linkedin/metadata/examples/kafka/config` for a list of customizable Spring
environment variables.
### Common pitfalls
For events to be fired correctly, schemas must exist in the schema registry. If a topic was newly created, but no schema
has been registered for it yet, this application will fail to retrieve the schema for that topic. Check the output of
the application to see if this happens. If you see a message like
```
io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject not found.; error code: 40401
```
then you most likely need to register a schema for that topic.
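
One way to register a schema is through the schema registry's REST API, by posting it to `/subjects/<subject>/versions`. The sketch below shows how such a request could be built with the JDK's built-in HTTP client (Java 11+). The class and method names are hypothetical. The registry address `http://localhost:8081`, the topic name `my-topic`, and the `<topic>-value` subject naming are assumptions that may not match your setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSchemaSketch {

    // Minimal JSON string escaping, so the Avro schema can be embedded as a string value.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // The registry expects a JSON object whose "schema" field holds the schema as a string.
    static String requestBody(String avroSchema) {
        return "{\"schema\": \"" + jsonEscape(avroSchema) + "\"}";
    }

    static HttpRequest registerRequest(String registryUrl, String topic, String avroSchema) {
        // Assumption: the value schema lives under the subject "<topic>-value".
        String subject = topic + "-value";
        return HttpRequest.newBuilder()
            .uri(URI.create(registryUrl + "/subjects/" + subject + "/versions"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(requestBody(avroSchema)))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed registry address and topic name; replace with your own.
        HttpRequest request = registerRequest(
            "http://localhost:8081", "my-topic", "{\"type\": \"string\"}");

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

After the schema has been registered, re-running the application should no longer report `Subject not found` for that topic, provided the subject name matches the one the application looks up.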