John Plaisted 6ece2d6469
Start adding java ETL examples, starting with kafka etl. (#1805)

We've had a few requests to start providing Java examples rather than Python due to type safety.

I've also started to add these to metadata-ingestion-examples to make it clearer these are *examples*. They can be used directly or as a basis for other things.

As we port to Java we'll move examples to contrib.
2020-09-11 13:04:21 -07:00

Python ETL examples

ETL scripts written in Python.

Prerequisites

  1. Before running any Python metadata ingestion job, make sure that all DataHub backend services are running. The easiest way to do that is through the Docker images.
  2. Build the mxe-schemas module as shown below.
    ./gradlew :metadata-events:mxe-schemas:build
    
    This generates MetadataChangeEvent.avsc, the schema for the MetadataChangeEvent Kafka topic.
  3. All the scripts are written for Python 3 and most likely won't work with Python 2.x interpreters. You can check your Python version with the following command (on some systems the Python 3 interpreter is named python3).
    python --version
    
    We recommend using pyenv to install and manage your Python environment.
  4. Before launching each ETL ingestion pipeline, install the required library versions (or verify that they are already installed) as shown below.
    pip install --user -r requirements.txt
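
The interpreter and schema prerequisites above can also be checked from a small script before a pipeline starts. The following is only a minimal sketch and not part of the provided examples: the function names python_ok, schema_present, and preflight are hypothetical, and the location of the generated MetadataChangeEvent.avsc is deliberately left to the caller because it depends on your build output directory.

```python
import sys
from pathlib import Path


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


def schema_present(avsc_path):
    """Return True when a file exists at avsc_path (path supplied by the caller)."""
    return Path(avsc_path).is_file()


def preflight(avsc_path=None, version_info=sys.version_info):
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not python_ok(version_info):
        problems.append("Python 3 is required")
    # The schema check is skipped when no path is given; no default is assumed.
    if avsc_path is not None and not schema_present(avsc_path):
        problems.append("Schema not found: " + str(avsc_path))
    return problems


if __name__ == "__main__":
    # Optional first argument: path to the generated MetadataChangeEvent.avsc.
    path = sys.argv[1] if len(sys.argv) > 1 else None
    for problem in preflight(path):
        print(problem)
```

Calling preflight() with no arguments checks only the interpreter version. Passing the path to the schema produced by the mxe-schemas build adds the file check. The sketch does not verify that the backend services are running or that the packages in requirements.txt are installed.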