Python ETL examples
ETL scripts written in Python.
Prerequisites
- Before running any Python metadata ingestion job, make sure that all DataHub backend services are running. The easiest way to do that is through the Docker images.
- You also need to build the mxe-schemas module as below.
    ./gradlew :metadata-events:mxe-schemas:build
  This is needed to generate MetadataChangeEvent.avsc, which is the schema for the MetadataChangeEvent Kafka topic.
- All the scripts are written using Python 3 and most likely won't work with Python 2.x interpreters. You can verify the version of your Python using the following command.
    python --version
  We recommend using pyenv to install and manage your Python environment.
- Before launching each ETL ingestion pipeline, you can install/verify the library versions as below.
    pip install --user -r requirements.txt
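
The interpreter and library checks above can also be done from Python before a pipeline starts. What follows is only a minimal sketch and not part of the DataHub scripts: the helper names `is_python3`, `parse_pins`, and `find_mismatches` are hypothetical, and it assumes that requirements.txt contains simple `name==version` pins, which this README does not confirm. Lines using extras or other specifiers are not handled.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


def parse_pins(lines):
    """Extract {package: version} from 'name==version' requirement lines.

    Assumption: only simple pins matter. Comments, blank lines, and
    requirements without '==' are skipped; environment markers after ';'
    are dropped.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (wanted, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

One possible use is to call `is_python3()` first, then open requirements.txt and pass the file object to `parse_pins`, and report whatever `find_mismatches` returns. An empty result means every simple pin is satisfied.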