mirror of
https://github.com/datahub-project/datahub.git
synced 2025-11-30 20:56:03 +00:00
Start adding java ETL examples, starting with kafka etl. We've had a few requests to start providing Java examples rather than Python due to type safety. I've also started to add these to metadata-ingestion-examples to make it clearer these are *examples*. They can be used directly or as a basis for other things. As we port to Java we'll move examples to contrib.
23 lines
980 B
Markdown
# Python ETL examples

ETL scripts written in Python.

## Prerequisites

1. Before running any Python metadata ingestion job, make sure that the DataHub backend services are all running.
   The easiest way to do that is through the [Docker images](../../docker).
2. You also need to build the `mxe-schemas` module as shown below.
   ```
   ./gradlew :metadata-events:mxe-schemas:build
   ```
   This generates `MetadataChangeEvent.avsc`, which is the schema for the `MetadataChangeEvent` Kafka topic.
3. All the scripts are written for Python 3 and most likely won't work with Python 2.x interpreters.
   You can verify your Python version with the following command.
   ```
   python --version
   ```
   We recommend using [pyenv](https://github.com/pyenv/pyenv) to install and manage your Python environment.
4. Before launching each ETL ingestion pipeline, you can install/verify the library versions as below.
   ```
   pip install --user -r requirements.txt
   ```
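
The interpreter and library checks in steps 3 and 4 can also be scripted. The following is a minimal sketch, not part of the original scripts: the helper names (`python_is_supported`, `parse_requirement_names`, `missing_packages`) are hypothetical, the parser only handles simple `name==version` style lines, and `importlib.metadata` assumes Python 3.8 or newer.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


def parse_requirement_names(lines):
    """Extract package names from requirements.txt-style lines.

    Assumption: only simple specifiers are handled; comments, blank lines,
    and pip options (lines starting with '-') are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        # Cut the line at the first version specifier, extras, or marker.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";"):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names


def missing_packages(names):
    """Return the subset of names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run from the directory of the ETL script you plan to launch, passing the lines of `requirements.txt` to `parse_requirement_names` and the result to `missing_packages` lists any packages that `pip install --user -r requirements.txt` still needs to install. This sketch checks only whether a package is present, not whether the installed version matches the pinned one.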