Install From Source
Prerequisites
- On macOS:
brew install librdkafka
- On Debian/Ubuntu:
sudo apt install librdkafka-dev python3-dev python3-venv
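You can optionally confirm the library is installed by querying your package manager:
- On macOS:
brew list librdkafka
- On Debian/Ubuntu:
dpkg -s librdkafka-dev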
Set up a Python environment (requires Python 3.6+)
- python3 -m venv venv
- source venv/bin/activate
- pip install -e .
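If the editable install succeeded, the gometa-ingest CLI (used in the Recipes section below) should now be on your PATH. A quick sanity check, assuming it supports the conventional --help flag:
gometa-ingest --help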
Testing
Dependencies
pip install -r test_requirements.txt
Run unit tests
pytest tests/unit
Run integration tests
pytest tests/integration
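During development you can narrow a run with pytest's standard keyword selector; the pattern below is purely illustrative, substitute any substring of a test name:
pytest tests/unit -k "serde"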
Sanity-check code before check-in (currently broken)
flake8 src tests
mypy -p gometa
black --exclude 'gometa/metadata' -S -t py36 src tests
isort --check-only src tests
pytest
Recipes
A recipe is a configuration file that tells our ingestion scripts where to pull data from (source) and where to put it (sink). Here's a simple example that pulls metadata from MSSQL and puts it into DataHub.
source:
  type: mssql
  mssql:
    username: sa
    password: test!Password
    database: DemoData

sink:
  type: "datahub-rest"
  datahub-rest:
    server: 'http://localhost:8080'
Running a recipe is quite easy:
gometa-ingest -c ./examples/recipes/kafka_to_datahub_rest.yml
A number of sample recipes are included in the examples/recipes directory.
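For instance, the file_to_file.yml recipe used in the Docker section below follows the same source/sink shape. A sketch of what it might look like, assuming the file source and sink each take a filename option (the paths here are placeholders):

source:
  type: file
  file:
    filename: ./input_events.json

sink:
  type: file
  file:
    filename: ./output_events.json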
Using Docker
Build the image
- source docker/docker_build.sh
Run an ingestion script (examples/recipes/file_to_file.yml)
A simple wrapper script is provided that mounts a local directory for input recipes and an output directory for output data:
- source docker/docker_run.sh examples/recipes/file_to_file.yml
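For reference, the script is roughly equivalent to a docker run invocation along these lines; the image name and mount points are assumptions for illustration, not the script's actual values:

docker run --rm \
  -v $(pwd)/examples/recipes:/recipes \
  -v $(pwd)/output:/output \
  gometa:latest -c /recipes/file_to_file.yml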