mirror of
https://github.com/datahub-project/datahub.git
synced 2025-12-27 09:58:14 +00:00
Getting Started
From source:
Pre-Requisites
- Python 3.6+
- On MacOS:
brew install librdkafka - On Debian/Ubuntu:
sudo apt install librdkafka-dev python3-dev python3-venv
Set up python environment
python3 -m venv venv
source venv/bin/activate
pip install -e .
Usage - from source
gometa-ingest -c examples/recipes/file_to_file.yml
Using Docker:
Build the image
source docker/docker_build.sh
Usage - with Docker
We have a simple script provided that supports mounting a local directory for input recipes and an output directory for output data:
source docker/docker_run.sh examples/recipes/file_to_file.yml
Recipes
A recipe is a configuration that tells our ingestion scripts where to pull data from (source) and where to put it (sink). Here's a simple example that pulls metadata from MSSQL and puts it into datahub.
# A sample recipe that pulls metadata from MSSQL and puts it into DataHub
# using the Rest API.
source:
type: mssql
mssql:
username: sa
password: test!Password
database: DemoData
sink:
type: "datahub-rest"
datahub-rest:
server: 'http://localhost:8080'
Running a recipe is quite easy.
gometa-ingest -c ./examples/recipes/mssql_to_datahub.yml
A number of recipes are included in the examples/recipes directory.
Sources
TODO
Sinks
TODO
Contributing
Contributions welcome!
Testing
# Follow standard install procedure - see above.
# Install requirements.
pip install -r test_requirements.txt
# Run unit tests.
pytest tests/unit
# Run integration tests.
# Note: the integration tests require docker.
pytest tests/integration
Sanity check code before checkin
flake8 src tests
mypy -p gometa
black --exclude 'gometa/metadata' -S -t py36 src tests
isort src tests
pytest