datahub/metadata-ingestion

Install From Source

Prerequisites

  • On macOS: brew install librdkafka
  • On Debian/Ubuntu: sudo apt install librdkafka-dev

Set up a Python environment (requires Python 3.7+)

  • python3 -m venv venv
  • source venv/bin/activate
  • pip install -e .

Run tests

  • pip install -r test_requirements.txt

Run Unit tests

  • pytest tests/unit

Run Integration tests

  • pytest tests/integration

Sanity-check code before check-in (currently broken)

  • flake8 src test && mypy -p gometa && black --check -l 120 src test && isort --check-only src test && pytest

Recipes

A recipe is a configuration file that tells our ingestion scripts where to pull data from (source) and where to put it (sink). Here's a simple example that pulls metadata from MSSQL and puts it into DataHub:

source:
  type: mssql
  mssql:
    username: sa
    password: test!Password
    database: DemoData

sink:
  type: "datahub-rest"
  datahub-rest:
    server: 'http://localhost:8080'

Running a recipe is quite easy.

gometa-ingest -c ./recipes/kafka_to_datahub_rest.yaml

A number of recipes are included in the recipes directory.
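For reference, a Kafka-to-DataHub recipe like the one invoked above might look roughly as follows. This is only a sketch: the kafka source type and the connection/bootstrap field names are assumptions here, so consult recipes/kafka_to_datahub_rest.yaml for the exact schema.

source:
  type: kafka
  kafka:
    # Assumed layout: broker address of the Kafka cluster whose topics should be ingested
    connection:
      bootstrap: "localhost:9092"

sink:
  type: "datahub-rest"
  datahub-rest:
    server: 'http://localhost:8080'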

Using Docker

Build the image

  • source docker/docker_build.sh

Run an ingestion script (recipes/file_to_file.yml)

A helper script is provided that mounts a local directory for input recipes and an output directory for output data; a sketch of what the referenced recipe might contain is shown after the command below.

  • source docker/docker_run.sh recipes/file_to_file.yml
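As a rough sketch, a file-to-file recipe such as the one referenced above could look like the following; the file source/sink type, the filename field, and the paths are illustrative assumptions here, so check recipes/file_to_file.yml for the actual contents.

source:
  type: file
  file:
    # Assumed field name: path to a local file of serialized metadata events to read
    filename: ./recipes/input/example_events.json

sink:
  type: file
  file:
    # Assumed field name: path where the ingested events are written out
    filename: ./output/example_output.json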