
DataHub Ingestion Tool

Introduction

Tools to ingest [jdbc-database-schema] and [etl-lineage] metadata into DataHub.

The ingestion procedure is split into two parts: a [datahub-producer] and a set of [metadata-generator]s.
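The split works like a Unix pipeline: each generator writes one JSON metadata record per line to stdout, and the producer reads those records from stdin. A minimal sketch with stand-in shell functions (`generator`, `producer`, and the record shape are hypothetical placeholders, not the repo's actual scripts):

```shell
# Stand-in for a metadata generator (e.g. bin/dataset-jdbc-generator.hs):
# emits one JSON metadata record per line on stdout.
generator() {
  echo '{"dataset":"db.users","fields":["id","name"]}'
}

# Stand-in for bin/datahub-producer.hs: consumes records line by line
# from stdin and forwards each one (here, just echoes it back).
producer() {
  while IFS= read -r record; do
    echo "produced: $record"
  done
}

generator | producer
```

Because both sides speak newline-delimited JSON, any generator can be swapped in front of the same producer.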

Roadmap

  • datahub-producer: load JSON-encoded Avro data.
  • add lineage-hive generator.
  • add dataset-jdbc generator (includes mysql, mssql, postgresql, oracle drivers).
  • add dataset-hive generator.
  • add lineage-oracle generator (in progress).
  • enhance lineage-jdbc generator to a lazy iterator mode.
  • enhance the Avro parser to show error information.

Quickstart

  1. Install nix and add the channel:
  sudo install -d -m755 -o $(id -u) -g $(id -g) /nix
  curl https://nixos.org/nix/install | sh
  
  nix-channel --add https://nixos.org/channels/nixos-20.03 nixpkgs
  nix-channel --update nixpkgs
  2. [optional] Download the required dependencies in advance; otherwise they are downloaded automatically at run time:
  nix-shell bin/[datahub-producer].hs.nix
  nix-shell bin/[datahub-producer].py.nix
  ...
  3. Load JSON data into DataHub:
    cat sample/mce.json.dat | bin/datahub-producer.hs config
  4. Parse Hive SQL and load the lineage into DataHub:
    ls sample/hive_*.sql | bin/lineage_hive_generator.hs | bin/datahub-producer.hs config
  5. Load a JDBC schema (mysql, mssql, postgresql, oracle) into DataHub:
    bin/dataset-jdbc-generator.hs | bin/datahub-producer.hs config
  6. Load a Hive schema into DataHub:
    bin/dataset-hive-generator.py | bin/datahub-producer.hs config
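The producer steps above all consume newline-delimited JSON, so a quick validity check before piping can save a failed run. A minimal sketch, assuming python3 is on the PATH (`check_json_lines` is a hypothetical helper, not part of this repo):

```shell
# Verify that every line of a newline-delimited JSON file parses,
# before handing the file to bin/datahub-producer.hs.
# Requires python3 on the PATH; uses its json.tool module as a parser.
check_json_lines() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    printf '%s\n' "$line" | python3 -m json.tool >/dev/null 2>&1 \
      || { echo "bad JSON on line $n" >&2; return 1; }
  done
  echo "$n lines ok"
}
```

For example, `check_json_lines < sample/mce.json.dat && cat sample/mce.json.dat | bin/datahub-producer.hs config` runs the producer only when every record parses.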

Reference