
Docker Images
The easiest way to bring up and test DataHub is by using the DataHub Docker images, which are continuously deployed to Docker Hub with every commit to the repository.
- linkedin/datahub-gms
- linkedin/datahub-frontend
- linkedin/datahub-mae-consumer
- linkedin/datahub-mce-consumer
The above Docker images are built specifically for DataHub. You can check the subdirectories to see how each image is built from its Dockerfile and how to start each container using Docker Compose. In addition to these, DataHub depends on several third-party Docker images (e.g., Kafka, Elasticsearch, and MySQL) to be able to run.
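As a quick illustration, the sketch below pulls the four DataHub images listed above using the Docker SDK for Python. This is only an assumption-laden example (it presumes `pip install docker` and a running Docker daemon); in practice Docker Compose pulls and starts these images for you.

```python
# Sketch: pull the DataHub images listed above via the Docker SDK for Python.
# Assumes the `docker` Python package is installed and the Docker daemon is running.
import docker

IMAGES = [
    "linkedin/datahub-gms",
    "linkedin/datahub-frontend",
    "linkedin/datahub-mae-consumer",
    "linkedin/datahub-mce-consumer",
]

client = docker.from_env()
for image in IMAGES:
    print(f"Pulling {image}:latest ...")
    client.images.pull(image, tag="latest")  # equivalent to `docker pull <image>:latest`

print("All DataHub images are available locally.")
```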
A locally built ingestion image allows you to produce MetadataChangeEvents on an ad-hoc basis with a Python script.
The pipeline depends on all of the above images being brought up together (e.g., via Docker Compose).
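For illustration only, here is a minimal sketch of emitting a MetadataChangeEvent-style message to Kafka from Python. The broker address, topic name, and payload fields are assumptions made for this example; the actual ingestion script serializes events with Avro against the schema registry, so treat this purely as the shape of the workflow.

```python
# Minimal sketch: send a MetadataChangeEvent-like payload to Kafka with kafka-python.
# Assumptions: `pip install kafka-python`, a broker at localhost:9092, and a topic
# named "MetadataChangeEvent". The real ingestion path uses Avro + the schema registry.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical snapshot payload; the field names here are illustrative only.
event = {
    "proposedSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleTable,PROD)",
        "aspects": [
            {"owners": [{"owner": "urn:li:corpuser:datahub", "type": "DATAOWNER"}]}
        ],
    }
}

producer.send("MetadataChangeEvent", event)
producer.flush()
```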
Prerequisites
You need to install Docker and Docker Compose.
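If you want to verify the prerequisites from a script, a small sketch like the following checks that both binaries are on your PATH and prints their versions:

```python
# Sketch: confirm that docker and docker-compose are installed and on the PATH.
import shutil
import subprocess

for tool in ("docker", "docker-compose"):
    if shutil.which(tool) is None:
        raise SystemExit(f"{tool} not found on PATH -- please install it first")
    version = subprocess.check_output([tool, "--version"], text=True).strip()
    print(version)
```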
Quickstart
If you want to quickly try out and evaluate DataHub by running all the necessary Docker containers, check out the Quickstart Guide.