DataHub
Introduction
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our LinkedIn blog post and Strata presentation. You should also visit DataHub Architecture to get a better understanding of how DataHub is implemented, and DataHub Onboarding Guide to understand how to extend DataHub for your own use case. This repository contains the complete source code needed to build DataHub's frontend & backend services.
Quickstart
- Install docker and docker-compose.
- Clone this repo.
- Start Docker, either from the command line or the Desktop app, and make sure it is up and running. Then cd into the cloned datahub repo.
- Run the command below to download and run all Docker containers on your local machine:
cd docker/quickstart && docker-compose pull && docker-compose up --build
- After all Docker containers are running on your machine, switch to a new terminal, cd into the cloned repo, and run the command below to ingest the provided sample data into DataHub:
docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
Note: If you do not run the ingestion command, you may not have enough sample data to explore the application and its features.
- Finally, you can start DataHub by opening http://localhost:9001 in your browser. You can sign in using datahub as both username and password.
Refer to the debugging guide if you run into issues with any of the steps above.
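If you want to script the quickstart, the two helpers below are a minimal sketch of a pre-flight check and a readiness wait. They are not part of this repository: the function names require_cmds and wait_for_url are hypothetical, the script assumes curl is installed, and it assumes the frontend at http://localhost:9001 answers with a non-error HTTP status once it is up.

```shell
#!/bin/sh
# Hypothetical helpers for scripting the quickstart (not shipped with DataHub).

# Succeed only if every named command is on PATH; list missing ones on stderr.
# The ( ... ) body runs in a subshell so the variables stay local.
require_cmds() (
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
)

# Poll a URL until curl gets a non-error response or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() (
  url=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
)
```

For example, run require_cmds docker docker-compose curl before the first step, and wait_for_url http://localhost:9001 in the second terminal before opening the browser. The default of 30 attempts at 5-second intervals is an arbitrary choice; adjust it to how long the containers take to start on your machine.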
Quicklinks
- DataHub Architecture
- DataHub Onboarding Guide
- Docker Images
- Frontend App
- Generalized Metadata Service
- Metadata Consumer Jobs
- Metadata Ingestion
Releases
See the Releases page for more details.
Roadmap
- Kubernetes for container orchestration
- Deploy DataHub to Azure Cloud
