Mirror of https://github.com/datahub-project/datahub.git (synced 2025-10-25 07:54:37 +00:00)
DataHub
Introduction
DataHub is LinkedIn's generalized metadata search and discovery tool. To learn more about DataHub, check out our LinkedIn blog post and Strata presentation. You should also read DataHub Architecture to understand how DataHub is implemented, and the DataHub Onboarding Guide to learn how to extend DataHub for your own use case. This repository contains the complete source code for building DataHub's frontend and backend services.
Quickstart
- Install docker and docker-compose.
- Clone this repo and make sure you are on the datahub branch.
- Run the command below to download and run all Docker containers locally:
cd docker/quickstart && docker-compose pull && docker-compose up --build
- Once all Docker containers are running on your machine, run the command below to ingest the provided sample data into DataHub:
./gradlew :metadata-events:mxe-schemas:build && cd metadata-ingestion/mce-cli && pip install --user -r requirements.txt && python mce_cli.py produce -d bootstrap_mce.dat
Note: Make sure you are using Java 8; the build has a strict dependency on Java 8.
- Finally, open http://localhost:9001 in your browser to start using DataHub. You can sign in with datahub as both the username and the password.
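The Java 8 requirement in the note above is easy to get wrong, because `java -version` reports Java 8 as `1.8.x` while Java 9 and later report versions such as `11.0.2` or `17`. Below is a minimal preflight sketch. It assumes Python 3.7+ and a `java` binary on `PATH`. The helper names `java_major_version` and `installed_java_major` are hypothetical and are not part of DataHub.

```python
import re
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_292" -> 8) as well as the scheme
    used from Java 9 onward ("11.0.2" -> 11, "17" -> 17).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        # Legacy scheme: 1.<major>.<minor>_<update>
        return int(parts[1])
    # Newer scheme: leading digits are the major version (e.g. "21-ea" -> 21)
    leading = re.match(r"\d+", parts[0])
    return int(leading.group(0)) if leading else None


def installed_java_major() -> Optional[int]:
    """Run `java -version` and return its major version, or None if java is missing."""
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major == 8:
        print("Java 8 detected; OK to run the Gradle build.")
    else:
        print(f"Expected Java 8 for the build, found: {major}")
```

Running this before `./gradlew :metadata-events:mxe-schemas:build` surfaces a wrong JDK up front. Otherwise you only find out when the build fails.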
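The containers may need some time after startup before the frontend is ready, so http://localhost:9001 can refuse connections at first. Below is a hypothetical standard-library helper that polls a URL until it returns a successful HTTP response or a timeout expires. It assumes Python 3 and is not a DataHub API.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it answers with a non-error response or `timeout_s` elapses.

    Returns True once the server responds successfully, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # 2xx (after following redirects): the frontend is up
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, timed out, or an HTTP error status
            # (HTTPError is a URLError subclass): treat as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_http("http://localhost:9001", timeout_s=600)` blocks until the frontend answers. Only then is it worth opening the browser. The 5-second polling interval and 300-second default timeout are arbitrary choices. HTTP error statuses are treated as "not ready" rather than "up", which errs on the side of waiting.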
Quicklinks
- DataHub Architecture
- DataHub Onboarding Guide
- Docker Images
- Frontend App
- Generalized Metadata Service
- Metadata Consumer Jobs
- Metadata Ingestion
Roadmap
- Add user profile page
- Deploy DataHub to Azure Cloud
