yujunjun/datahub

mirror of https://github.com/datahub-project/datahub.git synced 2025-10-24 23:48:23 +00:00

Mars Lan 2ded8d8434

Update README.md

2020-02-10 05:03:52 -08:00

.github/ISSUE_TEMPLATE

Add issue templates

2020-01-15 10:32:50 -08:00

Update README.md

2020-02-10 05:03:52 -08:00

Add URNs for results to search response

2020-01-29 10:36:04 -08:00

datahub-frontend

Add forward slash escape for Elasticsearch queries

2020-02-05 19:05:49 -08:00

Update README.md

2020-02-07 10:08:57 -08:00

Change mysql data source URL to suppress server identity verification warning

2020-02-06 17:36:56 -08:00

Small doc fix

2020-02-06 18:28:29 -08:00

Add some debuggin help & update default profile image link

2020-01-27 15:59:34 -08:00

Initial commit for Data Hub

2019-08-31 20:51:14 -07:00

Remove dataset groups entity

2019-12-13 15:12:50 -08:00

metadata-builders

Remove dataset groups entity

2019-12-13 15:12:50 -08:00

metadata-models 50.0.6 -> 54.0.1:

2019-12-13 11:46:49 -08:00

metadata-dao-impl

Remove dataset groups entity

2019-12-13 15:12:50 -08:00

metadata-events

Removing unnecessary classes for mxe-registration

2019-12-04 17:53:19 -08:00

metadata-ingestion

Update bootstrap data

2020-02-07 18:11:10 -08:00

Rename elasticsearch-index to mae-consumer in MaeStreamTask

2019-12-19 17:46:19 -08:00

metadata-models

Add some debuggin help & update default profile image link

2020-01-27 15:59:34 -08:00

metadata-restli-resource

metadata-models 50.0.6 -> 54.0.1:

2019-12-13 11:46:49 -08:00

metadata-testing

Remove dataset groups entity

2019-12-13 15:12:50 -08:00

Enable datahub-mae-consumer job to build graph as well

2019-11-26 22:19:46 -08:00

metadata-validators

corp-identity-gms 1.0.26 -> 1.0.40:

2019-11-19 02:27:28 -08:00

.dockerignore

Add docker ignore file

2019-09-02 18:36:18 -07:00

.gitignore

Add missing MXE models and fix .gitignore

2019-09-01 15:23:39 -07:00

.travis.yml

Update travis configuration to optimize build time

2019-10-07 15:38:12 -07:00

build.gradle

Fix some changes which came with automatic commit

2019-11-19 03:08:00 -08:00

CONTRIBUTING.md

Fix doc

2020-01-24 17:38:14 -08:00

gradlew

Update gradle version to 4.0.2 (#627 )

2017-07-30 11:07:14 -07:00

gradlew.bat

Update gradle version to 4.0.2 (#627 )

2017-07-30 11:07:14 -07:00

LICENSE

Initial commit

2015-11-19 14:39:21 -08:00

README.md

Update README.md

2020-02-08 06:24:04 -08:00

repositories.gradle

Initial commit for Data Hub

2019-08-31 20:51:14 -07:00

settings.gradle

Rename elasticsearch-index-job to mae-consumer-job

2019-11-20 18:19:31 -08:00

README.md

DataHub: A Generalized Metadata Search & Discovery Tool

Introduction

DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our LinkedIn blog post and Strata presentation. You should also visit DataHub Architecture to get a better understanding of how DataHub is implemented and DataHub Onboarding Guide to understand how to extend DataHub for your own use case. This repository contains the complete source code needed to build DataHub's frontend and backend services.

Quickstart

Install Docker and Docker Compose. Make sure Docker is configured to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 4 CPUs, 8 GB RAM, 2 GB swap.
Clone this repo.
Open Docker either from the command line or the Desktop app and ensure it is up and running, then cd into the cloned datahub repo.
Run the command below to download and run all Docker containers locally:
```
cd docker/quickstart && docker-compose pull && docker-compose up --build
```
This step takes a long time, and it can be hard to tell when DataHub is fully up. Refer to this guide to confirm that DataHub is up and running.
At this point, you should be able to open DataHub at http://localhost:9001 in your browser. You can sign in using datahub as both the username and the password. However, there is no data yet.
To ingest the provided sample data into DataHub, switch to a new terminal, cd into the cloned datahub repo, and run the command below:
```
docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
```
After running this, you should be able to see sample data in DataHub.

Refer to the debugging guide if you have issues with any of the above steps.
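
Because the first startup is slow, it can help to poll the frontend address from the quickstart instead of refreshing the browser by hand. The following is a minimal sketch, not part of the DataHub repository: the function name `wait_for_url` and its default attempt count and delay are hypothetical, and it assumes `curl` is installed. It only checks that the given URL answers over HTTP; it does not confirm that every backend container is healthy, so the guide mentioned above remains the reference for that.

```shell
# Poll a URL until it answers successfully or the attempts run out.
# Usage: wait_for_url URL [MAX_ATTEMPTS] [DELAY_SECONDS]
# (Hypothetical helper; the defaults below are illustrative assumptions.)
wait_for_url() {
  local url="$1"
  local max_attempts="${2:-60}"
  local delay="${3:-5}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: no progress output, -f: treat HTTP status >= 400 as a failure,
    # -o /dev/null: discard the response body, --max-time: cap each request.
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
      echo "up after ${attempt} attempt(s): ${url}"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done

  echo "gave up after ${max_attempts} attempt(s): ${url}" >&2
  return 1
}
```

For example, run `wait_for_url http://localhost:9001 60 5` in a second terminal while `docker-compose up` is running. The function returns 0 as soon as the frontend responds and 1 if it never does, so it can be chained with `&&` in front of the ingestion command.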

Quicklinks

Releases

See the Releases page for more details.

Roadmap

Kubernetes for container orchestration
Deploy DataHub to Azure Cloud

Languages

Java 40.9%

Python 29%

TypeScript 28.2%

JavaScript 1.1%

Shell 0.2%

Other 0.2%