# DataHub: A Generalized Metadata Search & Discovery Tool
[![Build Status](https://travis-ci.org/linkedin/datahub.svg)](https://travis-ci.org/linkedin/datahub)
[![License](https://img.shields.io/github/license/linkedin/datahub)](LICENSE)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/linkedin/datahub)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md)

![DataHub](docs/imgs/datahub-logo.png)

## Introduction
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our
[LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019).
You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented and
[DataHub Onboarding Guide](docs/how/entity-onboarding.md) to understand how to extend DataHub for your own use case.
This repository contains the complete source code needed to build DataHub's frontend and backend services.

## Quickstart
1. Install [Docker](https://docs.docker.com/install/) and [Docker Compose](https://docs.docker.com/compose/install/). Make sure Docker is configured to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 4 CPUs, 8 GB RAM, 2 GB swap area.
2. Clone this repo.
3. Start Docker, either from the command line or from the Desktop app, and make sure it is up and running. Then `cd` into the cloned `datahub` repo.
4. Run the command below to download and run all Docker containers locally:
```
cd docker/quickstart && docker-compose pull && docker-compose up --build
```
This step takes a long time, and it can be hard to tell when DataHub is fully up. Refer to [this guide](https://github.com/linkedin/datahub/blob/master/docs/debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to confirm that DataHub is up and running.
5. At this point, you should be able to start `DataHub` by opening [http://localhost:9001](http://localhost:9001) in your browser. You can sign in using `datahub` as both username and password. However, there is no data just yet.
6. To ingest the [provided](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) sample data into DataHub, switch to a new terminal, `cd` into the cloned `datahub` repo, and run the command below:
```
docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
```
After running this, you should be able to see sample data in DataHub.

Refer to the [debugging guide](docs/debugging.md) if you run into issues with any of the steps above.
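
Because step 4 can take a while, it may help to poll the port from step 5 rather than refreshing the browser by hand. The following is only a minimal sketch: `wait_until` is a hypothetical helper that is not part of this repo, and the retry count and delay are arbitrary assumptions. It retries any command until it succeeds or the attempts run out.

```shell
# Hypothetical helper (not part of DataHub): retry a command until it succeeds.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe command quietly; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Pause after each failed attempt before trying again.
    sleep "$delay"
  done
  return 1
}
```

Assuming `curl` is installed, a call such as `wait_until 60 10 curl --silent --fail --output /dev/null http://localhost:9001` would probe the frontend for roughly ten minutes. A successful response only shows that something is answering on port 9001; the debugging guide linked above remains the reference for confirming that all containers are healthy.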
## Quicklinks
* [DataHub Architecture](docs/architecture/architecture.md)
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
* [Docker Images](docker)
* [Frontend App](datahub-frontend)
* [Web Client App](datahub-web)
* [Generalized Metadata Service](gms)
* [Metadata Consumer Jobs](metadata-jobs)
* [Metadata Ingestion](metadata-ingestion)
## Releases
See the [Releases](https://github.com/linkedin/datahub/releases) page for more details.
## Roadmap
1. Support [Kubernetes](https://kubernetes.io/) for container orchestration
2. Deploy DataHub to [Azure Cloud](https://azure.microsoft.com/en-us/)