# DataHub: A Generalized Metadata Search & Discovery Tool
[![Version](https://img.shields.io/github/v/release/linkedin/datahub?include_prereleases)](https://github.com/linkedin/datahub/releases)
[![Build Status](https://travis-ci.org/linkedin/datahub.svg)](https://travis-ci.org/linkedin/datahub)
[![License](https://img.shields.io/github/license/linkedin/datahub)](LICENSE)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/linkedin/datahub)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md)

![DataHub](docs/imgs/datahub-logo.png)

> :sparkles: Feb 2020 Update: *DataHub v0.3.0* just [released](https://github.com/linkedin/datahub/releases/tag/datahub-v0.3.0)!

## Introduction
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our
[LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019).
You should also read [DataHub Architecture](docs/architecture/architecture.md) to better understand how DataHub is implemented, and the
[DataHub Onboarding Guide](docs/how/entity-onboarding.md) to learn how to extend DataHub for your own use case.
This repository contains the complete source code for building DataHub's frontend and backend services.

## Quickstart
1. Install [Docker](https://docs.docker.com/install/) and [Docker Compose](https://docs.docker.com/compose/install/). Make sure Docker is configured to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 4 CPUs, 8 GB RAM, 2 GB swap.
2. Start Docker, either from the command line or from the Desktop app, and make sure it is up and running.
3. Clone this repo and `cd` into the root directory of the cloned repository.
4. Run the command below to download and run all Docker containers on your local machine:
    ```
    cd docker/quickstart && docker-compose pull && docker-compose up --build
    ```
    This step takes a long time, and it can be hard to tell when DataHub is fully up. Refer to [this guide](https://github.com/linkedin/datahub/blob/master/docs/debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to verify that DataHub is up and running.
5. At this point, you should be able to access DataHub by opening [http://localhost:9001](http://localhost:9001) in your browser. You can sign in using `datahub` as both the username and the password. However, there is no data yet.
6. To ingest the [provided](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) sample data into DataHub, switch to a new terminal, `cd` into the cloned `datahub` repo, and run the command below:
    ```
    docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
    ```
    After running this, you should be able to see sample data in DataHub.
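
Because step 4 can take a while, a small polling helper can save you from refreshing the browser by hand. The sketch below is only an illustration and is not part of this repository: the function name `wait_for_url` and its default attempt count and delay are assumptions. It uses `curl` to poll a URL until it answers without an HTTP error. A response from the frontend only shows that the web server is reachable; it does not prove that every container is healthy, so the debugging guide is still the authoritative check.

```shell
# Hypothetical helper (not shipped with DataHub): poll a URL until it
# responds without an HTTP error, or give up after a number of attempts.
# Written for POSIX sh, so the variables below are global to the shell.
wait_for_url() {
  url="$1"
  max_attempts="${2:-60}"   # assumed default: 60 attempts
  delay_seconds="${3:-5}"   # assumed default: 5 seconds between attempts
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # --silent hides progress output; --fail makes curl exit non-zero
    # when the server answers with an HTTP status of 400 or above.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "ready after ${attempt} attempt(s): ${url}"
      return 0
    fi
    # Only pause when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  echo "gave up after ${max_attempts} attempt(s): ${url}" >&2
  return 1
}
```

After defining the function in your shell, you could run `wait_for_url http://localhost:9001` in a second terminal while the containers start, and move on to signing in once it reports that the URL is ready.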

Refer to the [debugging guide](docs/debugging.md) if you run into issues with any of the above steps.

## Quicklinks
* [DataHub Architecture](docs/architecture/architecture.md)
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
* [Docker Images](docker)
* [Frontend App](datahub-frontend)
* [Web Client App](datahub-web)
* [Generalized Metadata Service](gms)
* [Metadata Consumer Jobs](metadata-jobs)
* [Metadata Ingestion](metadata-ingestion)

## Releases
See the [Releases](https://github.com/linkedin/datahub/releases) page for more details.

## Contributing
We welcome contributions from the community. Please refer to [the guidelines](CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubation. 

## Roadmap
Check DataHub's [roadmap](docs/roadmap.md).