# DataHub: A Generalized Metadata Search & Discovery Tool
[![Version](https://img.shields.io/github/v/release/linkedin/datahub?include_prereleases)](https://github.com/linkedin/datahub/releases)
[![Build Status](https://travis-ci.org/linkedin/datahub.svg)](https://travis-ci.org/linkedin/datahub)
[![License](https://img.shields.io/github/license/linkedin/datahub)](LICENSE)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/linkedin/datahub)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md)

![DataHub](docs/imgs/datahub-logo.png)

> :sparkles: Feb 2020 Update: *DataHub v0.3.0* just [released](https://github.com/linkedin/datahub/releases/tag/datahub-v0.3.0)!
## Introduction
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our
[LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019).
You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented and
[DataHub Onboarding Guide](docs/how/entity-onboarding.md) to understand how to extend DataHub for your own use case.
This repository contains the complete source code needed to build DataHub's frontend and backend services.
## Quickstart
1. Install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/). Make sure Docker is configured with enough hardware resources for the Docker engine. Tested and confirmed configuration: 4 CPUs, 8GB RAM, 2GB swap.
2. Open Docker either from the command line or the Desktop app and ensure it is up and running.
3. Clone this repo and `cd` into the root directory for the cloned repository.
4. Run the command below to download and run all Docker containers locally:
```
cd docker/quickstart && docker-compose pull && docker-compose up --build
```
This step takes a long time, and it can be hard to tell when DataHub is fully up. Refer to [this guide](https://github.com/linkedin/datahub/blob/master/docs/debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to verify that DataHub is up and running.
5. At this point, you should be able to open `DataHub` at [http://localhost:9001](http://localhost:9001) in your browser and sign in using `datahub` as both username and password. No data is loaded yet.
6. To ingest the [provided](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) sample data into DataHub, switch to a new terminal, `cd` into the cloned `datahub` repo, and run the command below:
```
docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
```
After running this, you should be able to see sample data in DataHub.
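
Because step 4 can take a while, it may help to poll the frontend instead of refreshing the browser. The following is a minimal sketch, not part of the DataHub repo: `wait_for_url` is a hypothetical helper, and the default attempt count and delay are arbitrary assumptions. It assumes `curl` is installed. A successful response only suggests the frontend is serving; it does not confirm that every container is healthy. One possible invocation is `wait_for_url http://localhost:9001 60 5`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the DataHub repo):
# poll a URL until it answers or the attempts run out.
wait_for_url() {
  url="$1"
  attempts="${2:-60}"   # number of tries (assumed default)
  delay="${3:-5}"       # seconds between tries (assumed default)
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors (>= 400) as failure, -o: discard the body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```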
Refer to the [debugging guide](docs/debugging.md) if you run into issues with any of the above steps.
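
One thing that may be worth ruling out early is Docker running with fewer resources than the tested configuration from step 1. The sketch below is an illustration only: `check_docker_resources` is a hypothetical helper, the 8GB threshold is interpreted as decimal gigabytes (an assumption), and swap is not checked. It takes a CPU count and a memory size in bytes, which on a typical install can be supplied with `check_docker_resources $(docker info --format '{{.NCPU}} {{.MemTotal}}')`. Docker may report slightly less memory than the amount configured, so treat a near miss as a hint rather than a failure.

```shell
#!/bin/sh
# Minimums taken from the tested configuration in the Quickstart.
MIN_CPUS=4
MIN_MEM_BYTES=$((8 * 1000 * 1000 * 1000))   # 8GB read as decimal bytes (assumption)

# Hypothetical helper (not part of the DataHub repo).
# Usage: check_docker_resources <cpu_count> <memory_in_bytes>
check_docker_resources() {
  cpus="$1"
  mem_bytes="$2"
  status=0
  if [ "$cpus" -lt "$MIN_CPUS" ]; then
    echo "CPUs: $cpus (tested configuration uses $MIN_CPUS)"
    status=1
  fi
  if [ "$mem_bytes" -lt "$MIN_MEM_BYTES" ]; then
    echo "Memory: $mem_bytes bytes (tested configuration uses 8GB)"
    status=1
  fi
  # Returns 0 only when both values meet the tested minimums.
  return "$status"
}
```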
## Quicklinks
* [DataHub Architecture](docs/architecture/architecture.md)
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
* [Docker Images](docker)
* [Frontend App](datahub-frontend)
* [Web Client App](datahub-web)
* [Generalized Metadata Service](gms)
* [Metadata Consumer Jobs](metadata-jobs)
* [Metadata Ingestion](metadata-ingestion)
## Releases
See the [Releases](https://github.com/linkedin/datahub/releases) page for more details.
## Contributing
We welcome contributions from the community. Please refer to [the guidelines](CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubation.
## Roadmap
Check DataHub's [roadmap](docs/roadmap.md).