Revert "build: build GitHub Page from /docs directory (#1750)" (#1751)

This reverts commit b0f56de7a81b8bf921ff37cb81024692d1b9a8ce.
Mars Lan 2020-07-26 10:24:41 -07:00 committed by GitHub
parent b0f56de7a8
commit 3e4c110723
11 changed files with 292 additions and 289 deletions

@@ -1 +0,0 @@
docs/CODE_OF_CONDUCT.md

77
CODE_OF_CONDUCT.md Normal file
@@ -0,0 +1,77 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by direct messaging the project team on [Slack]. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[Slack]: https://datahubspace.slack.com/join/shared_invite/zt-cl60ng6o-6odCh_I~ejZKE~a9GG30PA
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

@@ -1 +0,0 @@
docs/CONTRIBUTING.md

84
CONTRIBUTING.md Normal file
@@ -0,0 +1,84 @@
# Contributing
We always welcome contributions to help make DataHub better. Take a moment to read this document if you would like to contribute.
## Reporting issues
We use GitHub issues to track bug reports, feature requests, and submitting pull requests.
If you find a bug:
1. Use the GitHub issue search to check whether the bug has already been reported.
1. If the issue has been fixed, try to reproduce the issue using the latest master branch of the repository.
1. If the issue still reproduces or has not yet been reported, try to isolate the problem before opening an issue.
## Submitting a Pull Request (PR)
Before you submit your Pull Request (PR), consider the following guidelines:
* Search GitHub for an open or closed PR that relates to your submission. You don't want to duplicate effort.
* Follow the [standard GitHub approach](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to create the PR. Please also follow our [commit message format](#commit-message-format).
* That's it! Thank you for your contribution!
## Commit Message Format
Please follow the [Conventional Commits](https://www.conventionalcommits.org/) specification for the commit message format. In summary, each commit message consists of a *header*, a *body* and a *footer*, separated by a single blank line.
```
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
```
Any line of the commit message cannot be longer than 88 characters! This allows the message to be easier to read on GitHub as well as in various Git tools.
### Type
Must be one of the following (based on the [Angular convention](https://github.com/angular/angular/blob/22b96b9/CONTRIBUTING.md#-commit-message-guidelines)):
* *feat*: A new feature
* *fix*: A bug fix
* *refactor*: A code change that neither fixes a bug nor adds a feature
* *docs*: Documentation only changes
* *test*: Adding missing tests or correcting existing tests
* *perf*: A code change that improves performance
* *style*: Changes that do not affect the meaning of the code (whitespace, formatting, missing semicolons, etc.)
* *build*: Changes that affect the build system or external dependencies
* *ci*: Changes to our CI configuration files and scripts
A scope may be added to a commit's type to provide additional contextual information. It is contained within parentheses, e.g.,
```
feat(parser): add ability to parse arrays
```
### Description
Each commit must contain a succinct description of the change:
* use the imperative, present tense: "change" not "changed" nor "changes"
* don't capitalize the first letter
* no dot(.) at the end
### Body
Just as in the description, use the imperative, present tense: "change" not "changed" nor "changes". The body should include the motivation for the change and contrast this with previous behavior.
### Footer
The footer should contain any information about *Breaking Changes*, and is also the place to reference GitHub issues that this commit *Closes*.
*Breaking Changes* should start with the words `BREAKING CHANGE:` with a space or two new lines. The rest of the commit message is then used for this.
### Revert
If the commit reverts a previous commit, it should begin with `revert:`, followed by the description. In the body it should say: `Refs: <hash1> <hash2> ...`, where the hashes are the SHAs of the commits being reverted, e.g.
```
revert: let us never again speak of the noodle incident
Refs: 676104e, a215868
```
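The header rules described in this guide can be checked mechanically. The following is a minimal sketch, not part of the DataHub repository: the function name `check_commit_message` and the constants are hypothetical, and treating `revert` as one more allowed type is an assumption based on the Revert section.

```python
import re
from typing import List

# Types listed in the guide; "revert" is included as an assumption
# drawn from the Revert section.
ALLOWED_TYPES = (
    "feat", "fix", "refactor", "docs", "test",
    "perf", "style", "build", "ci", "revert",
)
MAX_LINE_LENGTH = 88

# <type>[optional scope]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()]+)\))?: (?P<description>.+)$"
)


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems found; an empty list means no problems."""
    lines = message.splitlines()
    if not lines:
        return ["commit message is empty"]

    problems = []
    # No line of the message may exceed the length limit.
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(
                f"line {number} is longer than {MAX_LINE_LENGTH} characters"
            )

    match = HEADER_RE.match(lines[0])
    if match is None:
        problems.append(
            "header does not match '<type>[optional scope]: <description>'"
        )
        return problems

    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")

    description = match.group("description")
    if description[0].isupper():
        problems.append("description should not start with a capital letter")
    if description.endswith("."):
        problems.append("description should not end with a dot")
    return problems
```

For instance, `check_commit_message("feat(parser): add ability to parse arrays")` returns an empty list. The sketch only inspects the header and line lengths; it does not validate the body or footer.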

@@ -1 +0,0 @@
docs/README.md

113
README.md Normal file
@@ -0,0 +1,113 @@
# DataHub: A Generalized Metadata Search & Discovery Tool
[![Version](https://img.shields.io/github/v/release/linkedin/datahub?include_prereleases)](https://github.com/linkedin/datahub/releases)
[![Build Status](https://travis-ci.org/linkedin/datahub.svg)](https://travis-ci.org/linkedin/datahub)
[![Get on Slack](https://img.shields.io/badge/slack-join-orange.svg)](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md)
[![License](https://img.shields.io/github/license/linkedin/datahub)](LICENSE)
---
[Quickstart](#quickstart) |
[Documentation](#documentation) |
[Features](https://github.com/linkedin/datahub/blob/master/docs/features.md) |
[Roadmap](https://github.com/linkedin/datahub/blob/master/docs/roadmap.md) |
[Adoption](#adoption) |
[FAQ](https://github.com/linkedin/datahub/blob/master/docs/faq.md) |
[Town Hall](https://github.com/linkedin/datahub/blob/master/docs/townhalls.md)
---
![DataHub](docs/imgs/datahub-logo.png)
> :mega: Next DataHub town hall meeting on July 31st, 9am-10am PDT:
> - [Signup sheet & questions](https://docs.google.com/spreadsheets/d/1hCTFQZnhYHAPa-DeIfyye4MlwmrY7GF4hBds5pTZJYM)
> - Details and recordings of past meetings can be found [here](docs/townhalls.md)
> :sparkles: Latest Update:
> - We've released v0.4.1. You can find release notes [here](https://github.com/linkedin/datahub/releases/tag/v0.4.1)
> - We're on Slack now! [Join](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA) or [log in with an existing account](https://datahubspace.slack.com). Ask questions and keep up with the latest announcements.
## Introduction
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our
[LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented and [DataHub Onboarding Guide](docs/how/entity-onboarding.md) to understand how to extend DataHub for your own use case.
This repository contains the complete source code for both DataHub's frontend & backend. You can also read about [how we sync the changes](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) between our internal fork and GitHub.
## Quickstart
1. Install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/) (if using Linux). Make sure to allocate enough hardware resources for Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB Swap area.
2. Open Docker either from the command line or the desktop app and ensure it is up and running.
3. Clone this repo and `cd` into the root directory of the cloned repository.
4. Run the following command to download and run all Docker containers locally:
```
./docker/quickstart/quickstart.sh
```
This step takes a while to run the first time, and it may be difficult to tell if DataHub is fully up and running from the combined log. Please use [this guide](https://github.com/linkedin/datahub/blob/master/docs/debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to verify that each container is running correctly.
5. At this point, you should be able to start DataHub by opening [http://localhost:9001](http://localhost:9001) in your browser. You can sign in using `datahub` as both username and password. However, you'll notice that no data has been ingested yet.
6. To ingest provided [sample data](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) to DataHub, switch to a new terminal window, `cd` into the cloned `datahub` repo, and run the following command:
```
./docker/ingestion/ingestion.sh
```
After running this, you should be able to see and search sample datasets in DataHub.
Please refer to the [debugging guide](docs/debugging.md) if you encounter any issues during the quickstart.
## Documentation
* [DataHub Developer's Guide](docs/developers.md)
* [DataHub Architecture](docs/architecture/architecture.md)
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
* [Docker Images](docker)
* [Frontend](datahub-frontend)
* [Web App](datahub-web)
* [Generalized Metadata Service](gms)
* [Metadata Ingestion](metadata-ingestion)
* [Metadata Processing Jobs](metadata-jobs)
## Releases
See [Releases](https://github.com/linkedin/datahub/releases) page for more details. We follow the [SemVer Specification](https://semver.org) when versioning the releases and adopt the [Keep a Changelog convention](https://keepachangelog.com/) for the changelog format.
## FAQs
Frequently Asked Questions about DataHub can be found [here](https://github.com/linkedin/datahub/blob/master/docs/faq.md).
## Features & Roadmap
Check out DataHub's [Features](docs/features.md) & [Roadmap](docs/roadmap.md).
## Contributing
We welcome contributions from the community. Please refer to our [Contributing Guidelines](CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubating experimental features.
## Community
Join our [slack workspace](https://app.slack.com/client/TUMKD5EGJ/DV0SB2ZQV/thread/GV2TEEZ5L-1583704023.001100) for discussions and important announcements. You can also find out more about our past and upcoming [town hall meetings](https://github.com/linkedin/datahub/blob/master/docs/townhalls.md).
## Adoption
Here are the companies that have officially adopted DataHub. Please feel free to add yours to the list if we missed it.
* [Expedia Group](http://expedia.com)
* [LinkedIn](http://linkedin.com)
* [Saxo Bank](https://www.home.saxo)
* [Shanghai HuaRui Bank](https://www.shrbank.com)
* [TypeForm](http://typeform.com)
* [Valassis](https://www.valassis.com)
Here is a list of companies currently building POC or seriously evaluating DataHub.
* [Booking.com](https://www.booking.com)
* [Experian](https://www.experian.com)
* [Geotab](https://www.geotab.com)
* [Instructure](https://www.instructure.com)
* [Microsoft](https://microsoft.com)
* [Morgan Stanley](https://www.morganstanley.com)
* [Orange Telecom](https://www.orange.com)
* [SpotHero](https://spothero.com)
* [Sysco AS](https://sysco.no)
* [ThoughtWorks](https://www.thoughtworks.com)
* [University of Phoenix](https://www.phoenix.edu)
* [Vectice](https://www.vectice.com)
## Select Articles & Talks
* [DataHub: A Generalized Metadata Search & Discovery Tool](https://engineering.linkedin.com/blog/2019/data-hub)
* [Open sourcing DataHub: LinkedIn's metadata search and discovery platform](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p)
* [The evolution of metadata: LinkedIn's story @ Strata Data Conference 2019](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019)
* [Journey of metadata at LinkedIn @ Crunch Data Conference 2019](https://www.youtube.com/watch?v=OB-O0Y6OYDE)
* [DataHub Journey with Expedia Group by Arun Vasudevan](https://www.youtube.com/watch?v=ajcRdB22s5o)
* [Data Catalogue — Knowing your data](https://medium.com/albert-franzi/data-catalogue-knowing-your-data-15f7d0724900)
* [LinkedIn DataHub Application Architecture Quick Understanding](https://medium.com/@liangjunjiang/linkedin-datahub-application-architecture-quick-understanding-a5b7868ee205)
* [25 Hot New Data Tools and What They DON'T Do](https://blog.amplifypartners.com/25-hot-new-data-tools-and-what-they-dont-do/)
See the full list [here](https://github.com/linkedin/datahub/blob/mars-lan-patch-2/docs/links.md).

@@ -1,13 +1,19 @@
safe: true
plugins:
- jekyll-relative-links
relative_links:
enabled: true
collections: true
include:
- CODE_OF_CONDUCT.md
- CONTRIBUTING.md
- README.md
- LICENSE.md
- COPYING.md
- CODE_OF_CONDUCT.md
- CONTRIBUTING.md
- ISSUE_TEMPLATE.md
- PULL_REQUEST_TEMPLATE.md
exclude:
- contrib
theme: jekyll-theme-cayman
title: DataHub
description: A Generalized Metadata Search & Discovery Tool

@@ -1,77 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by direct messaging the project team on [Slack]. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
[Slack]: https://datahubspace.slack.com/join/shared_invite/zt-cl60ng6o-6odCh_I~ejZKE~a9GG30PA
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

@@ -1,84 +0,0 @@
# Contributing
We always welcome contributions to help make DataHub better. Take a moment to read this document if you would like to contribute.
## Reporting issues
We use GitHub issues to track bug reports, feature requests, and submitting pull requests.
If you find a bug:
1. Use the GitHub issue search to check whether the bug has already been reported.
1. If the issue has been fixed, try to reproduce the issue using the latest master branch of the repository.
1. If the issue still reproduces or has not yet been reported, try to isolate the problem before opening an issue.
## Submitting a Pull Request (PR)
Before you submit your Pull Request (PR), consider the following guidelines:
* Search GitHub for an open or closed PR that relates to your submission. You don't want to duplicate effort.
* Follow the [standard GitHub approach](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to create the PR. Please also follow our [commit message format](#commit-message-format).
* That's it! Thank you for your contribution!
## Commit Message Format
Please follow the [Conventional Commits](https://www.conventionalcommits.org/) specification for the commit message format. In summary, each commit message consists of a *header*, a *body* and a *footer*, separated by a single blank line.
```
<type>[optional scope]: <description>
[optional body]
[optional footer(s)]
```
Any line of the commit message cannot be longer than 88 characters! This allows the message to be easier to read on GitHub as well as in various Git tools.
### Type
Must be one of the following (based on the [Angular convention](https://github.com/angular/angular/blob/22b96b9/CONTRIBUTING.md#-commit-message-guidelines)):
* *feat*: A new feature
* *fix*: A bug fix
* *refactor*: A code change that neither fixes a bug nor adds a feature
* *docs*: Documentation only changes
* *test*: Adding missing tests or correcting existing tests
* *perf*: A code change that improves performance
* *style*: Changes that do not affect the meaning of the code (whitespace, formatting, missing semicolons, etc.)
* *build*: Changes that affect the build system or external dependencies
* *ci*: Changes to our CI configuration files and scripts
A scope may be added to a commit's type to provide additional contextual information. It is contained within parentheses, e.g.,
```
feat(parser): add ability to parse arrays
```
### Description
Each commit must contain a succinct description of the change:
* use the imperative, present tense: "change" not "changed" nor "changes"
* don't capitalize the first letter
* no dot(.) at the end
### Body
Just as in the description, use the imperative, present tense: "change" not "changed" nor "changes". The body should include the motivation for the change and contrast this with previous behavior.
### Footer
The footer should contain any information about *Breaking Changes*, and is also the place to reference GitHub issues that this commit *Closes*.
*Breaking Changes* should start with the words `BREAKING CHANGE:` with a space or two new lines. The rest of the commit message is then used for this.
### Revert
If the commit reverts a previous commit, it should begin with `revert:`, followed by the description. In the body it should say: `Refs: <hash1> <hash2> ...`, where the hashes are the SHAs of the commits being reverted, e.g.
```
revert: let us never again speak of the noodle incident
Refs: 676104e, a215868
```

@@ -1,113 +0,0 @@
# DataHub: A Generalized Metadata Search & Discovery Tool
[![Version](https://img.shields.io/github/v/release/linkedin/datahub?include_prereleases)](https://github.com/linkedin/datahub/releases)
[![Build Status](https://travis-ci.org/linkedin/datahub.svg)](https://travis-ci.org/linkedin/datahub)
[![Get on Slack](https://img.shields.io/badge/slack-join-orange.svg)](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md)
[![License](https://img.shields.io/github/license/linkedin/datahub)](LICENSE)
---
[Quickstart](#quickstart) |
[Documentation](#documentation) |
[Features](features.md) |
[Roadmap](roadmap.md) |
[Adoption](#adoption) |
[FAQ](faq.md) |
[Town Hall](townhalls.md)
---
![DataHub](imgs/datahub-logo.png)
> :mega: Next DataHub town hall meeting on July 31st, 9am-10am PDT:
> - [Signup sheet & questions](https://docs.google.com/spreadsheets/d/1hCTFQZnhYHAPa-DeIfyye4MlwmrY7GF4hBds5pTZJYM)
> - Details and recordings of past meetings can be found [here](docs/townhalls.md)
> :sparkles: Latest Update:
> - We've released v0.4.1. You can find release notes [here](https://github.com/linkedin/datahub/releases/tag/v0.4.1)
> - We're on Slack now! [Join](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA) or [log in with an existing account](https://datahubspace.slack.com). Ask questions and keep up with the latest announcements.
## Introduction
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our
[LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented and [DataHub Onboarding Guide](docs/how/entity-onboarding.md) to understand how to extend DataHub for your own use case.
This repository contains the complete source code for both DataHub's frontend & backend. You can also read about [how we sync the changes](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) between our internal fork and GitHub.
## Quickstart
1. Install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/) (if using Linux). Make sure to allocate enough hardware resources for Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB Swap area.
2. Open Docker either from the command line or the desktop app and ensure it is up and running.
3. Clone this repo and `cd` into the root directory of the cloned repository.
4. Run the following command to download and run all Docker containers locally:
```
./docker/quickstart/quickstart.sh
```
This step takes a while to run the first time, and it may be difficult to tell if DataHub is fully up and running from the combined log. Please use [this guide](debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to verify that each container is running correctly.
5. At this point, you should be able to start DataHub by opening [http://localhost:9001](http://localhost:9001) in your browser. You can sign in using `datahub` as both username and password. However, you'll notice that no data has been ingested yet.
6. To ingest provided [sample data](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) to DataHub, switch to a new terminal window, `cd` into the cloned `datahub` repo, and run the following command:
```
./docker/ingestion/ingestion.sh
```
After running this, you should be able to see and search sample datasets in DataHub.
Please refer to the [debugging guide](docs/debugging.md) if you encounter any issues during the quickstart.
## Documentation
* [DataHub Developer's Guide](docs/developers.md)
* [DataHub Architecture](docs/architecture/architecture.md)
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
* [Docker Images](docker)
* [Frontend](datahub-frontend)
* [Web App](datahub-web)
* [Generalized Metadata Service](gms)
* [Metadata Ingestion](metadata-ingestion)
* [Metadata Processing Jobs](metadata-jobs)
## Releases
See [Releases](https://github.com/linkedin/datahub/releases) page for more details. We follow the [SemVer Specification](https://semver.org) when versioning the releases and adopt the [Keep a Changelog convention](https://keepachangelog.com/) for the changelog format.
## FAQs
Frequently Asked Questions about DataHub can be found [here](faq.md).
## Features & Roadmap
Check out DataHub's [Features](docs/features.md) & [Roadmap](docs/roadmap.md).
## Contributing
We welcome contributions from the community. Please refer to our [Contributing Guidelines](CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubating experimental features.
## Community
Join our [slack workspace](https://app.slack.com/client/TUMKD5EGJ/DV0SB2ZQV/thread/GV2TEEZ5L-1583704023.001100) for discussions and important announcements. You can also find out more about our past and upcoming [town hall meetings](townhalls.md).
## Adoption
Here are the companies that have officially adopted DataHub. Please feel free to add yours to the list if we missed it.
* [Expedia Group](http://expedia.com)
* [LinkedIn](http://linkedin.com)
* [Saxo Bank](https://www.home.saxo)
* [Shanghai HuaRui Bank](https://www.shrbank.com)
* [TypeForm](http://typeform.com)
* [Valassis](https://www.valassis.com)
Here is a list of companies currently building POC or seriously evaluating DataHub.
* [Booking.com](https://www.booking.com)
* [Experian](https://www.experian.com)
* [Geotab](https://www.geotab.com)
* [Instructure](https://www.instructure.com)
* [Microsoft](https://microsoft.com)
* [Morgan Stanley](https://www.morganstanley.com)
* [Orange Telecom](https://www.orange.com)
* [SpotHero](https://spothero.com)
* [Sysco AS](https://sysco.no)
* [ThoughtWorks](https://www.thoughtworks.com)
* [University of Phoenix](https://www.phoenix.edu)
* [Vectice](https://www.vectice.com)
## Select Articles & Talks
* [DataHub: A Generalized Metadata Search & Discovery Tool](https://engineering.linkedin.com/blog/2019/data-hub)
* [Open sourcing DataHub: LinkedIn's metadata search and discovery platform](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p)
* [The evolution of metadata: LinkedIn's story @ Strata Data Conference 2019](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019)
* [Journey of metadata at LinkedIn @ Crunch Data Conference 2019](https://www.youtube.com/watch?v=OB-O0Y6OYDE)
* [DataHub Journey with Expedia Group by Arun Vasudevan](https://www.youtube.com/watch?v=ajcRdB22s5o)
* [Data Catalogue — Knowing your data](https://medium.com/albert-franzi/data-catalogue-knowing-your-data-15f7d0724900)
* [LinkedIn DataHub Application Architecture Quick Understanding](https://medium.com/@liangjunjiang/linkedin-datahub-application-architecture-quick-understanding-a5b7868ee205)
* [25 Hot New Data Tools and What They DON'T Do](https://blog.amplifypartners.com/25-hot-new-data-tools-and-what-they-dont-do/)
See the full list [here](links.md).

@@ -1,11 +1,11 @@
# Onboarding to GMA Graph - Adding a new relationship type
Steps for this already detailed in [How to onboard to GMA graph?](../how/graph-onboarding.md)
Steps for this already detailed in https://github.com/linkedin/datahub/blob/master/docs/how/graph-onboarding.md
For this exercise, we'll add a new relationship type `FollowedBy` which is extracted out of `Follow` aspect. For that, we first need to add `Follow` aspect.
## 1. Onboard `Follow` aspect
Referring to [How to add a new metadata aspect?](../how/add-new-aspect.md)
Referring to the guide https://github.com/linkedin/datahub/blob/master/docs/how/add-new-aspect.md
### 1.1 Model new aspect
* Follow.pdl

@@ -52,10 +52,10 @@ For reproducible technical issues, bugs and code contributions, Github [issues](
The [DataHub Introduction](https://engineering.linkedin.com/blog/2019/data-hub) and [Open Sourcing Datahub](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) blog posts are also useful resources for getting a high level understanding of the system.
## Where can I learn about the roadmap?
You can learn more about DataHub's [product roadmap](roadmap.md), which gets updated regularly.
You can learn more about DataHub's [product roadmap](https://github.com/linkedin/datahub/blob/master/docs/roadmap.md), which gets updated regularly.
## Where can I learn about the current list of features/functionalities?
You can learn more about the current [list of features](features.md).
You can learn more about the current [list of features](https://github.com/linkedin/datahub/blob/master/docs/features.md).
## Are the product strategy/vision/roadmap driven by the LinkedIn Engineering team, community, or a collaborative effort?
A mix of both the LinkedIn DataHub team and the community. The roadmap will be a joint effort of both LinkedIn and the community. However, we'll most likely prioritize tasks that align with the community's asks.
@ -64,7 +64,7 @@ Mixed of both LinkedIn DataHub team and the community. The roadmap will be a joi
LinkedIn is not using GCP so we cannot commit to building and testing that connectivity. However, we'll be happy to accept community contributions for GCP integration. Also, our Slack channel and regularly scheduled town hall meetings are a good opportunity to meet with people from different companies who have similar requirements and might be interested in collaborating on these features.
## How approachable would LinkedIn be to provide insights/support or collaborate on a functionality?
Please take a look at our [roadmap](roadmap.md) & [features](features.md) to get a sense of what's being open sourced in the near future. If there's something missing from the list, we're open to discussion. In fact, the town hall would be the perfect venue for such discussions.
Please take a look at our [roadmap](https://github.com/linkedin/datahub/blob/master/docs/roadmap.md) & [features](https://github.com/linkedin/datahub/blob/master/docs/features.md) to get a sense of what's being open sourced in the near future. If there's something missing from the list, we're open to discussion. In fact, the town hall would be the perfect venue for such discussions.
## How do the LinkedIn Engineering team and the community ensure the quality of the community code for DataHub?
All PRs are reviewed by the LinkedIn team. Any extension/contribution coming from the community that the LinkedIn team doesn't have expertise in will first be placed into an incubation directory (`/contrib`). Once it's blessed and adopted by the community, we'll graduate it from incubation and move it into the main code base.
@ -105,7 +105,7 @@ The [SchemaField](https://github.com/linkedin/datahub/blob/master/metadata-model
MCE is the ideal way to push metadata from different security zones, assuming there is a common Kafka infrastructure that aggregates the events from various security zones.
## Which data stores does the DataHub backend currently support?
Currently, DataHub supports all major database providers that are supported by Ebean as the document store i.e. Oracle, Postgres, MySQL, H2. We also support [Espresso](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store), which is LinkedIn's proprietary document store. Other than that, we support Elasticsearch and Neo4j for search and graph use cases, respectively. However, as data stores in the backend are all abstracted and accessed through DAOs, you should be able to easily support other data stores by plugging in your own DAO implementations. Please refer to [Metadata Serving](architecture/metadata-serving.md) for more details.
Currently, DataHub supports all major database providers that are supported by Ebean as the document store i.e. Oracle, Postgres, MySQL, H2. We also support [Espresso](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store), which is LinkedIn's proprietary document store. Other than that, we support Elasticsearch and Neo4j for search and graph use cases, respectively. However, as data stores in the backend are all abstracted and accessed through DAOs, you should be able to easily support other data stores by plugging in your own DAO implementations. Please refer to https://github.com/linkedin/datahub/blob/master/docs/architecture/metadata-serving.md for more details.
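The idea of plugging in your own DAO implementation can be illustrated with a minimal sketch. It is written in Python purely for brevity and does not mirror DataHub's actual DAO classes; `AspectDao`, `InMemoryAspectDao`, `save_and_load`, and the `ownership` aspect payload are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class AspectDao(ABC):
    """Hypothetical storage interface; NOT DataHub's actual DAO API."""

    @abstractmethod
    def put(self, urn: str, aspect_name: str, value: dict) -> None:
        """Store the latest value of an aspect for the given URN."""

    @abstractmethod
    def get(self, urn: str, aspect_name: str) -> Optional[dict]:
        """Return the stored aspect value, or None if absent."""


class InMemoryAspectDao(AspectDao):
    """Stand-in for a custom data store plugged in behind the interface."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def put(self, urn: str, aspect_name: str, value: dict) -> None:
        self._rows[(urn, aspect_name)] = value

    def get(self, urn: str, aspect_name: str) -> Optional[dict]:
        return self._rows.get((urn, aspect_name))


def save_and_load(dao: AspectDao, urn: str) -> Optional[dict]:
    # The caller depends only on the interface, so the backing store
    # can be swapped without changing this function.
    dao.put(urn, "ownership", {"owners": ["jdoe"]})
    return dao.get(urn, "ownership")
```

Because `save_and_load` only sees the abstract interface, supporting another store in this sketch would mean writing one more subclass rather than touching the calling code.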
## For which stores do you have discovery services?
Supported data sources are listed [here](https://github.com/linkedin/datahub/tree/master/metadata-ingestion). To onboard your own data source which is not listed there, you can refer to the [onboarding guide](how/data-source-onboarding.md).

@ -1,6 +1,6 @@
# How to add a new metadata aspect?
Adding a new metadata [aspect](../what/aspect.md) is one of the most common ways to extend an existing [entity](../what/entity.md).
Adding a new metadata [aspect](https://github.com/linkedin/datahub/blob/master/docs/what/aspect.md) is one of the most common ways to extend an existing [entity](https://github.com/linkedin/datahub/blob/master/docs/what/entity.md).
We'll use the [CorpUserEditableInfo](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserEditableInfo.pdl) as an example here.
1. Add the aspect model to the corresponding namespace (e.g. [`com.linkedin.identity`](https://github.com/linkedin/datahub/tree/master/metadata-models/src/main/pegasus/com/linkedin/identity))
@ -17,4 +17,4 @@ We'll use the [CorpUserEditableInfo](https://github.com/linkedin/datahub/blob/ma
5. (Optional) If there's a need to update the aspect via API (instead of/in addition to MCE), add a [sub-resource](https://linkedin.github.io/rest.li/user_guide/restli_server#sub-resources) endpoint for the new aspect (e.g. [`CorpUsersEditableInfoResource`](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/identity/CorpUsersEditableInfoResource.java)). The sub-resource endpoint also allows you to retrieve previous versions of the aspect as well as additional metadata such as the audit stamp.
6. After rebuilding & restarting [gms](https://github.com/linkedin/datahub/tree/master/gms), [mce-consumer-job](https://github.com/linkedin/datahub/tree/master/metadata-jobs/mce-consumer-job) & [mae-consumer-job](https://github.com/linkedin/datahub/tree/master/metadata-jobs/mae-consumer-job),
you should be able to start emitting [MCE](../what/mxe.md) with the new aspect and have it automatically ingested & stored in DB.
you should be able to start emitting [MCE](https://github.com/linkedin/datahub/blob/master/docs/what/mxe.md) with the new aspect and have it automatically ingested & stored in DB.

@ -1,6 +1,6 @@
# How to onboard a new data source?
In the [metadata-ingestion](https://github.com/linkedin/datahub/tree/master/metadata-ingestion) directory, DataHub provides ETL scripts for onboarding various kinds of metadata sources, including [Hive](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/hive-etl), [Kafka](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/kafka-etl), [LDAP](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/ldap-etl), [MySQL](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/mysql-etl), and generic [RDBMS](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/rdbms-etl), which feed the metadata to the [GMS](../what/gms.md).
In the [metadata-ingestion](https://github.com/linkedin/datahub/tree/master/metadata-ingestion) directory, DataHub provides ETL scripts for onboarding various kinds of metadata sources, including [Hive](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/hive-etl), [Kafka](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/kafka-etl), [LDAP](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/ldap-etl), [MySQL](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/mysql-etl), and generic [RDBMS](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/rdbms-etl), which feed the metadata to the [GMS](https://github.com/linkedin/datahub/blob/master/docs/what/gms.md).
## 1. Extract
The extract process is tightly coupled to the data source; hence, the [data accessor](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/ldap-etl/ldap_etl.py#L103) should faithfully reflect the metadata from the underlying data platforms.
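
As a rough illustration of what an extract step might look like, the sketch below flattens raw directory-style entries into simple records and skips entries that are missing required attributes, so that incomplete metadata is not passed downstream. It is not taken from `ldap_etl.py`; the function name `extract_users` and the `uid`/`mail` attribute names are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def extract_users(raw_entries: Iterable[Dict[str, List[str]]]) -> List[Dict[str, str]]:
    """Flatten raw entries (attribute -> list of values) into user records.

    Hypothetical helper: the attribute names 'uid' and 'mail' are assumed
    for illustration and may differ in a real data source.
    """
    records = []
    for entry in raw_entries:
        uid = entry.get("uid", [])
        mail = entry.get("mail", [])
        if not uid or not mail:
            # Skip incomplete entries rather than emit incorrect metadata.
            continue
        records.append({"username": uid[0], "email": mail[0]})
    return records
```

Keeping the source-specific parsing in one small function like this makes it easier to check that the extracted records match what the underlying platform actually holds.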