Mirror of https://github.com/datahub-project/datahub.git, synced 2025-08-27 18:45:50 +00:00

This reverts commit b0f56de7a81b8bf921ff37cb81024692d1b9a8ce.

Commit 3e4c110723 (parent: b0f56de7a8)

docs/CODE_OF_CONDUCT.md (deleted) → CODE_OF_CONDUCT.md (new file, 77 lines)

# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by direct messaging the project team on [Slack]. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[Slack]: https://datahubspace.slack.com/join/shared_invite/zt-cl60ng6o-6odCh_I~ejZKE~a9GG30PA
[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq

docs/CONTRIBUTING.md (deleted) → CONTRIBUTING.md (new file, 84 lines)

# Contributing

We always welcome contributions to help make DataHub better. Take a moment to read this document if you would like to contribute.

## Reporting issues

We use GitHub issues to track bug reports and feature requests, and GitHub pull requests for submitting changes.

If you find a bug:

1. Use the GitHub issue search to check whether the bug has already been reported.
2. If the issue has been fixed, try to reproduce it using the latest master branch of the repository.
3. If the issue still reproduces or has not yet been reported, try to isolate the problem before opening an issue.

## Submitting a Pull Request (PR)

Before you submit your Pull Request (PR), consider the following guidelines:

* Search GitHub for an open or closed PR that relates to your submission. You don't want to duplicate effort.
* Follow the [standard GitHub approach](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to create the PR. Please also follow our [commit message format](#commit-message-format).
* That's it! Thank you for your contribution!

## Commit Message Format

Please follow the [Conventional Commits](https://www.conventionalcommits.org/) specification for the commit message format. In summary, each commit message consists of a *header*, a *body* and a *footer*, separated by a single blank line.

```
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
```

No line of the commit message may be longer than 88 characters. This keeps messages easy to read on GitHub as well as in various Git tools.
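The header rules are mechanical enough to check automatically. As a rough sketch (this helper is hypothetical, not part of DataHub's tooling), a header can be validated with a single regular expression that also enforces the description rules below:

```shell
# Validate a commit message header: "<type>[optional scope]: <description>",
# lowercase first letter, no trailing dot. "revert" is allowed per the Revert section.
check_header() {
  echo "$1" | grep -Eq \
    '^(feat|fix|refactor|docs|test|perf|style|build|ci|revert)(\([^()]+\))?: [a-z0-9].*[^.]$'
}

check_header 'feat(parser): add ability to parse arrays' && echo ok  # prints ok
check_header 'Fixed the parser.' || echo 'bad header'                # prints bad header
```

A line-length check can be layered on top with something like `awk 'length > 88'` over the full message.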

### Type

Must be one of the following (based on the [Angular convention](https://github.com/angular/angular/blob/22b96b9/CONTRIBUTING.md#-commit-message-guidelines)):

* *feat*: A new feature
* *fix*: A bug fix
* *refactor*: A code change that neither fixes a bug nor adds a feature
* *docs*: Documentation only changes
* *test*: Adding missing tests or correcting existing tests
* *perf*: A code change that improves performance
* *style*: Changes that do not affect the meaning of the code (whitespace, formatting, missing semicolons, etc.)
* *build*: Changes that affect the build system or external dependencies
* *ci*: Changes to our CI configuration files and scripts

A scope may be added to a commit's type to provide additional contextual information; it is contained within parentheses, e.g.,

```
feat(parser): add ability to parse arrays
```

### Description

Each commit must contain a succinct description of the change:

* use the imperative, present tense: "change" not "changed" nor "changes"
* don't capitalize the first letter
* no dot (.) at the end

### Body

Just as in the description, use the imperative, present tense: "change" not "changed" nor "changes". The body should include the motivation for the change and contrast this with previous behavior.

### Footer

The footer should contain any information about *Breaking Changes*, and is also the place to reference GitHub issues that this commit *Closes*.

*Breaking Changes* should start with the words `BREAKING CHANGE:`, followed by a space or two newlines. The rest of the commit message is then used to describe the change.
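Putting the header, body, and footer together, a complete message might look like the following (the scope, function name, and issue number are invented for illustration):

```
feat(parser): add ability to parse arrays

Allow array literals in ingestion configs so users no longer have to
repeat a key for every element.

BREAKING CHANGE: parseValue now returns a list for array inputs.

Closes #1234
```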

### Revert

If the commit reverts a previous commit, it should begin with `revert:`, followed by the description. The body should say `Refs: <hash1> <hash2> ...`, where the hashes are the SHAs of the commits being reverted, e.g.

```
revert: let us never again speak of the noodle incident

Refs: 676104e, a215868
```
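The revert flow can be tried end to end in a throwaway repository; the sketch below (commit subjects are invented) uses plain `git` to create a commit, revert it, and reword the auto-generated revert message to match this convention:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# An initial commit, plus the commit we will revert.
git commit -q --allow-empty -m 'chore: initial commit'
echo "noodle" > incident.txt
git add incident.txt
git commit -q -m 'feat: introduce the noodle incident'
sha=$(git rev-parse --short HEAD)

# Revert it, then reword the default "Revert \"...\"" message to the convention.
git revert --no-edit HEAD >/dev/null
git commit -q --amend \
  -m 'revert: introduce the noodle incident' \
  -m "Refs: $sha"
git log -1 --format=%B  # shows the reworded header and the Refs: footer
```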

README.md (new file, 113 lines)

# DataHub: A Generalized Metadata Search & Discovery Tool

[Releases](https://github.com/linkedin/datahub/releases) | [Build Status](https://travis-ci.org/linkedin/datahub) | [Slack](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA) | [PRs Welcome](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md) | [License](LICENSE)

---

[Quickstart](#quickstart) |
[Documentation](#documentation) |
[Features](https://github.com/linkedin/datahub/blob/master/docs/features.md) |
[Roadmap](https://github.com/linkedin/datahub/blob/master/docs/roadmap.md) |
[Adoption](#adoption) |
[FAQ](https://github.com/linkedin/datahub/blob/master/docs/faq.md) |
[Town Hall](https://github.com/linkedin/datahub/blob/master/docs/townhalls.md)

---

> :mega: Next DataHub town hall meeting on July 31st, 9am-10am PDT:
> - [Signup sheet & questions](https://docs.google.com/spreadsheets/d/1hCTFQZnhYHAPa-DeIfyye4MlwmrY7GF4hBds5pTZJYM)
> - Details and recordings of past meetings can be found [here](docs/townhalls.md)

> :sparkles: Latest Update:
> - We've released v0.4.1. You can find the release notes [here](https://github.com/linkedin/datahub/releases/tag/v0.4.1)
> - We're on Slack now! [Join](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA) or [log in with an existing account](https://datahubspace.slack.com). Ask questions and keep up with the latest announcements.

## Introduction

DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our [LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented, and the [DataHub Onboarding Guide](docs/how/entity-onboarding.md) to understand how to extend DataHub for your own use case.

This repository contains the complete source code for both DataHub's frontend & backend. You can also read about [how we sync the changes](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) between our internal fork and GitHub.

## Quickstart

1. Install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/) (if using Linux). Make sure to allocate enough hardware resources to the Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB swap area.
2. Open Docker either from the command line or the desktop app and ensure it is up and running.
3. Clone this repo and `cd` into the root directory of the cloned repository.
4. Run the following command to download and run all Docker containers locally:
    ```
    ./docker/quickstart/quickstart.sh
    ```
    This step takes a while to run the first time, and it may be difficult to tell if DataHub is fully up and running from the combined log. Please use [this guide](https://github.com/linkedin/datahub/blob/master/docs/debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to verify that each container is running correctly.
5. At this point, you should be able to start DataHub by opening [http://localhost:9001](http://localhost:9001) in your browser. You can sign in using `datahub` as both username and password. However, you'll notice that no data has been ingested yet.
6. To ingest the provided [sample data](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) into DataHub, switch to a new terminal window, `cd` into the cloned `datahub` repo, and run the following command:
    ```
    ./docker/ingestion/ingestion.sh
    ```
    After running this, you should be able to see and search sample datasets in DataHub.

Please refer to the [debugging guide](docs/debugging.md) if you encounter any issues during the quickstart.

## Documentation

* [DataHub Developer's Guide](docs/developers.md)
* [DataHub Architecture](docs/architecture/architecture.md)
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
* [Docker Images](docker)
* [Frontend](datahub-frontend)
* [Web App](datahub-web)
* [Generalized Metadata Service](gms)
* [Metadata Ingestion](metadata-ingestion)
* [Metadata Processing Jobs](metadata-jobs)

## Releases

See the [Releases](https://github.com/linkedin/datahub/releases) page for more details. We follow the [SemVer Specification](https://semver.org) when versioning releases and adopt the [Keep a Changelog](https://keepachangelog.com/) convention for the changelog format.
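For reference, a minimal entry in the Keep a Changelog style looks like the following (the version number and items are placeholders, not a real DataHub release):

```
## [1.0.0] - 2020-01-01
### Added
- New feature description goes here
### Fixed
- Bug fix description goes here
```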

## FAQs

Frequently asked questions about DataHub can be found [here](https://github.com/linkedin/datahub/blob/master/docs/faq.md).

## Features & Roadmap

Check out DataHub's [Features](docs/features.md) & [Roadmap](docs/roadmap.md).

## Contributing

We welcome contributions from the community. Please refer to our [Contributing Guidelines](CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubating experimental features.

## Community

Join our [Slack workspace](https://app.slack.com/client/TUMKD5EGJ/DV0SB2ZQV/thread/GV2TEEZ5L-1583704023.001100) for discussions and important announcements. You can also find out more about our past and upcoming [town hall meetings](https://github.com/linkedin/datahub/blob/master/docs/townhalls.md).

## Adoption

Here are the companies that have officially adopted DataHub. Please feel free to add yours to the list if we missed it.

* [Expedia Group](http://expedia.com)
* [LinkedIn](http://linkedin.com)
* [Saxo Bank](https://www.home.saxo)
* [Shanghai HuaRui Bank](https://www.shrbank.com)
* [TypeForm](http://typeform.com)
* [Valassis](https://www.valassis.com)

Here is a list of companies currently building a POC or seriously evaluating DataHub.

* [Booking.com](https://www.booking.com)
* [Experian](https://www.experian.com)
* [Geotab](https://www.geotab.com)
* [Instructure](https://www.instructure.com)
* [Microsoft](https://microsoft.com)
* [Morgan Stanley](https://www.morganstanley.com)
* [Orange Telecom](https://www.orange.com)
* [SpotHero](https://spothero.com)
* [Sysco AS](https://sysco.no)
* [ThoughtWorks](https://www.thoughtworks.com)
* [University of Phoenix](https://www.phoenix.edu)
* [Vectice](https://www.vectice.com)

## Select Articles & Talks

* [DataHub: A Generalized Metadata Search & Discovery Tool](https://engineering.linkedin.com/blog/2019/data-hub)
* [Open sourcing DataHub: LinkedIn’s metadata search and discovery platform](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p)
* [The evolution of metadata: LinkedIn’s story @ Strata Data Conference 2019](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019)
* [Journey of metadata at LinkedIn @ Crunch Data Conference 2019](https://www.youtube.com/watch?v=OB-O0Y6OYDE)
* [DataHub Journey with Expedia Group by Arun Vasudevan](https://www.youtube.com/watch?v=ajcRdB22s5o)
* [Data Catalogue — Knowing your data](https://medium.com/albert-franzi/data-catalogue-knowing-your-data-15f7d0724900)
* [LinkedIn DataHub Application Architecture Quick Understanding](https://medium.com/@liangjunjiang/linkedin-datahub-application-architecture-quick-understanding-a5b7868ee205)
* [25 Hot New Data Tools and What They DON’T Do](https://blog.amplifypartners.com/25-hot-new-data-tools-and-what-they-dont-do/)

See the full list [here](https://github.com/linkedin/datahub/blob/mars-lan-patch-2/docs/links.md).

@ -1,13 +1,19 @@

```yaml
safe: true
plugins:
  - jekyll-relative-links
relative_links:
  enabled: true
  collections: true
include:
  - CODE_OF_CONDUCT.md
  - CONTRIBUTING.md
  - README.md
  - LICENSE.md
  - COPYING.md
  - ISSUE_TEMPLATE.md
  - PULL_REQUEST_TEMPLATE.md
exclude:
  - contrib

theme: jekyll-theme-cayman
title: DataHub
description: A Generalized Metadata Search & Discovery Tool
```
|
@ -1,77 +0,0 @@
|
||||
# Contributor Covenant Code of Conduct
|
||||
|
||||
## Our Pledge
|
||||
|
||||
In the interest of fostering an open and welcoming environment, we as
|
||||
contributors and maintainers pledge to making participation in our project and
|
||||
our community a harassment-free experience for everyone, regardless of age, body
|
||||
size, disability, ethnicity, sex characteristics, gender identity and expression,
|
||||
level of experience, education, socio-economic status, nationality, personal
|
||||
appearance, race, religion, or sexual identity and orientation.
|
||||
|
||||
## Our Standards
|
||||
|
||||
Examples of behavior that contributes to creating a positive environment
|
||||
include:
|
||||
|
||||
* Using welcoming and inclusive language
|
||||
* Being respectful of differing viewpoints and experiences
|
||||
* Gracefully accepting constructive criticism
|
||||
* Focusing on what is best for the community
|
||||
* Showing empathy towards other community members
|
||||
|
||||
Examples of unacceptable behavior by participants include:
|
||||
|
||||
* The use of sexualized language or imagery and unwelcome sexual attention or
|
||||
advances
|
||||
* Trolling, insulting/derogatory comments, and personal or political attacks
|
||||
* Public or private harassment
|
||||
* Publishing others' private information, such as a physical or electronic
|
||||
address, without explicit permission
|
||||
* Other conduct which could reasonably be considered inappropriate in a
|
||||
professional setting
|
||||
|
||||
## Our Responsibilities
|
||||
|
||||
Project maintainers are responsible for clarifying the standards of acceptable
|
||||
behavior and are expected to take appropriate and fair corrective action in
|
||||
response to any instances of unacceptable behavior.
|
||||
|
||||
Project maintainers have the right and responsibility to remove, edit, or
|
||||
reject comments, commits, code, wiki edits, issues, and other contributions
|
||||
that are not aligned to this Code of Conduct, or to ban temporarily or
|
||||
permanently any contributor for other behaviors that they deem inappropriate,
|
||||
threatening, offensive, or harmful.
|
||||
|
||||
## Scope
|
||||
|
||||
This Code of Conduct applies both within project spaces and in public spaces
|
||||
when an individual is representing the project or its community. Examples of
|
||||
representing a project or community include using an official project e-mail
|
||||
address, posting via an official social media account, or acting as an appointed
|
||||
representative at an online or offline event. Representation of a project may be
|
||||
further defined and clarified by project maintainers.
|
||||
|
||||
## Enforcement
|
||||
|
||||
Instances of abusive, harassing, or otherwise unacceptable behavior may be
|
||||
reported by direct messaging the project team on [Slack]. All
|
||||
complaints will be reviewed and investigated and will result in a response that
|
||||
is deemed necessary and appropriate to the circumstances. The project team is
|
||||
obligated to maintain confidentiality with regard to the reporter of an incident.
|
||||
Further details of specific enforcement policies may be posted separately.
|
||||
|
||||
Project maintainers who do not follow or enforce the Code of Conduct in good
|
||||
faith may face temporary or permanent repercussions as determined by other
|
||||
members of the project's leadership.
|
||||
|
||||
## Attribution
|
||||
|
||||
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
|
||||
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
|
||||
|
||||
[Slack]: https://datahubspace.slack.com/join/shared_invite/zt-cl60ng6o-6odCh_I~ejZKE~a9GG30PA
|
||||
[homepage]: https://www.contributor-covenant.org
|
||||
|
||||
For answers to common questions about this code of conduct, see
|
||||
https://www.contributor-covenant.org/faq
|
@ -1,84 +0,0 @@
|
||||
# Contributing
|
||||
|
||||
We always welcome contributions to help make DataHub better. Take a moment to read this document if you would like to contribute.
|
||||
|
||||
## Reporting issues
|
||||
|
||||
We use GitHub issues to track bug reports, feature requests, and submitting pull requests.
|
||||
|
||||
If you find a bug:
|
||||
|
||||
1. Use the GitHub issue search to check whether the bug has already been reported.
|
||||
|
||||
1. If the issue has been fixed, try to reproduce the issue using the latest master branch of the repository.
|
||||
|
||||
1. If the issue still reproduces or has not yet been reported, try to isolate the problem before opening an issue.
|
||||
|
||||
## Submitting a Pull Request (PR)
|
||||
|
||||
Before you submit your Pull Request (PR), consider the following guidelines:
|
||||
|
||||
* Search GitHub for an open or closed PR that relates to your submission. You don't want to duplicate effort.
|
||||
* Follow the [standard GitHub approach](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to create the PR. Please also follow our [commit message format](#commit-message-format).
|
||||
* That's it! Thank you for your contribution!
|
||||
|
||||
## Commit Message Format
|
||||
|
||||
Please follow the [Conventional Commits](https://www.conventionalcommits.org/) specification for the commit message format. In summary, each commit message consists of a *header*, a *body* and a *footer*, separated by a single blank line.
|
||||
|
||||
```
|
||||
<type>[optional scope]: <description>
|
||||
|
||||
[optional body]
|
||||
|
||||
[optional footer(s)]
|
||||
```
|
||||
|
||||
Any line of the commit message cannot be longer than 88 characters! This allows the message to be easier to read on GitHub as well as in various Git tools.
|
||||
|
||||
### Type
|
||||
|
||||
Must be one of the following (based on the [Angular convention](https://github.com/angular/angular/blob/22b96b9/CONTRIBUTING.md#-commit-message-guidelines)):
|
||||
|
||||
* *feat*: A new feature
|
||||
* *fix*: A bug fix
|
||||
* *refactor*: A code change that neither fixes a bug nor adds a feature
|
||||
* *docs*: Documentation only changes
|
||||
* *test*: Adding missing tests or correcting existing tests
|
||||
* *perf*: A code change that improves performance
|
||||
* *style*: Changes that do not affect the meaning of the code (whitespace, formatting, missing semicolons, etc.)
|
||||
* *build*: Changes that affect the build system or external dependencies
|
||||
* *ci*: Changes to our CI configuration files and scripts
|
||||
|
||||
A scope may be provided to a commit’s type, to provide additional contextual information and is contained within parenthesis, e.g.,
|
||||
```
|
||||
feat(parser): add ability to parse arrays
|
||||
```
|
||||
|
||||
### Description
|
||||
|
||||
Each commit must contain a succinct description of the change:
|
||||
|
||||
* use the imperative, present tense: "change" not "changed" nor "changes"
|
||||
* don't capitalize the first letter
|
||||
* no dot(.) at the end
|
||||
|
||||
### Body
|
||||
|
||||
Just as in the description, use the imperative, present tense: "change" not "changed" nor "changes". The body should include the motivation for the change and contrast this with previous behavior.
|
||||
|
||||
### Footer
|
||||
|
||||
The footer should contain any information about *Breaking Changes*, and is also the place to reference GitHub issues that this commit *Closes*.
|
||||
|
||||
*Breaking Changes* should start with the words `BREAKING CHANGE:` with a space or two new lines. The rest of the commit message is then used for this.
|
||||
|
||||
### Revert
|
||||
|
||||
If the commit reverts a previous commit, it should begin with `revert:`, followed by the description. In the body it should say: `Refs: <hash1> <hash2> ...`, where the hashs are the SHA of the commits being reverted, e.g.
|
||||
|
||||
```
|
||||
revert: let us never again speak of the noodle incident
|
||||
|
||||
Refs: 676104e, a215868
|
||||
```
|
113
docs/README.md
113
docs/README.md
@ -1,113 +0,0 @@
|
||||
# DataHub: A Generalized Metadata Search & Discovery Tool
|
||||
[](https://github.com/linkedin/datahub/releases)
|
||||
[](https://travis-ci.org/linkedin/datahub)
|
||||
[](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA)
|
||||
[](https://github.com/linkedin/datahub/blob/master/CONTRIBUTING.md)
|
||||
[](LICENSE)
|
||||
|
||||
---
|
||||
|
||||
[Quickstart](#quickstart) |
|
||||
[Documentation](#documentation) |
|
||||
[Features](features.md) |
|
||||
[Roadmap](roadmap.md) |
|
||||
[Adoption](#adoption) |
|
||||
[FAQ](faq.md) |
|
||||
[Town Hall](townhalls.md)
|
||||
|
||||
---
|
||||
|
||||

|
||||
|
||||
> :mega: Next DataHub town hall meeting on July 31st, 9am-10am PDT:
|
||||
> - [Signup sheet & questions](https://docs.google.com/spreadsheets/d/1hCTFQZnhYHAPa-DeIfyye4MlwmrY7GF4hBds5pTZJYM)
|
||||
> - Details and recordings of past meetings can be found [here](docs/townhalls.md)
|
||||
|
||||
> :sparkles: Latest Update:
|
||||
> - We've released v0.4.1. You can find release notes [here](https://github.com/linkedin/datahub/releases/tag/v0.4.1)
|
||||
> - We're on Slack now! [Join](https://join.slack.com/t/datahubspace/shared_invite/zt-dkzbxfck-dzNl96vBzB06pJpbRwP6RA) or [log in with an existing account](https://datahubspace.slack.com). Ask questions and keep up with the latest announcements.
|
||||
|
||||
## Introduction
|
||||
DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our
|
||||
[LinkedIn blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented and [DataHub Onboarding Guide](docs/how/entity-onboarding.md) to understand how to extend DataHub for your own use case.
|
||||
|
||||
This repository contains the complete source code for both DataHub's frontend & backend. You can also read about [how we sync the changes](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) between our internal fork and GitHub.
|
||||
|
||||
## Quickstart
|
||||
1. Install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/) (if using Linux). Make sure to allocate enough hardware resources for Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB Swap area.
|
||||
2. Open Docker either from the command line or the desktop app and ensure it is up and running.
|
||||
3. Clone this repo and `cd` into the root directory of the cloned repository.
|
||||
4. Run the following command to download and run all Docker containers locally:
|
||||
```
|
||||
./docker/quickstart/quickstart.sh
|
||||
```
|
||||
This step takes a while to run the first time, and it may be difficult to tell if DataHub is fully up and running from the combined log. Please use [this guide](debugging.md#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart) to verify that each container is running correctly.
|
||||
5. At this point, you should be able to start DataHub by opening [http://localhost:9001](http://localhost:9001) in your browser. You can sign in using `datahub` as both username and password. However, you'll notice that no data has been ingested yet.
|
||||
6. To ingest provided [sample data](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/mce-cli/bootstrap_mce.dat) to DataHub, switch to a new terminal window, `cd` into the cloned `datahub` repo, and run the following command:
|
||||
```
|
||||
./docker/ingestion/ingestion.sh
|
||||
```
|
||||
After running this, you should be able to see and search sample datasets in DataHub.
|
||||
|
||||
Please refer to the [debugging guide](docs/debugging.md) if you encounter any issues during the quickstart.
|
||||
|
||||
## Documentation
|
||||
* [DataHub Developer's Guide](docs/developers.md)
|
||||
* [DataHub Architecture](docs/architecture/architecture.md)
|
||||
* [DataHub Onboarding Guide](docs/how/entity-onboarding.md)
|
||||
* [Docker Images](docker)
|
||||
* [Frontend](datahub-frontend)
|
||||
* [Web App](datahub-web)
|
||||
* [Generalized Metadata Service](gms)
|
||||
* [Metadata Ingestion](metadata-ingestion)
|
||||
* [Metadata Processing Jobs](metadata-jobs)
|
||||
|
||||
## Releases
|
||||
See [Releases](https://github.com/linkedin/datahub/releases) page for more details. We follow the [SemVer Specification](https://semver.org) when versioning the releases and adopt the [Keep a Changelog convention](https://keepachangelog.com/) for the changelog format.
|
||||
|
||||
## FAQs
|
||||
Frequently Asked Questions about DataHub can be found [here](faq.md).
|
||||
|
||||
## Features & Roadmap
|
||||
Check out DataHub's [Features](docs/features.md) & [Roadmap](docs/roadmap.md).
|
||||
|
||||
## Contributing
|
||||
We welcome contributions from the community. Please refer to our [Contributing Guidelines](CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubating experimental features.
|
||||
|
||||
## Community
|
||||
Join our [slack workspace](https://app.slack.com/client/TUMKD5EGJ/DV0SB2ZQV/thread/GV2TEEZ5L-1583704023.001100) for discussions and important announcements. You can also find out more about our past and upcoming [town hall meetings](townhalls.md).
|
||||
|
||||
## Adoption
|
||||
Here are the companies that have officially adopted DataHub. Please feel free to add yours to the list if we missed it.
|
||||
* [Expedia Group](http://expedia.com)
|
||||
* [LinkedIn](http://linkedin.com)
|
||||
* [Saxo Bank](https://www.home.saxo)
|
||||
* [Shanghai HuaRui Bank](https://www.shrbank.com)
|
||||
* [TypeForm](http://typeform.com)
|
||||
* [Valassis]( https://www.valassis.com)
|
||||
|
||||
Here is a list of companies currently building a POC or seriously evaluating DataHub.

* [Booking.com](https://www.booking.com)
* [Experian](https://www.experian.com)
* [Geotab](https://www.geotab.com)
* [Instructure](https://www.instructure.com)
* [Microsoft](https://microsoft.com)
* [Morgan Stanley](https://www.morganstanley.com)
* [Orange Telecom](https://www.orange.com)
* [SpotHero](https://spothero.com)
* [Sysco AS](https://sysco.no)
* [ThoughtWorks](https://www.thoughtworks.com)
* [University of Phoenix](https://www.phoenix.edu)
* [Vectice](https://www.vectice.com)

## Select Articles & Talks

* [DataHub: A Generalized Metadata Search & Discovery Tool](https://engineering.linkedin.com/blog/2019/data-hub)
* [Open sourcing DataHub: LinkedIn’s metadata search and discovery platform](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p)
* [The evolution of metadata: LinkedIn’s story @ Strata Data Conference 2019](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019)
* [Journey of metadata at LinkedIn @ Crunch Data Conference 2019](https://www.youtube.com/watch?v=OB-O0Y6OYDE)
* [DataHub Journey with Expedia Group by Arun Vasudevan](https://www.youtube.com/watch?v=ajcRdB22s5o)
* [Data Catalogue — Knowing your data](https://medium.com/albert-franzi/data-catalogue-knowing-your-data-15f7d0724900)
* [LinkedIn DataHub Application Architecture Quick Understanding](https://medium.com/@liangjunjiang/linkedin-datahub-application-architecture-quick-understanding-a5b7868ee205)
* [25 Hot New Data Tools and What They DON’T Do](https://blog.amplifypartners.com/25-hot-new-data-tools-and-what-they-dont-do/)

See the full list [here](links.md).

@ -1,11 +1,11 @@

# Onboarding to GMA Graph - Adding a new relationship type

Steps for this are already detailed in https://github.com/linkedin/datahub/blob/master/docs/how/graph-onboarding.md

For this exercise, we'll add a new relationship type `FollowedBy`, which is extracted out of the `Follow` aspect. For that, we first need to add the `Follow` aspect.

## 1. Onboard `Follow` aspect

Referring to the guide https://github.com/linkedin/datahub/blob/master/docs/how/add-new-aspect.md

### 1.1 Model new aspect

* Follow.pdl
@ -52,10 +52,10 @@ For reproducible technical issues, bugs and code contributions, Github [issues](

The [DataHub Introduction](https://engineering.linkedin.com/blog/2019/data-hub) and [Open Sourcing Datahub](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) blog posts are also useful resources for getting a high-level understanding of the system.

## Where can I learn about the roadmap?

You can learn more about DataHub's [product roadmap](https://github.com/linkedin/datahub/blob/master/docs/roadmap.md), which gets updated regularly.

## Where can I learn about the current list of features/functionalities?

You can learn more about the current [list of features](https://github.com/linkedin/datahub/blob/master/docs/features.md).

## Are the product strategy/vision/roadmap driven by the LinkedIn Engineering team, community, or a collaborative effort?

A mix of both the LinkedIn DataHub team and the community. The roadmap will be a joint effort of both LinkedIn and the community. However, we’ll most likely prioritize tasks that align with the community's asks.

@ -64,7 +64,7 @@ Mixed of both LinkedIn DataHub team and the community. The roadmap will be a joi

LinkedIn is not using GCP so we cannot commit to building and testing that connectivity. However, we’ll be happy to accept community contributions for GCP integration. Also, our Slack channel and regularly scheduled town hall meetings are a good opportunity to meet with people from different companies who have similar requirements and might be interested in collaborating on these features.

## How approachable would LinkedIn be to provide insights/support or collaborate on a functionality?

Please take a look at our [roadmap](https://github.com/linkedin/datahub/blob/master/docs/roadmap.md) & [features](https://github.com/linkedin/datahub/blob/master/docs/features.md) to get a sense of what’s being open sourced in the near future. If there’s something missing from the list, we’re open to discussion. In fact, the town hall would be the perfect venue for such discussions.

## How do the LinkedIn Engineering team and the community ensure the quality of the community code for DataHub?

All PRs are reviewed by the LinkedIn team. Any extension/contribution coming from the community which the LinkedIn team doesn’t have expertise in will be placed into an incubation directory first (`/contrib`). Once it’s blessed and adopted by the community, we’ll graduate it from incubation and move it into the main code base.

@ -105,7 +105,7 @@ The [SchemaField](https://github.com/linkedin/datahub/blob/master/metadata-model

MCE is the ideal way to push metadata from different security zones, assuming there is a common Kafka infrastructure that aggregates the events from various security zones.

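As a minimal sketch of that push model, the snippet below builds an MCE-like payload that a producer in each security zone would send to the shared Kafka topic. The field names only loosely follow the MetadataChangeEvent shape; treat them, and the helper itself, as illustrative assumptions rather than the exact Avro schema:

```python
# Illustrative sketch: constructing a minimal MCE-style payload for a corp
# user. Field names approximate the MetadataChangeEvent shape and are
# assumptions, not the actual schema.

def make_corp_user_mce(username, full_name):
    """Build an event proposing a snapshot for a corp user entity."""
    return {
        "proposedSnapshot": {
            "urn": f"urn:li:corpuser:{username}",
            "aspects": [
                {"corpUserInfo": {"active": True, "fullName": full_name}},
            ],
        }
    }

mce = make_corp_user_mce("jdoe", "Jane Doe")
# A Kafka producer in each zone would then serialize `mce` and send it to
# the common MetadataChangeEvent topic for the mce-consumer-job to ingest.
```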
## Which data stores does the DataHub backend support presently?

Currently, DataHub supports all major database providers that are supported by Ebean as the document store, i.e. Oracle, Postgres, MySQL, H2. We also support [Espresso](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store), which is LinkedIn's proprietary document store. Other than that, we support Elasticsearch and Neo4j for search and graph use cases, respectively. However, as data stores in the backend are all abstracted and accessed through DAOs, you should be able to easily support other data stores by plugging in your own DAO implementations. Please refer to https://github.com/linkedin/datahub/blob/master/docs/architecture/metadata-serving.md for more details.

## For which stores do you have discovery services?

Supported data sources are listed [here](https://github.com/linkedin/datahub/tree/master/metadata-ingestion). To onboard your own data source which is not listed there, you can refer to the [onboarding guide](how/data-source-onboarding.md).

@ -1,6 +1,6 @@

# How to add a new metadata aspect?

Adding a new metadata [aspect](https://github.com/linkedin/datahub/blob/master/docs/what/aspect.md) is one of the most common ways to extend an existing [entity](https://github.com/linkedin/datahub/blob/master/docs/what/entity.md).
We'll use the [CorpUserEditableInfo](https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/identity/CorpUserEditableInfo.pdl) as an example here.

1. Add the aspect model to the corresponding namespace (e.g. [`com.linkedin.identity`](https://github.com/linkedin/datahub/tree/master/metadata-models/src/main/pegasus/com/linkedin/identity))

@ -17,4 +17,4 @@ We'll use the [CorpUserEditableInfo](https://github.com/linkedin/datahub/blob/ma

5. (Optional) If there's a need to update the aspect via API (instead of/in addition to MCE), add a [sub-resource](https://linkedin.github.io/rest.li/user_guide/restli_server#sub-resources) endpoint for the new aspect (e.g. [`CorpUsersEditableInfoResource`](https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/identity/CorpUsersEditableInfoResource.java)). The sub-resource endpoint also allows you to retrieve previous versions of the aspect as well as additional metadata such as the audit stamp.

6. After rebuilding & restarting [gms](https://github.com/linkedin/datahub/tree/master/gms), [mce-consumer-job](https://github.com/linkedin/datahub/tree/master/metadata-jobs/mce-consumer-job) & [mae-consumer-job](https://github.com/linkedin/datahub/tree/master/metadata-jobs/mae-consumer-job),
you should be able to start emitting [MCE](https://github.com/linkedin/datahub/blob/master/docs/what/mxe.md) with the new aspect and have it automatically ingested & stored in DB.

@ -1,6 +1,6 @@

# How to onboard a new data source?

In [metadata-ingestion](https://github.com/linkedin/datahub/tree/master/metadata-ingestion), DataHub provides onboarding for various kinds of metadata sources, including [Hive](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/hive-etl), [Kafka](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/kafka-etl), [LDAP](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/ldap-etl), [mySQL](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/mysql-etl), and a generic [RDBMS](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/rdbms-etl) as ETL scripts that feed the metadata to the [GMS](https://github.com/linkedin/datahub/blob/master/docs/what/gms.md).

## 1. Extract

The extract process is specific to each data source; hence, the [data accessor](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/ldap-etl/ldap_etl.py#L103) should accurately reflect the correctness of the metadata from the underlying data platforms.
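A minimal extract step for a hypothetical relational source can be sketched as follows (illustrative only: the real ETL scripts, such as `ldap_etl.py`, differ in details, and the record fields below are assumptions):

```python
# Illustrative sketch: normalizing raw rows from an assumed relational
# source into metadata records to be transformed and fed to the GMS.

def extract_datasets(rows):
    """Map raw source rows to dataset records with schema fields."""
    records = []
    for row in rows:
        records.append({
            "name": row["table_name"],
            "platform": "urn:li:dataPlatform:mysql",  # assumed platform URN
            "fields": [
                {"fieldPath": col, "nativeDataType": dtype}
                for col, dtype in row["columns"]
            ],
        })
    return records

raw = [{"table_name": "foo.bar", "columns": [("id", "BIGINT"), ("name", "VARCHAR")]}]
records = extract_datasets(raw)
```

Whatever the source, the goal of this step is the same: capture the source's metadata faithfully before any transformation, so errors here propagate to everything downstream.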