12076 Commits

Author SHA1 Message Date
Jyoti Wadhwani
6e40ce8698 add method to obtain field values from RecordTemplate 2020-09-11 09:15:56 -07:00
John Plaisted
23ad0e9c8b
Small fixes to mce_cli (#1868)
- default argument value should be None not "None"
- Test data should have corpuser, not corpUser (case sensitive)

fixes https://github.com/linkedin/datahub/issues/1867
fixes https://github.com/linkedin/datahub/issues/1865
2020-09-10 19:30:47 -07:00
John Plaisted
368e1df41d
Update development.md 2020-09-10 16:04:05 -07:00
Arun Vasudevan
66dd008e3d
feat: add ML models (#1721)
* ML Model Schema Initial Version for feedback

* Added Deprecation Model

* Remove lock files

* Committing yarn lock file

* Fix Review Comments

* Using Common VersionTag Entity

* PR Review Comments Round-2

* Updated all model and feature references to MLModel and MLFeature

* Addressing PR Comments (Round 3)

* Updating Hyperparameter to a Map type

* Update to Dataset

* Review comments based on RFC

* ML Model Schema Initial Version for feedback

* Added Deprecation Model

* Remove lock files

* Committing yarn lock file

* Fix Review Comments

* Using Common VersionTag Entity

* PR Review Comments Round-2

* Updated all model and feature references to MLModel and MLFeature

* Addressing PR Comments (Round 3)

* Updating Hyperparameter to a Map type

* Update to Dataset

* fix: modify the etl script dependency (#1726)

Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>

* fix: correct the way to catch the exception (#1727)

* fix: modify the etl script dependency

* fix: Correct the way to catch the exception

* fix: Compatibility with Kafka clusters where the Kafka Topic message Key cannot be empty

* fix: Adjust the kafka message key; Improve the comment of field

* fix: Avro schema required for key

Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>

* refactor(models): remove internal cluster model (#1733)

* refactor(models): remove internal cluster model

Remove internal model which is not used in open source

* build(deps): bump lodash from 4.17.15 to 4.17.19 in /datahub-web (#1738)

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.19.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.15...4.17.19)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update README.md

* Update README.md

* Update README.md

* Update the roadmap (#1740)

* Update the roadmap

- Make short term more like what we're doing this quarter
- Medium term is next quarter
- Long term is 2 or 3 quarters from now
- Visionary is even beyond that

Making this PR mostly to discuss the roadmap. I've moved a few items down to "unprioritized"; before merging this we should put these in a category. Mostly saving the state of what I've done so far.

* Update roadmap.md

Co-authored-by: Mars Lan <mars.th.lan@gmail.com>

* Update roadmap.md

* Update README.md

* doc: add a separate doc to keep track of the full list of links (#1744)

* Update README.md

* Create links.md

* Update README.md

* Update links.md

* Update README.md

* Update README.md

* Update features.md

* Update faq.md

* Update README.md

* Update README.md

* feat(gms): add postgres & mariadb support to GMS (#1742)

* feat(gms): add postgres & mariadb support to GMS

Also add corresponding docker-compose files

* Update README.md

* build(frontend): Drop unnecessary DB-related dependencies (#1741)

* refactor(frontend): Drop unnecessary DB-related dependencies

* Drop unused dependencies from top-level build script

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update links.md

* Update README.md

* Doc fixes

* Update roadmap.md

* Update faq.md

* Set theme jekyll-theme-cayman

* Create _config.yml

* Delete _config.yml

* Set theme jekyll-theme-cayman

* Update _config.yml

* Update _config.yml

* build: build GitHub Page from /docs directory (#1750)

- Move top-level MD files to /docs and symlink them back
- Update all absolute links to files in /docs to relative links

* Revert "build: build GitHub Page from /docs directory (#1750)" (#1751)

This reverts commit b0f56de7a81b8bf921ff37cb81024692d1b9a8ce.

* build: build GitHub Pages from /docs directory (#1752)

- Move non-README top-level MD files to /docs
- Update all absolute links to files in /docs to relative links
- Add a placeholder front page for GitHub Pages

* Update README.md

* Update README.md

* Update README.md

* feat(kafka-config): Add ability to configure other Kafka props (#1745)

* Integrate spring-kafka & spring-boot for security props

- Upgrade spring-kafka to 2.1.14
- Use KafkaListener and KafkaTemplates to enable KafkaAutoConfiguration
- Integrates spring-boot's KafkaProperties into spring-kafka's config

* Cleanup imports

* Add DataHub kafka env vars

* Remove kafka-streams dependency

* Add KafkaProperties to gms; Add docs

* Add to Adoption

* Remove KAFKA_BOOTSTRAP_SERVER default

Co-authored-by: jsotelo <javier.sotelo@viasat.com>
Co-authored-by: Kerem Sahin <ksahin@linkedin.com>

* Agenda for next town hall

* Update townhalls.md

* Update README.md

* Update README.md

* Add documentation around the DataHub RFC process. (#1754)

Other repos have similar RFC processes (though they seem to have a separate repo for their RFC docs).

This provides a more structured way for contributors to make significant design contributions.

https://github.com/linkedin/datahub/issues/1692

* metadata-models 72.0.8 -> 80.0.0 (#1756)

* <refactor>[ingestions]: align the default kafka topics with PR #1756 (#1758)

* docs: add a sequence diagram and a description (#1757)

* add a sequence diagram and a description

* update description based on feedback

* Update README.md

* Update README.md

Co-authored-by: Mars Lan <mars.th.lan@gmail.com>

* Update README.md

* Fix reflinks in PR template (#1764)

* Update kafka-config.md (#1763)

Fix name of spring-kafka property to pass SASL_JAAS config

* Update entity.md

* Update README.md

* Update faq.md

* Update townhalls.md

* Update README.md

* Update townhalls.md

* Update townhalls.md

* docs: move quickstart guide to a separate file under docs (#1765)

docs: move quickstart guide to a separate doc under docs directory

* Update slack.md

* Update README.md

* Update slack.md

* Update metadata-ingestion.md

* Add workflow to check build and tests on PRs + releases. (#1769)

PRs are set up to skip docs.

Also, only run docker actions on linkedin/datahub (i.e. disable on forks; makes forks nicer since you don't have failing actions).

* Update developers.md

* Update developers.md

* Update README.md

* fix(models): remove unused model (#1748)

* fix(models): remove unused model

Fixes https://github.com/linkedin/datahub/issues/1719

* Drop DeploymentInfo from Dataset's value model & rebuild snapshot

* Update README.md

* Add a separate page for previous townhalls

* Update for August invite; link to history

* Update README.md

* build: remove travis (we're using GitHub actions). (#1770)

Remove travis (we're using GitHub actions).

Also ignore markdown in our current workflows.

Also update the README.md badge.

* update townhall date

* Update README.md

* Update townhalls.md

* build(docker): build & publish GitHub Package (#1771)

* build(docker): build & publish docker images to GitHub Packages

Will keep publishing to Docker Hub meanwhile until all Dockerfiles have been updated to point to GitHub.
Fixes https://github.com/linkedin/datahub/issues/1548

* Rebase & fix dockerfile locations

* Update README.md

* Fix README.md

* docs: add placeholders for advanced topics (#1780)

* Create high-cardinality.md

* Create pdl-best-practices

* Create partial-update.md

* Rename pdl-best-practices to pdl-best-practices.md

* Create entity-hierarchy.md

* docs: more placeholders for advanced topics (#1781)

* Create aspect-versioning.md

* Create derived-aspects.md

* Create backfilling.md

* Update README.md

* Update aspect-versioning.md

* Update aspect.md

* Update README.md

* Update townhall-history.md

* Update townhall-history.md

* Update rfc.md

* refactor(docker): make docker files easier to use during development. (#1777)

* Make docker files easier to use during development.

During development it's quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files using multistage build to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Added docs/docker documentation for this.

* build: fix docker actions. (#1787)

* bug: Fix docker actions.

We renamed directories in docker/ which broke the actions.

Also try to refactor the action files a little so that we can run (but not publish) these images on pull requests that change the docker/ dir as an extra check. Note this only seems to be supported by the dockerhub plugin; the github plugin doesn't support this (so that will be an issue when we move to it only).

* Drop extra pipes

* Update README.md

* refactor: remove unused model (#1788)

* refactor: remove unused internal models (#1789)

* docs: create search-over-new-field.md (#1790)

Add a doc on searching over a new field

* Update search-onboarding.md

* add description field for dataset index mapping (#1791)

* docs: how to customize the search experience (#1795)

* add description field for dataset index mapping

* documentation on how to customize the search experience

* feat(ingest): add example crawler for MS SQL (#1803)

Also fix the incorrect assumption on column comments & add sample docker-compose file

* Add log documentation

We didn't end up mounting logs; Docker Desktop is a better experience.

* Update townhall-history.md

* Update quickstart.md

* fix(search): clear description from dataset index when it's cleared (#1808)

Fixes https://github.com/linkedin/datahub/issues/1798

* Update README.md

* Revert "Update README.md"

This reverts commit 74a0d7b262a2ac22de9bc52974b721d580914ff0.

* Update README.md

* Update README.md

* Update high-cardinality.md

* Update README.md

* Update relationship.md

* Update high-cardinality.md

* Update metadata-models to head! (#1811)

metadata-models 80.0.0 -> 90.0.13:

   90.0.13: Roll forward: Fix the open source build by avoiding URN method that isn't part of the open source URN.
    90.0.2: Refactor listUrnsFromIndex method
    90.0.0: Start distinguishing between [] aspects vs null aspects input param
    89.0.4: Fix the open source build by avoiding URN method that isn't part of the open source URN.
    89.0.2: fix some test case name
    89.0.0: META-12686: Made the MXE_v5 topics become strictly ACL'ed to avoid the wildcard write ACL as "MetadataXEvent.+"
    88.0.6: change DAO to take Storage Config as input
    88.0.3: Add a comment on lack of avro generation for MXEv5 + add MXEv5 to the pegasus validation task.
   87.0.15: META-12651: Integrate the metadata-models-ext with metadata-models
   87.0.13: add StorageConfig to Local DAO
    87.0.3: Treat empty aspect vs optional aspect same until all clients are migrated
    87.0.2: Treat empty aspect vs optional aspect differently
    87.0.1: META-12533: Skip processing unregistered aspect specific MAE.
    83.0.6: action method to return list of urns from strong consistent index
    83.0.4: Change input param type for batch backfill
    83.0.3: Implement batch backfill
    83.0.1: Implement support for OR filter in browse query
   82.0.10: Throw UnsupportedOperationException for unsupported condition types in search filter
    82.0.6: Implement local secondary backfilling index as part of backfill method
    82.0.5: [strongly consistent index] implement getUrns method
    82.0.4: Add indexing urn fields to the local secondary index
    82.0.0: Render Delta fields in the MCE_v5.
    81.0.1: Add pegasus to avro conversion for FMCE
    80.0.4: add get all support for BaseSingleAspectEntitySimpleKeyResource
    80.0.2: Add a BaseSearchWriterDAO with an ESBulkWriterDAO implementation.
    80.0.1: META-12254: Produce aspect specific MAE with always emit option
    80.0.0: Convert getNodesInTraversedPath to getSubgraph to return complete view of the subgraph (nodes+edges)

* Update townhalls.md

* Update townhalls.md

* fix: drop the commits badge as it's flaky

* Update README.md

* fix: update defaults of aspectNames params (#1815)

fix: Update defaults of aspectNames params.

The last PR to sync internal code broke the external GMS, as the code now expected aspectNames to be null rather than empty by default. This prevented me from logging into DataHub, as the corp user request would fail (it assumed I asked for no aspects rather than all aspects).

TESTED: Built locally, launched with docker/dev.sh (so used latest frontend, but whatever). Verified I can now log into DataHub, browse and search for datasets, and view my profile.

* Update README.md

* Update README.md

* feat(kubernetes): Improve the security of the kubernetes/helm charts (#1782)

* 1747 | remove obsolete yaml files

* 1747 | remove configmap and its hardcoded references

* 1747 | add missing input parameter of neo4j.host

* 1747 | remove obsolete secrets and parameterize the rest

* 1747 | auto-generate gms secret

* 1747 | remove fullName overrides

* 1747 | fix parameters in subchart's values.yaml

* 1747 | remove hardcoding from parameters for gms host and port

* 1747 | upgrade chart version

* 1747 | update helm docs

* 1747 | add extraEnv, extraVolume and extraMounts

* 1747 | Alters pull policy of images to 'always' for ldh

Co-authored-by: shakti-garg <shakti.garg@gmail.com>

* Update README.md

* feat(data-platforms): adding rest resource for /dataPlatforms and mid-tier support (#1817)

* feat(data-platforms): Adding rest resource for /dataPlatforms and mid-tier support

* Removed data platforms which are LinkedIn internal

* docs: add NOTICE (#1810)

* Copy NOTICE from wherehows

Copies the file from the wherehows branch.

* Update notice.

* Update links.md

* Update links.md

* Update README.md

* feat(dashboards): RFC for dashboards (#1778)

* feature(dashboards): RFC for dashboards

* Change directory structure

* Create goals & non-goals sections

* Removing alternatives section

* Update README.md

* Update links.md

* Update townhalls.md

* Update notice to include embedded licenses

Also list apache projects specifically.

* feat(frontend): update datahub-web client UI code (#1806)

* Releases updated version of datahub-web client UI code

* Fix typo in yarn lock

* Change yarn lock to match yarn registry directories

* Previous commit missed some paths

* Even more changes to yarnlock missing in previous commit

* Include codegen file for typings

* Add files to get parity for datahub-web and current OS datahub-midtier

* Add in typo fix from previous commit - change to proper license

* Implement proper OS fix for person entity picture url

* Workarounds for open source DH issues

* Fixes institutional memory api and removes unopensourced tabs for datasets

* Fixes search dataset deprecation and user search issue as a result of changes

* Remove internal only options in the avatar menu

* Update search-over-new-field.md

* docs: add external link (#1828)

* Update README.md

* Update links.md

* Review comments based on RFC

Co-authored-by: cobolbaby <cobolbaby@qq.com>
Co-authored-by: Cobolbaby <Zhang.Xing-Long@inventec.com>
Co-authored-by: Harsh Shah <hrshah@linkedin.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mars Lan <mars.th.lan@gmail.com>
Co-authored-by: John Plaisted <jplaisted@linkedin.com>
Co-authored-by: Kerem Sahin <ksahin@linkedin.com>
Co-authored-by: Javier Sotelo <javier.a.sotelo@gmail.com>
Co-authored-by: jsotelo <javier.sotelo@viasat.com>
Co-authored-by: Jyoti Wadhwani <jywadhwani@linkedin.com>
Co-authored-by: Chris Lee <wlee@linkedin.com>
Co-authored-by: Liangjun Jiang <ljiang510@gmail.com>
Co-authored-by: shakti-garg-saxo <68685481+shakti-garg-saxo@users.noreply.github.com>
Co-authored-by: na zhang <nazhang@linkedin.com>
Co-authored-by: shakti-garg <shakti.garg@gmail.com>
Co-authored-by: Charlie Tran <catran@linkedin.com>
2020-09-10 15:52:50 -07:00
Mars Lan
e8a1d61961
Update links.md 2020-09-10 03:43:15 -07:00
Mars Lan
9f70a9ed97
Update README.md 2020-09-10 03:42:40 -07:00
Grant Nicholas
f267c8c8e2
fix(gms): update kafka client libraries to a newer version to support schema registry basic auth + SSL (#1863) 2020-09-09 14:37:47 -07:00
Mars Lan
f0485a490e
feat(platform): add "postgres" as a supported data platform (#1859)
* feat(platform): add "postgres" as a supported data platform

* update tests
2020-09-08 10:21:08 -07:00
fabiofilz
340c54317c
1849 support ssl to mce cli.py (#1857)
* Adding SSL support to mce_cli.py

* Kafka Config option

* Adding space and removing the commented line

Co-authored-by: Fabio de Simoni <fabio.desimoni@kindredgroup.com>
2020-09-04 12:17:27 -07:00
Charlie Tran
c2c6f66ca8
feat(frontend): Module consolidation for some test modules and reduces errors from unsupported API calls (#1844)
* frontend - Module consolidation for some test modules and reduces errors from unsupported API calls

* Fix broken test
2020-09-03 10:37:02 -07:00
Mars Lan
846152cf97
Update entity.md 2020-09-03 04:47:10 -07:00
John Plaisted
38c4dd7a65
Update faq.md 2020-09-01 12:49:02 -07:00
John Plaisted
9149b7538a
Update faq.md 2020-09-01 12:48:46 -07:00
John Plaisted
7105d809ac
Add Kafka SSL guide 2020-09-01 12:48:10 -07:00
Mars Lan
6d827fde4e
Update README.md 2020-09-01 06:10:04 -07:00
Mars Lan
7d6fde4f37
feat: add MCE ingestion support for CorpGroup (#1837)
* feat: add MCE ingestion support for CorpGroup

Also use consistent camel case for corp user URNs in bootstrap MCE data

Fixes https://github.com/linkedin/datahub/issues/1822
2020-08-31 10:08:58 -07:00
John Plaisted
94d2970a5e
Add 8/28 recording 2020-08-31 10:04:55 -07:00
Mars Lan
3dc875f0d4
Update townhall-history.md 2020-08-31 05:08:02 -07:00
Mars Lan
685b9771da
Update townhall-history.md 2020-08-31 05:07:42 -07:00
Mars Lan
b1afcaa623
Add files via upload 2020-08-31 05:07:12 -07:00
Mars Lan
f4bccaf052
fix(ingestion): set schema registry URL correctly for FMCE producer (#1839)
Fixes https://github.com/linkedin/datahub/issues/1829
2020-08-31 04:50:54 -07:00
Mars Lan
85a6cd1698
build(node): replace broken & unmaintained gradle node plugin (#1838) 2020-08-30 19:43:36 -07:00
Mars Lan
5fc70c31ba
Update links.md 2020-08-30 10:43:56 -07:00
John Plaisted
3a49854caa
Update townhalls.md 2020-08-29 17:17:59 -07:00
Charlie Tran
0116a955cb
Pushing internal consolidation of modules to open source (#1835) 2020-08-28 16:19:46 -07:00
Charlie Tran
5072ef013c
feat(frontend): Module consolidation - clean up for OS logic - init virtual assistant (#1821)
* Module consolidation on datahub-web - clean up to some OS logic from previous update - initial implementation of virtual assistant

* Fix accidental change to datahub user module license
2020-08-28 10:31:15 -07:00
Mars Lan
6c66797f55
Update README.md 2020-08-28 10:17:49 -07:00
Mars Lan
17e2795ca7
Update README.md 2020-08-28 10:17:03 -07:00
John Plaisted
c488bbc02e
Update townhalls.md 2020-08-28 10:05:22 -07:00
John Plaisted
9e4cc5e037
Update townhalls.md 2020-08-28 10:04:17 -07:00
John Plaisted
3e15b7e3b6
Update townhall-history.md 2020-08-28 10:03:04 -07:00
John Plaisted
9888d68f4b
Update development.md 2020-08-27 15:52:16 -07:00
Mars Lan
7299fd589f
docs: add external link (#1828)
* Update README.md

* Update links.md
v0.5.0-BETA
2020-08-27 03:42:34 -07:00
Mars Lan
c4c6f2aaec
Update search-over-new-field.md 2020-08-27 03:16:42 -07:00
Charlie Tran
843a6c5bbb
feat(frontend): update datahub-web client UI code (#1806)
* Releases updated version of datahub-web client UI code

* Fix typo in yarn lock

* Change yarn lock to match yarn registry directories

* Previous commit missed some paths

* Even more changes to yarnlock missing in previous commit

* Include codegen file for typings

* Add files to get parity for datahub-web and current OS datahub-midtier

* Add in typo fix from previous commit - change to proper license

* Implement proper OS fix for person entity picture url

* Workarounds for open source DH issues

* Fixes institutional memory api and removes unopensourced tabs for datasets

* Fixes search dataset deprecation and user search issue as a result of changes

* Remove internal only options in the avatar menu
2020-08-26 15:44:50 -07:00
John Plaisted
403da14585
Update notice to include embedded licenses
Also list apache projects specifically.
2020-08-26 11:45:38 -07:00
John Plaisted
505afee18b
Update townhalls.md 2020-08-25 09:26:14 -07:00
Mars Lan
cc74d35f1a
Update links.md 2020-08-24 21:31:13 -07:00
Mars Lan
81858540e9
Update README.md 2020-08-24 21:30:46 -07:00
Kerem Sahin
ce9dc9ce38
feat(dashboards): RFC for dashboards (#1778)
* feature(dashboards): RFC for dashboards

* Change directory structure

* Create goals & non-goals sections

* Removing alternatives section
2020-08-24 18:30:05 -07:00
Mars Lan
1806d80a91
Update README.md 2020-08-23 07:00:16 -07:00
Mars Lan
379a6f0e0c
Update links.md 2020-08-23 06:57:00 -07:00
Mars Lan
cd259cae28
Update links.md 2020-08-23 06:46:25 -07:00
John Plaisted
f998677da0
docs: add NOTICE (#1810)
* Copy NOTICE from wherehows

Copies the file from the wherehows branch.

* Update notice.
2020-08-20 17:03:35 -07:00
Kerem Sahin
57f81d488d
feat(data-platforms): adding rest resource for /dataPlatforms and mid-tier support (#1817)
* feat(data-platforms): Adding rest resource for /dataPlatforms and mid-tier support

* Removed data platforms which are LinkedIn internal
2020-08-20 12:55:30 -07:00
Mars Lan
b35a1b329b
Update README.md 2020-08-20 10:01:00 -07:00
shakti-garg-saxo
236d5e6271
feat(kubernetes): Improve the security of the kubernetes/helm charts (#1782)
* 1747 | remove obsolete yaml files

* 1747 | remove configmap and its hardcoded references

* 1747 | add missing input parameter of neo4j.host

* 1747 | remove obsolete secrets and parameterize the rest

* 1747 | auto-generate gms secret

* 1747 | remove fullName overrides

* 1747 | fix parameters in subchart's values.yaml

* 1747 | remove hardcoding from parameters for gms host and port

* 1747 | upgrade chart version

* 1747 | update helm docs

* 1747 | add extraEnv, extraVolume and extraMounts

* 1747 | Alters pull policy of images to 'always' for ldh

Co-authored-by: shakti-garg <shakti.garg@gmail.com>
v0.4.3
2020-08-20 02:42:22 -07:00
Kerem Sahin
ece9b82f7a
Update README.md 2020-08-19 21:39:46 -07:00
Kerem Sahin
21a5c9e607
Update README.md 2020-08-19 21:38:03 -07:00
John Plaisted
b673c8e160
fix: update defaults of aspectNames params (#1815)
fix: Update defaults of aspectNames params.

The last PR to sync internal code broke the external GMS, as the code now expected aspectNames to be null rather than empty by default. This prevented me from logging into DataHub, as the corp user request would fail (it assumed I asked for no aspects rather than all aspects).

TESTED: Built locally, launched with docker/dev.sh (so used latest frontend, but whatever). Verified I can now log into DataHub, browse and search for datasets, and view my profile.
2020-08-19 18:42:56 -07:00