23 Commits

Author SHA1 Message Date
david-leifker
21eb4dfc12
feat(search): update to support OpenSearch 2.x (#8852) 2023-09-21 13:01:55 -05:00
david-leifker
39920bb00f
feat(elasticsearch): Elasticsearch improvements (#6894) 2023-01-31 18:44:37 -06:00
Pedro Silva
e8f6c4cabd
feat(cli): Changes rollback behaviour to apply soft deletes by default (#4358)
* Changes rollback behaviour to apply soft deletes by default

Summary:
Addresses the feature request "Flag in delete command to only delete aspects touched by an ingestion run; add flag to nuke everything" by modifying the default behaviour of a rollback operation, which no longer deletes an entity by default when a keyAspect is being rolled back.

Instead, the key aspect is kept and a StatusAspect is upserted with removed=true, effectively performing a soft delete.
Another PR will follow to perform garbage collection on these soft-deleted entities.

To keep the old behaviour, a new parameter, --hard-delete, was added to the cli ingest rollback endpoint.

* Adds restli specs

* Fixes deleteAspect endpoint & adds support for nested transactions

* Enable regression test & fix docker-compose for local development

* Add generated quickstart

* Fix quickstart generation script

* Adds missing env var to docker-compose-without-neo4j

* Sets status removed=true when ingesting resources

* Adds soft deletes for Elasticsearch + soft delete flags across ingestion sub-commands

* Makes Elasticsearch consistent

* Update tests with new behaviour

* apply review comments

* apply review comment

* Forces Elasticsearch to add documents with status removed=false when ingesting

* Reset gradle properties to default

* Fix tests
2022-03-15 12:05:52 -07:00
nsbala-tw
89f6c47d51
fix(elastic): Fix for log4j CVE-2021-44228 vulnerability (#3733)
Co-authored-by: balabarath <bagopila@gmail.com>
2021-12-13 23:35:10 -08:00
John Joyce
5a4d194bad
feat(docker): reduce quickstart footprint (#2744) 2021-06-23 12:59:49 -07:00
John Joyce
7591c8994a
feat(datahub cli): DataHub CLI Quickstart (#2689) 2021-06-14 17:15:24 -07:00
John Plaisted
b8e18b0b5d
refactor(docker): make docker files easier to use during development. (#1777)
* Make docker files easier to use during development.

During development it is quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files using multistage build to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Added docs/docker documentation for this.
2020-08-06 16:38:53 -07:00
Liangjun Jiang
5d078aa617
Implemented data process search feature (#1706)
* implement search feature

* add test for dataprocessIndexBuilder; refactor code based on feedback

* update based on PR feedback

* Update DataProcessDocument.pdl

fixed typo wording.

* add not null check for data process info
2020-06-29 10:20:22 -07:00
Liangjun Jiang
92c4a3689e
Data process entity (#1680)
* add job info as aspect of a dataset

* add job urn def., aspect and entity

* job entity with upstream and downstream lineage

* use job urn in upstream & downstream

* add Job entity rest APIs

* rest.li api, impl and factory for job entity

* code cleanup

* use pdl; onboard data process entity

* add es index json

* fix gradlew build ignored tasks

* add a comment about data process info field

* fix style warning issues

* update content based on PR

* checked in generated snapshot json

* updated based on PR feedback

* update data process data format

* updated based on code review feedback

* revert back gms & mce-job docker image

* delete temp files

* update based on PR feedback

* file name and a typo

* format with linkedin style

Co-authored-by: Liangjun <liajiang@expediagroup.com>
2020-06-09 15:42:08 -07:00
Mars Lan
4f221f9a12
build(docker): refactor docker build scripts (#1687)
* build(docker): refactor docker build scripts

- add "build" option to docker-compose files to simplify rebuilding of images
- create "start.sh" script so it's easier to override "command" in the quickstart's docker-compose file
- use dockerize to wait for requisite services to start up
- add a dedicated Dockerfile for kafka-setup

This fixes https://github.com/linkedin/datahub/issues/1549 & https://github.com/linkedin/datahub/issues/1550
2020-06-08 13:37:14 -07:00
Kerem Sahin
c009326bae Fix lowercase_keyword analyzer settings for people entity 2020-02-06 01:39:05 -08:00
Kerem Sahin
e56b6a2871 Add forward slash escape for Elasticsearch queries 2020-02-05 19:05:49 -08:00
Kerem Sahin
165d4aef95 Documentation update part-1 2019-12-18 18:57:18 -08:00
Kerem Sahin
4500e9ce7b Set Elasticsearch Docker container max heap size to 1GB and increase timeout to 120s for Elasticsearch to be ready 2019-11-12 18:13:37 -08:00
Kerem Sahin
bb38a9467a Add --build flag when using docker-compose up to always build elasticsearch-setup image 2019-10-05 15:28:54 -07:00
Kerem Sahin
c65a65c0b7 Update corp user search index mapping 2019-10-03 19:26:28 -07:00
Kerem Sahin
ea15628912 Add upstreams field to dataset search index mapping 2019-09-26 20:54:03 -07:00
Kerem Sahin
28b876f323 Update docs: No need for running initialization script for creating search indices 2019-09-12 19:06:35 -07:00
Kerem Sahin
1a3ddff4a4 Creating an image for elasticsearch-setup to automatically create indices 2019-09-12 19:04:29 -07:00
Kerem Sahin
1bf0ff72b4 Doc update: adding docker-compose pull before up to always get the latest version of images 2019-09-10 17:51:17 -07:00
Kerem Sahin
0dc5bd9fe0 Add documentation 2019-09-08 20:25:58 -07:00
Kerem Sahin
3f4048e2a7 Add docker compose for GMS and update other docker compose files 2019-09-02 16:44:34 -07:00
Kerem Sahin
23339df23a Initial commit for Data Hub 2019-08-31 20:51:14 -07:00