97 Commits

Author SHA1 Message Date
david-leifker
463803e2d1
feat(restore-indices): createDefaultAspects argument (#12859) 2025-03-13 10:17:14 -05:00
david-leifker
412600a163
feat(telemetry): cross-component async write tracing (#12405) 2025-01-29 11:30:44 -06:00
Chakru
85b42e3ea5
build(coverage): enable code coverage for java and python (#11992)
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
2024-12-02 19:27:43 -06:00
david-leifker
738eaed6f1
feat(throttle): extend throttling to API requests (#11325) 2024-09-12 09:52:20 -05:00
david-leifker
dfa9bd2779
feat(consumers): mce-consumer throttling based on mae-consumer lag (#10626) 2024-05-31 15:53:02 -05:00
Aseem Bansal
e14474176f
feat(lint): add spotless for java lint (#9373) 2023-12-06 11:02:42 +05:30
RyanHolstien
1b737243b2
feat(avro): upgrade avro to 1.11 (#9031) 2023-10-18 13:45:46 -05:00
david-leifker
1b79142d9e
feat(EntityService): batched transactions and ebean updates (#8456) 2023-09-02 19:25:44 -05:00
david-leifker
7dd6e09ac5
refactor(build): upgrade to gradle 7 & guava update (#8745) 2023-09-01 19:36:01 +05:30
david-leifker
749c3e85cb
chore(snappy): fix snappy version constraint (#8629) 2023-08-17 10:56:28 +05:30
david-leifker
cd05f5b174
feat(schema-registry): replace confluent schema registry (#7930)
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Ryan Holstien <ryan@acryl.io>
2023-05-01 13:18:41 -05:00
Pedro Silva
4732694780
fix(gms): Corrects MCP generation in async mode (#7214)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-02 11:45:44 -08:00
david-leifker
39920bb00f
feat(elasticsearch): Elasticsearch improvements (#6894) 2023-01-31 18:44:37 -06:00
david-leifker
ecc01b9a46
refactor(restli-mce-consumer) (#6744)
* fix(security): commons-text in frontend

* refactor(restli): set threads based on cpu cores
feat(mce-consumers): hit local restli endpoint

* testing docker build

* Add retry configuration options for entity client

* Kafka debugging

* fix(kafka-setup): parallelize topic creation

* Adjust docker build

* Docker build updates

* WIP

* fix(lint): metadata-ingestion lint

* fix(gradle-docker): fix docker frontend dep

* fix(elastic): fix race condition between gms and mae for index creation

* Revert "fix(elastic): fix race condition between gms and mae for index creation"

This reverts commit 9629d12c3bdb3c0dab87604d409ca4c642c9c6d3.

* fix(test): fix datahub frontend test for clean/test cycle

* fix(test): datahub-frontend missing assets in test

* fix(security): set protobuf lib datahub-upgrade & mce/mae-consumer

* gitignore update

* fix(docker): remove platform on docker base image, set by buildx

* refactor(kafka-producer): update kafka producer tracking/logging

* updates per PR feedback

* Add documentation around mce standalone consumer
Kafka consumer concurrency to follow thread count for restli & sql connection pool

Co-authored-by: leifker <dleifker@gmail.com>
Co-authored-by: Pedro Silva <pedro@acryl.io>
2022-12-26 16:09:08 +00:00
djordje-mijatovic
e6c48e5f19
feat(kafka): expose default kafka producer mechanism (#6381)
* Expose Kafka Sender Retry Parameters

* Implement KafkaHealthChecker

* feat(kafka): expose default kafka producer mechanism
2022-12-20 14:41:24 -06:00
david-leifker
2de9d3d5bf
fix(logging): Remove lombok as source of slf4j-api, convert to compileOnly where possible (#6616) 2022-12-04 19:57:47 -08:00
david-leifker
4ca3327d89
fix(security): update ranger commons & dependencies for security vulns (#6577)
* fix(security): update ranger commons & dependencies for security vulns
2022-11-30 17:05:01 -06:00
RyanHolstien
bfb903cfb8
feat(ingest): add async option to ingest proposal endpoint (#6097)
* feat(ingest): add async option to ingest proposal endpoint

* small tweak to validate before write to K, also keep existing path for timeseries aspects

* avoid double convert

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-10-03 19:56:19 -05:00
John Joyce
c69310522b
feat(metadata service): Introducing Platform Events (#4477) 2022-03-29 18:32:04 -07:00
RyanHolstien
34c27f076b
feat(removeGMA): remove all dependencies on gma libraries (#3835) 2022-01-05 17:32:31 -08:00
xiphl
8cd1e91072
Upgrade to 3rd Apache patch for log4j (#3772) 2021-12-20 06:55:22 -08:00
John Joyce
5b5135be0b
fix(vuln): log4j vulnerability - bumping to 2.16.0 (#3755) 2021-12-15 11:07:45 -08:00
Fredrik Sannholm
d651040c85
Fix vulnerability (#3716) 2021-12-10 10:07:55 -08:00
Claudio Benfatto
f9bc3b32c4
fix(metadata-service): fix debug logging in MAE producer (#3626)
closes: https://github.com/linkedin/datahub/issues/3625
2021-11-28 21:07:42 -08:00
John Joyce
a92ab66a3a
refactor(nocode): Final part of No-Code cleanup (#3477) 2021-10-31 22:06:36 -07:00
Dexter Lee
8747fbe43c
feat(perf): Add perf testing and monitoring framework (#3195) 2021-09-07 23:06:15 -07:00
John Joyce
f3fc0970f3
refactor(build): Remove unnecessary ext modules. (#3074) 2021-08-10 22:48:06 -07:00
John Joyce
20b1685de2
fix(gms): better logging on failed MCL / MAE (#3007) 2021-08-02 17:53:56 -07:00
John Joyce
352a0abf8d
Introducing TimeSeries Aspects + Dataset Profile (Stats) Aspect (#2983)
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-07-30 17:41:03 -07:00
Gabe Lyons
aa253f5b3b
feat(deletes): add run commands (list, show, rollback) to datahub ingest (#2960) 2021-07-29 20:04:40 -07:00
John Joyce
09cbc548a4
feat(logs): improve logging in GMS and datahub-frontend (#2761) 2021-06-25 10:56:45 -07:00
John Joyce
97e9660037
feat: No Code Metadata Modeling (#2629)
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-06-03 13:24:33 -07:00
shakti-garg
8ed14a62e2
feat(business_glossary): add new entity business term and its relationship with dataset and its fields (#2228)
Co-authored-by: shubham.garg <shubham.garg@thoughtworks.com>
2021-05-10 13:20:23 -07:00
Dexter Lee
fa015c5aaa
fix(kafka-topic-convention): Fix DAOs that do not refer to TopicConvention (#2387) 2021-04-13 07:58:31 -07:00
Fredrik Sannholm
e2d6adc906
fix(datajob): Fix URI templates for datajob and dataflow (#2324) 2021-03-31 12:27:43 -07:00
Fredrik Sannholm
b02c7a345c
fix(tags): Support creating tags with MCE (#2320) 2021-03-31 11:16:12 -07:00
Gabe Lyons
039fe597f7
feat(tags): editing tags from react client on datasets, schemas, charts & dashboards (#2248) 2021-03-18 11:52:14 -07:00
Fredrik Sannholm
da6b3d111d
feat(datajob): Backend implementation (#2197) 2021-03-13 08:00:44 -08:00
RyanHolstien
ea86ade29b
feat: ML Model Backend Implementation (#1896)
Co-authored-by: RyanHolstien <rholstien@expediagroup.com>
2021-02-17 13:28:13 -08:00
Nagarjuna Kanamarlapudi
f9d33f5519
(refactor): Convert dataPlatforms to GMA aspect models and associated resource to GMA resource. (#2057)
* (refactor): Convert dataPlatforms to GMA aspect and associated resource to GMA resource.

BREAKING CHANGE: /datasets/dataPlatforms API is now changed to become GMA resource.

* Change documentation style
2021-01-20 15:50:48 -08:00
Kerem Sahin
4d8320e4a0
feat(dashboard): Dashboards backend implementation (#1884) 2020-11-23 09:25:58 -08:00
John Plaisted
25b663cc18
refactor: move code to linkedin/datahub-gma. (#1955)
Move code to linkedin/datahub-gma.

"GMA" (Generalized Metadata Architecture) is the backend of DataHub, and has been moved to its own repository.

This deletes the code that was moved and uses jars that GMA publishes to bintray to load it.

Note that not all of GMA was moved, but most of it. We may still move more things to the other repository in the future.
2020-10-23 15:14:57 -07:00
John Plaisted
2f86cd680e
[BREAKING] Break dependency of ebean-dao on metadata-models. (#1895)
The coupling was between the static path extractor API. This is broken by making a new `UrnPathExtractor` interface, and adding an overload of `EbeanLocalDAO`'s constructor to accept one (no breaking constructor change). The old constructors default to an `EmptyPathExtractor`, which does nothing (which is a breaking behavioral change, see below).

BREAKING: `DatasetUrnPathExtractor` was deleted. No one should've been depending on this directly. However, downstreams that were relying on it being there at runtime (dataset GMS) need to copy `DatasetUrnPathExtractor` and create their `EbeanLocalDAO` with one. Note that this is a little dangerous because it is a runtime behavioral change only. Potential impact is that SCSI suddenly stops working as intended.

SYNC=metadata-models_101.0.0
2020-09-28 12:30:34 -07:00
Jyoti Wadhwani
dc9c877984 [scsi] preserve the order of urns 2020-09-24 16:02:12 -07:00
John Plaisted
542ae67cb1 Add support for customizing topic names via a convention.
Requested by a few people in OS. See https://github.com/linkedin/datahub/issues/1840.

Companies need full customization over the topic name. This new class should be easily customizable by using a spring factory.

TODO to finish the implementation for v5. For right now v5 is LI only and unfinished. Getting this in for v4 so it is useful to other companies now.

TODO AFTER OPEN SOURCE PUSH - make configurable via spring
TODO AFTER SUBMIT - see where else we can use this (jobs, where else?)
2020-09-24 16:02:12 -07:00
Jyoti Wadhwani
b944910b0e extend filter finder method to get metadata from SCSI 2020-09-24 16:02:12 -07:00
Jyoti Wadhwani
c3c52cf8f6 add support for getWithExtraInfo in BaseLocalDAO 2020-09-24 16:02:12 -07:00
John Plaisted
8d536a54d3 Break dependency of metadata-dao on metadata-models.
This also breaks a few others transitively.
2020-09-24 16:02:12 -07:00
John Plaisted
e4ce0376d2 Fix open source build.
Do not use internal URN API.
2020-09-24 16:02:12 -07:00
John Plaisted
96da83033c Break dependency of metadata-test-utils on metadata-models. 2020-09-24 16:02:12 -07:00