3904 Commits

Author SHA1 Message Date
Mayuri Nehate
14b48489d4
feat(ingest): pass timeout config in kafka admin client api calls (#6863) 2022-12-27 12:45:11 -08:00
Harshal Sheth
31260888fc
feat(ingest/airflow): support raw dataset urns in airflow lineage (#6854)
* feat(ingest/airflow): support dataset Urns in airflow lineage

This PR also
- resolves a reported circular import issue
- refactors the Airflow tests to reduce duplication

* fix test
2022-12-27 08:59:26 +01:00
Mayuri Nehate
69a2347db1
feat(ingest): update profiling to fetch configurable number of sample values (#6859) 2022-12-27 08:57:26 +01:00
david-leifker
ecc01b9a46
refactor(restli-mce-consumer) (#6744)
* fix(security): commons-text in frontend

* refactor(restli): set threads based on cpu cores
feat(mce-consumers): hit local restli endpoint

* testing docker build

* Add retry configuration options for entity client

* Kafka debugging

* fix(kafka-setup): parallelize topic creation

* Adjust docker build

* Docker build updates

* WIP

* fix(lint): metadata-ingestion lint

* fix(gradle-docker): fix docker frontend dep

* fix(elastic): fix race condition between gms and mae for index creation

* Revert "fix(elastic): fix race condition between gms and mae for index creation"

This reverts commit 9629d12c3bdb3c0dab87604d409ca4c642c9c6d3.

* fix(test): fix datahub frontend test for clean/test cycle

* fix(test): datahub-frontend missing assets in test

* fix(security): set protobuf lib datahub-upgrade & mce/mae-consumer

* gitignore update

* fix(docker): remove platform on docker base image, set by buildx

* refactor(kafka-producer): update kafka producer tracking/logging

* updates per PR feedback

* Add documentation around mce standalone consumer
Kafka consumer concurrency to follow thread count for restli & sql connection pool

Co-authored-by: leifker <dleifker@gmail.com>
Co-authored-by: Pedro Silva <pedro@acryl.io>
2022-12-26 16:09:08 +00:00
Harshal Sheth
392115b4c4
feat(ingest): add pydantic helper for removed fields (#6853) 2022-12-26 15:31:49 +05:30
Harshal Sheth
ea5ee6f761
fix(ingest/looker): handle missing label fields (#6849) 2022-12-22 19:43:44 -05:00
mohdsiddique
9daa8ed56f
feat(ingestion): Business Glossary# Add domain support in GlossaryTerm ingestion (#6829)
* lint fix

* domain in term

* domain in term

* review comments

* add todo

Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-12-22 17:47:57 -05:00
Harshal Sheth
1d0c7852a7
feat(ingest): add db/schema properties hook to SQL common (#6847) 2022-12-22 13:38:59 -08:00
John Joyce
4cba09e97d
fix(ingest): Fixing lint (#6844) 2022-12-22 08:33:18 -08:00
wangsaisai
0f8e2d945e
fix(ingest): kafka ingest task hang up with error bootstrap server (#6820) 2022-12-22 07:39:30 -08:00
Mayuri Nehate
a05c5c4069
feat(ingest): extract kafka topic config properties as customProperties (#6783) 2022-12-22 09:34:55 +01:00
John Joyce
2e3a25123d
refactor(ingestion): Browse Paths Upgrade V2 Feast & Sagemaker (#6002) 2022-12-21 08:02:59 -08:00
Dago Romer
9cb1eed6e7
fix(ingest): fixed snowflake oauth ingestion not using role attribute from recipe (#6825) 2022-12-21 07:52:06 -08:00
Harshal Sheth
e2b4a65a8e
refactor(ingest): clean up exception types (#6818) 2022-12-21 07:28:18 -08:00
Harshal Sheth
8972ea4b04
fix(ingest): support patches in auto_status_aspect (#6827)
Patches generate a raw MCP because MCPW doesn't support patches right now, so we need to handle that correctly downstream.
2022-12-21 10:25:24 +01:00
Tamas Nemeth
a1970d2dce
feat(ingest/bigquery): add option to enable/disable legacy sharded table support (#6822)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
2022-12-20 23:29:46 -05:00
Harshal Sheth
2c911ccf7b
refactor(ingest): clean up pipeline init error handling (#6817) 2022-12-20 19:21:28 -08:00
Harshal Sheth
88e40a9069
feat(ingest): add failure/warning counts to ingest_stats (#6823) 2022-12-20 19:13:11 -08:00
Harshal Sheth
137f4500b6
feat(ingest/stateful): remove platform_instance_id from state urn (#6795) 2022-12-20 12:12:19 -05:00
Harshal Sheth
5584bfb469
refactor(ingest/stateful): remove get_last_state method (#6794) 2022-12-19 20:48:22 -05:00
raysaka
fcb3242983
chore(ingest): bump python package dependencies to resolve vulns (#6384)
Co-authored-by: John Joyce <john@acryl.io>
2022-12-19 18:12:56 -05:00
Harshal Sheth
e9d50ed992
refactor(ingest/stateful): remove IngestionJobStateProvider (#6792) 2022-12-19 17:03:54 -05:00
Monica Senapati
5c366205f5
fix(bigquery-legacy): Fix for TypeError related failures in legacy plugin (#6806)
Co-authored-by: John Joyce <john@acryl.io>
2022-12-19 13:28:25 -08:00
Harshal Sheth
47be95689e
refactor(ingest/stateful): remove most remaining state classes (#6791) 2022-12-19 13:40:48 -05:00
Harshal Sheth
14a00f4098
chore(ingest): pin black version (#6807) 2022-12-19 19:35:49 +01:00
Tamas Nemeth
e41b455e14
fix(ingest): bigquery - sharded table support improvements (#6789) 2022-12-19 18:57:37 +01:00
Harshal Sheth
54e04ba436
fix(ingest/dbt): remove unsupported usage indicator (#6805) 2022-12-19 09:34:49 -08:00
Mayuri Nehate
9716a49067
fix(ingest): correct external url for account identifier with account name (#6715) 2022-12-16 14:00:42 -05:00
Harshal Sheth
22081f5ecc
feat(ingest): lookml - add unreachable views to report (#6779) 2022-12-15 20:26:30 -08:00
Harshal Sheth
8a537b0559
feat(ingest): add datahub state inspect command (#6763) 2022-12-15 18:55:36 -05:00
Harshal Sheth
798d82fe60
docs(ingest): fix error in custom tags transformer example (#6767) 2022-12-15 15:31:12 -08:00
Tamas Nemeth
b7bc1e9116
fix(ingest): bigquery - handling custom sql errors as warning (#6777) 2022-12-15 23:40:32 +01:00
Harshal Sheth
6152b5e9f7
feat(ingest): simplify more stateful ingestion state (#6762) 2022-12-15 11:33:29 -05:00
Shirshanka Das
db182e4639
fix(python-sdk): DataHubGraph get_aspect should accept empty responses (#6760) 2022-12-14 10:40:16 -08:00
Harshal Sheth
2f95719dba
feat(ingest): remove source config from DatahubIngestionCheckpoint (#6722) 2022-12-14 12:39:21 -05:00
Patrick Franco Braz
f0a371941e
refactor(ingest): bigquery-lineage - allow tables and datasets in uppercase (#6739) 2022-12-14 14:58:03 +01:00
Harshal Sheth
68fd802881
fix(ingest/lookml): fix directory handling and a github_info resolution bug (#6751) 2022-12-14 14:55:38 +01:00
cccs-seb
3c2982c02c
fix(ingest): support airflow mapped operators (#6738) 2022-12-13 22:31:53 -05:00
Harshal Sheth
cf3db168ac
feat(ingest): start simplifying stateful ingestion state (#6740) 2022-12-13 10:05:57 +01:00
Harshal Sheth
7d63399d00
fix(ingest): fix serde for empty dicts in unions with null (#6745)
The code changes are in https://github.com/acryldata/avro_gen/pull/16, but the tests are written here.
2022-12-13 08:17:24 +01:00
Dmitry Bryazgin
551ef1b335
feat(ingest): add stateful ingestion to the ldap source (#6127)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-12-13 01:13:39 -05:00
Harshal Sheth
85bb1f5030
test(ingest): make hive/trino test more reliable (#6741) 2022-12-12 21:02:52 -05:00
Tamas Nemeth
5658fd5a54
feat(ingest): bigquery - external url support and a small profiling filter fix (#6714) 2022-12-12 16:25:32 -08:00
cccs-Dustin
2cc64742e0
feat(ingest/iceberg): add stateful ingestion (#6344) 2022-12-12 13:06:03 -05:00
Mayuri Nehate
65ba13d9aa
feat(ingest): snowflake - add separate config for include_column_lineage in snowflake (#6712) 2022-12-12 15:23:12 +01:00
Jan Hicken
d3fca44e16
fix(ingest): bigquery - rectify filter for BigQuery external tables (#6691) 2022-12-12 10:58:23 +01:00
Harshal Sheth
fd911c9820
feat(ingest): redact configs reported in ingestion_run_summary (#6696) 2022-12-12 10:48:26 +01:00
Mayuri Nehate
5c99f20b7d
fix(ingest): mysql - fix mysql ingestion issue with non-lowercase database (#6713) 2022-12-12 10:48:01 +01:00
Harshal Sheth
b7735d5b21
fix(ingest): fix bug in auto_status_aspect (#6705)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2022-12-09 12:24:39 -05:00
Harshal Sheth
c211cfbbe6
fix(ingest/sagemaker): handle missing ProcessingInputs field (#6697)
Fixes #6360.
2022-12-08 18:42:28 -08:00