24 Commits

Author SHA1 Message Date
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Akash Verma
39dcb5baef
Feature : Cockroach db connector (#18961) 2025-01-02 13:07:55 +05:30
IceS2
f0049853ec
FIXES 14885: Initial deltalake implementation for s3 (#16665)
* Initial deltalake implementation for s3

* Fix styles

* Fix test_amundsen

* Fix UnitTests

* Fix Checkstyle

* Fix integration tests due to datalake client refactor

* Fix unit tests

* Fix tests

* Fix Integration DeltaLake Storage test

* Skip delta storage integration test for python 3.8

* DeltaLake JSONSchema changes migrations

* Update import name

* Add some comments based on sonarcloud suggestions

* Update DeltaLake documentation

* Resolve some comments
2024-06-20 12:08:21 +05:30
Pere Miquel Brull
cb72a22b59
Fix - e2e tests for pydantic V2 (#16551)
* Fix - e2e tests for pydantic V2

* add correct default

* add correct default

* revert datetime aware

* revert datetime aware

* revert datetime aware

* revert datetime aware

* revert datetime aware

* revert datetime aware

* revert datetime aware

* revert datetime aware

* fix apis

* format
2024-06-06 19:36:17 -07:00
Pere Miquel Brull
d8e2187980
#15243 - Pydantic V2 & Airflow 2.9 (#16480)
* pydantic v2

* pydanticv2

* fix parser

* fix annotated

* fix model dumping

* mysql ingestion

* clean root models

* clean root models

* bump airflow

* bump airflow

* bump airflow

* optionals

* optionals

* optionals

* jdk

* airflow migrate

* fab provider

* fab provider

* fab provider

* some more fixes

* fixing tests and imports

* model_dump and model_validate

* model_dump and model_validate

* model_dump and model_validate

* union

* pylint

* pylint

* integration tests

* fix CostAnalysisReportData

* integration tests

* tests

* missing defaults

* missing defaults
2024-06-05 21:18:37 +02:00
Mayur Singal
6b90c245d4
MINOR: Add support for json schema parsing for datalake & s3 (#15615) 2024-03-26 10:03:21 +05:30
IceS2
e7c9d6aa7f
FIXES 15215: Implement initial Multithreading approach for the Metadata Ingestion on Databases (#15130)
* Implement Initial MultiThread suggestion

* Update all the ingestion sources to use the new ContextManager

* Fix missing wraps on decorator

* Fix Unittests

* Fix linters

* Fix linters

* Fix BigQuery UnitTests

* Add UnitTests to the newly created code

* Fix unittest

* change the threads from table to schemas

* Update README.md

* Small change suggested by Sonar

* Slight change to test a different way to multithread over tables

* Debug changes

* More multithread tests

* Remove uneeded wait time

* Testing

* refactor code based on removal of time.sleep

* Fix wrong paste

* Improve ExecutionTimeContextManager

* Fix missing .get() and unit tests

* Fix conflicting changes

* Update Multithread logic with the incremental extraction

* Fix linters

* Fix unittest

* Remove commented code

* Fix Unittests

* Fix checkstyle

* Change default to threads = 1
2024-03-25 18:20:40 +01:00
Teddy
9a4a9df836
Fix #14895 - Get Metadata from Parquet Schema (#14956)
* linting: fix python linting

* fix: get column types from parquet schema for parquet files

* style: python linting

* fix: remove displayType check in test as variation depending on OS
2024-02-01 09:02:52 +01:00
Ayush Shah
1552aeb2de
Fix #13149: Multiple Project Id for Datalake GCS (#14846)
* Fix Multiple Project Id for datalake gcs

* Optimize logic

* Fix Tests

* Add Datalake GCS Tests

* Add multiple project id gcs test
2024-01-25 10:52:16 +01:00
Onkar Ravgan
ebb2317cf0
Fix 14040: Part 1 Remove get_by_name calls from topology (#14098)
* Changed for database

* Added changes for dashboard_service

* Changed for messaging service

* Changed for mlmodel service

* Changed for pipeline service

* Changed for search service

* Changed for objectstore service

* fixed wrong import

* fixed lint

* fixes

* fixed pytests

* fixed domo db pytest

* Fixed review comments
2023-11-27 16:15:47 +05:30
Teddy
1cbdfb3ae7
Fixes #12601 - column filter for profiler workflow (#13535)
* fix: sample data ingestion to match entity profiler column setting

* fix: python linting

* fix: updated fn call

* fix: added logic to handle json filed in datalake connector

* fix: handle NA values in parsing

* fix: reverted sampler changes from #13338

* fix: reverted metric changes from #13338

* fix: added datalake profiler ingestion test

* fix: python linting

* fix: removed normalization of json blob in NoSQL db
2023-10-12 14:51:38 +02:00
Ayush Shah
08d7ee6d55
Fixes #13052: Datalake Nested Columns Sample Data ingestion (#13338) 2023-10-08 20:08:51 +05:30
Ayush Shah
5fea08cd33
Datalake: Add manifest file support, fix profiler metrics, add array and json column type support (#13017) 2023-09-13 15:15:49 +05:30
Pere Miquel Brull
e97d4befb1
Fix #12770 - Cleanup DL structure & Readers & Python 3.8 (#12776) 2023-08-09 16:07:16 +05:30
Mayur Singal
7fa963eec3
Fix #1076: Add mongodb support (#11943) 2023-06-15 11:14:22 +05:30
Ayush Shah
ad7258e7be
Fixes 10949: return Chunks for file formats & Centralize logic for different auth configs (#11639)
* Centralize Auth and File formats datalake
2023-05-19 18:54:28 +05:30
Mayur Singal
3d345f9b37
Fix #10273: Parse nested json for datalake (#10956) 2023-04-10 14:58:02 +05:30
Mayur Singal
752163ac71
Fix #10814: Improve parsing logic for union fields in topic (#10836) 2023-04-01 11:10:05 +05:30
Nahuel
07d6028149
Fix: remove avro-python3 deprecated dependency (#10602) 2023-03-15 14:15:57 +00:00
Onkar Ravgan
4d11db4220
Added doc in avro array and tests (#10473) 2023-03-08 20:16:50 +05:30
Mayur Singal
392107bc4a
Datalake Avro & Json Lines Support (#10129) 2023-02-08 17:31:25 +00:00
Pere Miquel Brull
7f21a7bced
Fix #8088 - Restructure source connections & clients (#9545) 2023-01-02 13:52:27 +01:00
Pere Miquel Brull
a4521fd664
Fix #6562 - Sources have their own package (#9521)
Fix #6562 - Sources have their own package (#9521)
2022-12-27 15:00:22 +01:00
Abhishek Pandey
73b370b5e2
schema-filter-added-in-datalake-for-bucket (#8516)
Co-authored-by: ulixius9 <mayursingal9@gmail.com>
2022-11-08 10:57:16 +05:30