Mayur Singal
7760663b22
MINOR: Change ingestion licence header ( #20549 )
2025-04-03 10:39:47 +05:30
Akash Verma
39dcb5baef
Feature : Cockroach db connector ( #18961 )
2025-01-02 13:07:55 +05:30
IceS2
f0049853ec
FIXES 14885: Initial deltalake implementation for s3 ( #16665 )
...
* Initial deltalake implementation for s3
* Fix styles
* Fix test_amundsen
* Fix UnitTests
* Fix Checkstyle
* Fix integration tests due to datalake client refactor
* Fix unit tests
* Fix tests
* Fix Integration DeltaLake Storage test
* Skip delta storage integration test for python 3.8
* DeltaLake JSONSchema changes migrations
* Update import name
* Add some comments based on sonarcloud suggestions
* Update DeltaLake documentation
* Resolve some comments
2024-06-20 12:08:21 +05:30
Pere Miquel Brull
cb72a22b59
Fix - e2e tests for pydantic V2 ( #16551 )
...
* Fix - e2e tests for pydantic V2
* add correct default
* add correct default
* revert datetime aware
* revert datetime aware
* revert datetime aware
* revert datetime aware
* revert datetime aware
* revert datetime aware
* revert datetime aware
* revert datetime aware
* fix apis
* format
2024-06-06 19:36:17 -07:00
Pere Miquel Brull
d8e2187980
#15243 - Pydantic V2 & Airflow 2.9 ( #16480 )
...
* pydantic v2
* pydanticv2
* fix parser
* fix annotated
* fix model dumping
* mysql ingestion
* clean root models
* clean root models
* bump airflow
* bump airflow
* bump airflow
* optionals
* optionals
* optionals
* jdk
* airflow migrate
* fab provider
* fab provider
* fab provider
* some more fixes
* fixing tests and imports
* model_dump and model_validate
* model_dump and model_validate
* model_dump and model_validate
* union
* pylint
* pylint
* integration tests
* fix CostAnalysisReportData
* integration tests
* tests
* missing defaults
* missing defaults
2024-06-05 21:18:37 +02:00
Mayur Singal
6b90c245d4
MINOR: Add support for json schema parsing for datalake & s3 ( #15615 )
2024-03-26 10:03:21 +05:30
IceS2
e7c9d6aa7f
FIXES 15215: Implement initial Multithreading approach for the Metadata Ingestion on Databases ( #15130 )
...
* Implement Initial MultiThread suggestion
* Update all the ingestion sources to use the new ContextManager
* Fix missing wraps on decorator
* Fix Unittests
* Fix linters
* Fix linters
* Fix BigQuery UnitTests
* Add UnitTests to the newly created code
* Fix unittest
* change the threads from table to schemas
* Update README.md
* Small change suggested by Sonar
* Slight change to test a different way to multithread over tables
* Debug changes
* More multithread tests
* Remove unneeded wait time
* Testing
* refactor code based on removal of time.sleep
* Fix wrong paste
* Improve ExecutionTimeContextManager
* Fix missing .get() and unit tests
* Fix conflicting changes
* Update Multithread logic with the incremental extraction
* Fix linters
* Fix unittest
* Remove commented code
* Fix Unittests
* Fix checkstyle
* Change default to threads = 1
2024-03-25 18:20:40 +01:00
Teddy
9a4a9df836
Fix #14895 - Get Metadata from Parquet Schema ( #14956 )
...
* linting: fix python linting
* fix: get column types from parquet schema for parquet files
* style: python linting
* fix: remove displayType check in test as variation depending on OS
2024-02-01 09:02:52 +01:00
Ayush Shah
1552aeb2de
Fix #13149 : Multiple Project Id for Datalake GCS ( #14846 )
...
* Fix Multiple Project Id for datalake gcs
* Optimize logic
* Fix Tests
* Add Datalake GCS Tests
* Add multiple project id gcs test
2024-01-25 10:52:16 +01:00
Onkar Ravgan
ebb2317cf0
Fix 14040: Part 1 Remove get_by_name calls from topology ( #14098 )
...
* Changed for database
* Added changes for dashboard_service
* Changed for messaging service
* Changed for mlmodel service
* Changed for pipeline service
* Changed for search service
* Changed for objectstore service
* fixed wrong import
* fixed lint
* fixes
* fixed pytests
* fixed domo db pytest
* Fixed review comments
2023-11-27 16:15:47 +05:30
Teddy
1cbdfb3ae7
Fixes #12601 - column filter for profiler workflow ( #13535 )
...
* fix: sample data ingestion to match entity profiler column setting
* fix: python linting
* fix: updated fn call
* fix: added logic to handle json field in datalake connector
* fix: handle NA values in parsing
* fix: reverted sampler changes from #13338
* fix: reverted metric changes from #13338
* fix: added datalake profiler ingestion test
* fix: python linting
* fix: removed normalization of json blob in NoSQL db
2023-10-12 14:51:38 +02:00
Ayush Shah
08d7ee6d55
Fixes #13052 : Datalake Nested Columns Sample Data ingestion ( #13338 )
2023-10-08 20:08:51 +05:30
Ayush Shah
5fea08cd33
Datalake: Add manifest file support, fix profiler metrics, add array and json column type support ( #13017 )
2023-09-13 15:15:49 +05:30
Pere Miquel Brull
e97d4befb1
Fix #12770 - Cleanup DL structure & Readers & Python 3.8 ( #12776 )
2023-08-09 16:07:16 +05:30
Mayur Singal
7fa963eec3
Fix #1076 : Add mongodb support ( #11943 )
2023-06-15 11:14:22 +05:30
Ayush Shah
ad7258e7be
Fixes 10949: return Chunks for file formats & Centralize logic for different auth configs ( #11639 )
...
* Centralize Auth and File formats datalake
2023-05-19 18:54:28 +05:30
Mayur Singal
3d345f9b37
Fix #10273 : Parse nested json for datalake ( #10956 )
2023-04-10 14:58:02 +05:30
Mayur Singal
752163ac71
Fix #10814 : Improve parsing logic for union fields in topic ( #10836 )
2023-04-01 11:10:05 +05:30
Nahuel
07d6028149
Fix: remove avro-python3 deprecated dependency ( #10602 )
2023-03-15 14:15:57 +00:00
Onkar Ravgan
4d11db4220
Added doc in avro array and tests ( #10473 )
2023-03-08 20:16:50 +05:30
Mayur Singal
392107bc4a
Datalake Avro & Json Lines Support ( #10129 )
2023-02-08 17:31:25 +00:00
Pere Miquel Brull
7f21a7bced
Fix #8088 - Restructure source connections & clients ( #9545 )
2023-01-02 13:52:27 +01:00
Pere Miquel Brull
a4521fd664
Fix #6562 - Sources have their own package ( #9521 )
2022-12-27 15:00:22 +01:00
Abhishek Pandey
73b370b5e2
schema-filter-added-in-datalake-for-bucket ( #8516 )
...
Co-authored-by: ulixius9 <mayursingal9@gmail.com>
2022-11-08 10:57:16 +05:30