Sergio Gómez Villamor
1c5b7c18fc
chore(ingestion): removes ignore for SIM117 ruff rule ( #13295 )
2025-04-23 15:55:46 +02:00
Aseem Bansal
2ecd3bbab8
feat(cli): add extra pip and debug flag ( #12621 )
2025-02-17 16:03:07 +05:30
Sergio Gómez Villamor
4b79e7525f
fix(ingestion): groupby_unsorted ( #12403 )
2025-01-21 09:44:19 +01:00
Aseem Bansal
262dd76518
dev: remove black in favor of ruff for formatting ( #12378 )
2025-01-18 15:06:20 +05:30
Aseem Bansal
2226820ad1
dev(ingest): use ruff instead of flake8 ( #12359 )
2025-01-16 08:19:07 +05:30
Andrew Sikowitz
92f013e6e1
fix(ingest/file-backed-collections): Properly set _use_sqlite_on_conflict ( #12297 )
2025-01-08 11:40:02 -08:00
Andrew Sikowitz
6b8d21a2ab
feat(ingest/sqlite): Support sqlite < 3.24.0 ( #12137 )
2024-12-16 12:50:25 -08:00
Harshal Sheth
93c8ae2267
fix(ingest/snowflake): handle dots in snowflake table names ( #12105 )
2024-12-12 15:31:32 +05:30
Harshal Sheth
d953718ab7
feat(ingest): allow max_workers=1 with ASYNC_BATCH rest sink ( #12088 )
2024-12-10 18:32:52 -05:00
sagar-salvi-apptware
57b12bd9cb
fix(ingest): replace sqllineage/sqlparse with our SQL parser ( #12020 )
2024-12-10 08:36:01 -08:00
Harshal Sheth
42bb07a35e
fix(ingest/bigquery): increase logging in bigquery-queries extractor ( #11774 )
2024-11-20 13:35:01 -08:00
Harshal Sheth
3b415cde69
refactor(ingest/snowflake): move oauth config into snowflake dir ( #11888 )
2024-11-20 13:34:47 -08:00
Andrew Sikowitz
94f1f39667
fix(ingest/partitionExecutor): Fetch ready items for non-empty batch when _pending is empty ( #11885 )
2024-11-18 17:25:43 -08:00
Andrew Sikowitz
5ff6295b0f
fix(ingest/partition-executor): Fix deadlock by recomputing ready items ( #11853 )
2024-11-14 08:48:30 +01:00
Harshal Sheth
e609ff810d
feat(ingest/powerbi): improve reporting around m-query parser ( #11763 )
2024-10-31 16:27:45 -07:00
Harshal Sheth
143fc011fa
feat(ingest/powerbi): add timeouts for m-query parsing ( #11753 )
2024-10-30 19:40:45 +01:00
Harshal Sheth
35f30b7d3c
feat(ingest): use mainline sqlglot ( #11693 )
2024-10-22 19:57:46 -07:00
Shirshanka Das
3b1b76244d
feat(sdk):platform-resource - complex queries ( #11675 )
2024-10-19 14:53:28 -07:00
Harshal Sheth
b8144699fd
chore(ingest): reorganize unit tests ( #11636 )
2024-10-16 19:18:32 -07:00
Harshal Sheth
38bcd9c381
feat(ingest): default to ASYNC_BATCH mode in datahub-rest sink ( #11369 )
2024-09-17 07:11:58 +01:00
Harshal Sheth
311ea10833
feat(ingest): maintain ordering in file-backed dict ( #11346 )
2024-09-10 13:53:38 -07:00
Felix Lüdin
9619553e2d
fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect ( #10898 )
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2024-08-06 12:52:20 -07:00
sid-acryl
dffdef2eaa
fix(ingestion/powerbi): fix issue with broken report lineage ( #10910 )
2024-07-31 11:40:09 -07:00
Mayuri Nehate
ff1c6b895e
feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction ( #10884 )
2024-07-16 11:46:42 -07:00
Harshal Sheth
0d677e4992
fix(ingest/snowflake): fix column batcher ( #10781 )
2024-06-25 22:21:54 -07:00
Harshal Sheth
724907b8f4
feat(ingest): add async batch mode to the rest sink ( #10733 )
2024-06-25 15:49:00 -07:00
Harshal Sheth
0dc0bc5761
feat(ingest/snowflake): performance improvements ( #10746 )
2024-06-25 14:46:55 -07:00
Harshal Sheth
3d5735cbc5
chore(ingest): run pyupgrade for python 3.8 ( #10513 )
2024-05-15 22:31:05 -07:00
siladitya
43ac405c57
fix(metadata-ingestion)glue connector failure when Optional field Type of PartitionKey is absent for a Table ( #10052 )
2024-03-20 11:02:28 +01:00
Harshal Sheth
92a3ac6f11
fix(ingest): use contextvar for cooperative timeout ( #10021 )
2024-03-11 14:14:39 -07:00
Harshal Sheth
0d780e5f8f
feat(ingest): sql parsing aggregator ( #9786 )
2024-02-09 16:27:45 -05:00
Harshal Sheth
98e3da42f5
feat(ingest/looker): add backpressure-aware executor ( #9615 )
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2024-01-12 18:54:08 +01:00
Harshal Sheth
f05056aed7
feat(ingest): key-partitioning for rest emitter ( #9613 )
2024-01-11 17:57:48 -05:00
Harshal Sheth
7517c77ffd
fix(ingest): resolve issue with caplog and asyncio ( #9377 )
2023-12-04 20:00:11 -05:00
Andrew Sikowitz
adf8c8db38
refactor(ingest): Move sqlalchemy import out of sql_types.py ( #9065 )
2023-10-24 08:59:56 +02:00
Tim
1eaf9c8c5f
feature(ingest/athena): introduce support for complex and nested schemas in Athena ( #8137 )
Co-authored-by: dnks23 <dominik.s23@live.de>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Tim <tim@MBP-von-Tim.fritz.box>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-10-18 09:39:59 -07:00
Mayuri Nehate
c81a339bfc
build(ingest): remove ratelimiter dependency ( #9008 )
2023-10-16 09:27:57 -07:00
Mayuri Nehate
cdb9f5ba62
feat(bigquery): add better timers around every API call ( #8626 )
2023-09-15 11:55:39 -07:00
Tamas Nemeth
1a47a51f1b
fix(ingest/build): Fix sagemaker mypy and flake8 issues ( #8530 )
2023-07-31 16:13:07 +02:00
Harshal Sheth
08d4e904a8
feat(ingest): add YamlFileUpdater utility ( #8266 )
2023-06-29 13:15:34 -07:00
Harshal Sheth
e99875cac6
chore(ingest): enable flake8 bugbear linting ( #7763 )
2023-04-10 14:14:42 -07:00
Andrew Sikowitz
ce1ac7fa12
refactor(ingest): Use sqlite.Row row_factory for FileBackedCollections ( #7739 )
2023-04-04 11:53:56 -07:00
Andrew Sikowitz
c7d35ffd66
perf(ingest): Improve FileBackedDict iteration performance; minor refactoring ( #7689 )
- Adds dirty bit to cache, only writes data if dirty
- Refactors __iter__
- Adds sql_query_iterator
- Adds items_snapshot, more performant `items()` that allows for filtering
- Renames connection -> shared_connection
- Removes unnecessary flush during close if connection is not shared
- Adds Closeable mixin
2023-03-27 17:20:34 -04:00
Andrew Sikowitz
8dd7a85533
refactor(ingest): Use shared connection wrapper over connection cache ( #7570 )
2023-03-14 15:09:37 -07:00
Harshal Sheth
fbfe43b1cb
feat(ingest): fix edge cases + interface cleanup for file-system APIs ( #7533 )
2023-03-13 13:14:53 -07:00
Harshal Sheth
b82afa89f1
feat(ingest): enable joins across FileBackedDicts + add FileBackedList ( #7506 )
2023-03-09 15:22:03 -08:00
Andrew Sikowitz
8101f0d47a
feat(ingest): Introduce FileBackedDict for offloading data to disk ( #7461 )
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Also includes minor refactoring to the bigquery connector
2023-03-01 19:09:51 -05:00
Tamas Nemeth
9015a43f25
fix(ingest): bigquery-beta - Adding python 3.8 fix for memory footprint util ( #6228 )
2022-10-18 17:59:31 -07:00
Tamas Nemeth
2f79b50c24
fix(ingest): presto-on-hive - not failing on Hive type parsing error ( #6118 )
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-10-04 20:54:38 -07:00
Mayuri Nehate
b195b6c123
fix(ingest): encode reserved characters when creating dataset urn ( #5977 )
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-09-20 16:59:02 -07:00