24 Commits

Author SHA1 Message Date
Sergio Gómez Villamor
8cae980286
tests(ingestion): moving some tests so they are available for sdk users (#13540) 2025-05-19 08:39:53 +02:00
sid-acryl
9fb2df11f3
fix(ingest): sort by last modified not working in the UI (#11343) 2024-09-23 10:06:05 -07:00
Harshal Sheth
3755731f0e
chore(ingest): improve code formatting (#11326) 2024-09-11 10:48:57 -07:00
Andrew Sikowitz
fa1164aa63
feat(ingest/s3): Support reading S3 file type (#11177)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2024-08-30 12:15:12 +02:00
Tamas Nemeth
ef6a410091
feat(ingest/s3): Partition support improvements (#11083)
- Partition autodetection
- Option to find min/max/min-max partition of a dataset
- Generating Partition aspects
2024-08-22 17:55:43 +02:00
Harshal Sheth
05930560cc
feat(ingest/s3): set default spark version (#10057) 2024-03-18 14:27:01 -07:00
Tamas Nemeth
d86b336e70
chore(ingest/s3) Bump Deequ and Pyspark version (#8638)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-08-29 18:11:37 +02:00
Tamas Nemeth
a91c78cf31
fix(ingest/s3): fix test flakiness (#8416) 2023-07-14 00:42:00 +02:00
Tamas Nemeth
54c7aef1bc
feat(ingest/presto-on-hive): Extracting all the table properties from Hive Metastore (#8348)
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-07-12 15:56:13 -03:00
Tamas Nemeth
74ab1bea06
fix(ingest/s3): Fix for flaky s3 test - uploading s3 files in consistent order (#8367) 2023-07-04 19:19:39 +02:00
Tamas Nemeth
bdd4bc7b92
feat(ingest/s3) - Stateful ingestion and last-updated support (#8022) 2023-05-19 13:10:15 +02:00
Harshal Sheth
e99875cac6
chore(ingest): enable flake8 bugbear linting (#7763) 2023-04-10 14:14:42 -07:00
Harsha Mandadi
bf36c935fa
feat(ingest/s3): support path_specs of different S3 buckets in the same recipe (#7514) 2023-03-14 21:55:57 -07:00
Shirshanka Das
26cf0a71ab
fix(test): suppress s3 golden file test for specific paths (#7551) 2023-03-12 10:43:02 -07:00
Harshal Sheth
49029943f9
fix(ingest): remove extraneous platform configs (#7454) 2023-03-02 01:10:35 -08:00
Aseem Bansal
372f673aef
chore(ci): mark tests correctly (#7337) 2023-02-15 16:32:53 +05:30
Harshal Sheth
3c0f63c50a
fix(ingest): hide deprecated path_spec option from config (#5944) 2022-10-04 12:14:00 -07:00
Mayuri Nehate
a14617b6a4
fix(ingest): continue validation of s3 path_specs even if platform is set (#5951) 2022-09-16 12:03:57 -07:00
Ravindra Lanka
228f3b50ea
feat(ingestion): send reports of ingestion runs to datahub (#5639)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-08-19 09:08:17 -07:00
Mayuri Nehate
2c48329810
feat(model): dashboard usage model, is_null condition added (#5397) 2022-07-15 15:37:06 +05:30
Aseem Bansal
4541379024
feat(build): changes to decrease build time, cancel runs in case of multiple commits (#5187) 2022-06-17 18:05:10 +05:30
Tamas Nemeth
56ee4d9651
feat(ingest): s3 - add support for multiple pathspecs in one recipe (#4777) 2022-05-05 10:09:47 -07:00
Jordan Wolinsky
bbac4a7a11
feat(ingestion): glue/s3 - Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job (#4689) 2022-04-29 10:09:06 +02:00
MugdhaHardikar-GSLab
37aedfc87c
feat(s3): add s3 source (#4490)
* feat(data-lake): add containers and folder level dataset support

* docs(data-lake): Update readme for data lake

* doc(data-lake): fix examples, update doc

* lint fix

* feat(s3): add s3 source, restore old data-lake source

Co-authored-by: Mayuri N <mayuri.nehate@gslab.com>
2022-03-29 11:52:57 +02:00