44 Commits

Author SHA1 Message Date
Sergio Gómez Villamor
8cae980286
tests(ingestion): moving some tests so they are available for sdk users (#13540) 2025-05-19 08:39:53 +02:00
Jonny Dixon
132ff7081f
feat(ingestion/s3): Add externalUrls for datasets in s3 and gcs (#12763) 2025-05-17 17:03:40 +01:00
Austin SeungJun Park
41895fe24f
feat(ingest/s3): add table filtering (#12661) 2025-03-20 07:57:43 +01:00
sid-acryl
9fb2df11f3
fix(ingest): sort by last modified not working in the UI (#11343) 2024-09-23 10:06:05 -07:00
Sergio Gómez Villamor
31edb46dbc
feat(ingestion): adds env property in ContainerProperties (#11214)
Co-authored-by: siladitya2 <siladitya2@gmail.com>
2024-09-18 14:56:52 +05:30
Harshal Sheth
3755731f0e
chore(ingest): improve code formatting (#11326) 2024-09-11 10:48:57 -07:00
Andrew Sikowitz
fa1164aa63
feat(ingest/s3): Support reading S3 file type (#11177)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2024-08-30 12:15:12 +02:00
Tamas Nemeth
ef6a410091
feat(ingest/s3): Partition support improvements (#11083)
- Partition autodetection
- Option to find min/max/min-max partition of a dataset
- Generating Partition aspects
2024-08-22 17:55:43 +02:00
Tamas Nemeth
7e5610f358
feat(ingest/dagster): Dagster source (#10071)
Co-authored-by: shubhamjagtap639 <shubham.jagtap@gslab.com>
2024-03-25 13:28:35 +01:00
Harshal Sheth
05930560cc
feat(ingest/s3): set default spark version (#10057) 2024-03-18 14:27:01 -07:00
Harshal Sheth
b0163c4885
feat(ingest): utilities for query logs (#10036) 2024-03-12 23:20:46 -07:00
Tamas Nemeth
d86b336e70
chore(ingest/s3) Bump Deequ and Pyspark version (#8638)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-08-29 18:11:37 +02:00
Jinlin Yang
6748aecdc0
fix(ingest/s3): emit data_platform_instance aspect if the config has platform_instance (#8585) 2023-08-17 10:40:54 +05:30
Andrew Sikowitz
bf9f380350
fix(ingest): Generate browse paths v2 for more sources; properly pass platform_instance (#8501) 2023-07-25 11:35:34 +05:30
Tamas Nemeth
a91c78cf31
fix(ingest/s3): fix test flakiness (#8416) 2023-07-14 00:42:00 +02:00
Tamas Nemeth
54c7aef1bc
feat(ingest/presto-on-hive): Extracting all the table properties from Hive Metastore (#8348)
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-07-12 15:56:13 -03:00
Andrew Sikowitz
3a21c27f06
feat(ingest): Turn on browse path v2 creation (#8342)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-07-06 16:43:42 -04:00
Tamas Nemeth
74ab1bea06
fix(ingest/s3): Fix for flaky s3 test - uploading s3 files in consistent order (#8367) 2023-07-04 19:19:39 +02:00
Tamas Nemeth
d50a99935b
fix(ingest/s3): Path spec aware folder traversal (#8095) 2023-05-30 16:20:49 +02:00
Harshal Sheth
4e9c652707
feat(ingest): add env to container properties (#8027) 2023-05-22 12:07:16 -07:00
Tamas Nemeth
bdd4bc7b92
feat(ingest/s3) - Stateful ingestion and last-updated support (#8022) 2023-05-19 13:10:15 +02:00
Tamas Nemeth
dec54bf098
feat(ingest/s3): Inferring schema from the alphabetically last folder (#8005) 2023-05-10 21:55:05 +02:00
Harshal Sheth
e99875cac6
chore(ingest): enable flake8 bugbear linting (#7763) 2023-04-10 14:14:42 -07:00
Harsha Mandadi
bf36c935fa
feat(ingest/s3): support path_specs of different S3 buckets in the same recipe (#7514) 2023-03-14 21:55:57 -07:00
John Joyce
18f387c6ea
fix(cli): Adding exit code to correctly return failure or success (#7520)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
2023-03-13 13:32:40 -07:00
Shirshanka Das
26cf0a71ab
fix(test): suppress s3 golden file test for specific paths (#7551) 2023-03-12 10:43:02 -07:00
Harshal Sheth
49029943f9
fix(ingest): remove extraneous platform configs (#7454) 2023-03-02 01:10:35 -08:00
nachiket-juneja
e07cd2090b
Feat/s3 ingestion enhancement to update schema from latest partition (#7410)
Co-authored-by: Prashant Singh Thakur <prashant.thakur@nucleusteq.com>
2023-02-28 08:58:28 +01:00
Aseem Bansal
372f673aef
chore(ci): mark tests correctly (#7337) 2023-02-15 16:32:53 +05:30
Mayuri Nehate
e79b4e8c2b
feat(ingest): s3 - add status aspect for detected s3 datasets (#6402) 2022-11-13 17:29:42 -08:00
Harshal Sheth
09616ee2b3
feat(ingest): include instance in container dataPlatform when provided (#6083) 2022-10-13 11:29:54 -07:00
Harshal Sheth
e70c0ac4b6
feat(ingest): include raw s3 paths if s3 source (#6168) 2022-10-11 15:55:00 -07:00
Shirshanka Das
e9c4c823d8
fix(ingest): bigquery-beta - ensure that status aspect is emitted for… (#6154)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-10-08 16:00:45 -07:00
Harshal Sheth
3c0f63c50a
fix(ingest): hide deprecated path_spec option from config (#5944) 2022-10-04 12:14:00 -07:00
Mayuri Nehate
a14617b6a4
fix(ingest): continue validation of s3 path_specs even if platform is set (#5951) 2022-09-16 12:03:57 -07:00
Ravindra Lanka
228f3b50ea
feat(ingestion): send reports of ingestion runs to datahub (#5639)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-08-19 09:08:17 -07:00
Jordan Wolinsky
3a86ff3485
Fix profiling when using {table}. (#5531)
* profiling fix for when using {table}

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
2022-08-08 13:16:59 -07:00
Mayuri Nehate
2c48329810
feat(model): dashboard usage model, is_null condition added (#5397) 2022-07-15 15:37:06 +05:30
Aseem Bansal
4541379024
feat(build): changes to decrease build time, cancel runs in case of multiple commits (#5187) 2022-06-17 18:05:10 +05:30
Tamas Nemeth
be91e2341f
feat(ingest): s3 - speeding up ingestion with sampling (#4927) 2022-05-24 22:17:10 -07:00
Tamas Nemeth
56ee4d9651
feat(ingest): s3 - add support for multiple pathspecs in one recipe (#4777) 2022-05-05 10:09:47 -07:00
mayurinehate
c34a1ba735
fix(s3): improved handling for corner cases (#4774) 2022-04-29 12:25:41 -07:00
Jordan Wolinsky
bbac4a7a11
feat(ingestion): glue/s3 - Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job (#4689) 2022-04-29 10:09:06 +02:00
MugdhaHardikar-GSLab
37aedfc87c
feat(s3): add s3 source (#4490)
* feat(data-lake): add containers and folder level dataset support

* docs(data-lake): Update readme for data lake

* doc(data-lake): fix examples, update doc

* lint fix

* feat(s3): add s3 source, restore old data-lake source

Co-authored-by: Mayuri N <mayuri.nehate@gslab.com>
2022-03-29 11:52:57 +02:00