40 Commits

Author SHA1 Message Date
Sergio Gómez Villamor
1563b0e9fb
fix(ingestion): use default generate_browse_path_v2 even if no pipeline_config (#13117) 2025-04-23 13:25:58 +02:00
skrydal
38f1553315
feat(ingestion): Refactoring timestamping logic for WorkUnits + custom logic for Iceberg (#13030)
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
2025-04-04 22:30:27 +02:00
Andrew Sikowitz
756b199506
fix(ingest/glue): Add additional checks and logging when specifying catalog_id (#12168) 2024-12-24 14:56:35 -08:00
Harshal Sheth
7dbb3e60cb
chore(ingest): start using explicit exports (#11899) 2024-11-20 13:33:30 -08:00
Julien Jehannet
326afc6308
fix(ingestion/glue): manage table names from resource_links from nearest catalog correctly (#11578) 2024-10-23 11:39:23 +05:30
Harshal Sheth
b8144699fd
chore(ingest): reorganize unit tests (#11636) 2024-10-16 19:18:32 -07:00
Sergio Gómez Villamor
31edb46dbc
feat(ingestion): adds env property in ContainerProperties (#11214)
Co-authored-by: siladitya2 <siladitya2@gmail.com>
2024-09-18 14:56:52 +05:30
sagar-salvi-apptware
a09575fb6f
fix(ingestion/glue): Add support for missing config options for profiling in Glue (#10858) 2024-07-29 16:04:07 +05:30
sagar-salvi-apptware
348d449d8a
fix(ingest/Glue): column upstream lineage between S3 and Glue (#10895) 2024-07-19 14:39:19 +05:30
sagar-salvi-apptware
b8af2b9d69
fix(ingestion/glue): ensure date formatting works on all platforms for aws glue (#10836) 2024-07-03 18:05:37 +05:30
skrydal
099021c7a3
feat(ingest/glue): allow ingestion of empty databases from Glue (#10666)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2024-07-03 11:43:12 +05:30
Harshal Sheth
f4be88d0a9
feat(ingest): set pipeline name in system metadata (#10190)
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
2024-06-27 15:00:35 -07:00
skrydal
b9e71a61b1
feat(ingest/glue): database parameters extraction (#10665) 2024-06-11 11:50:46 -07:00
Sergio Gómez Villamor
0059960720
feat(ingestion/glue): delta schemas (#10299)
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
2024-05-17 14:21:35 +02:00
dushayntAW
3c7c3ec904
fix(ingestion/glue): fix to ingest the comment for partition key as description (#10189) 2024-04-03 17:34:02 +05:30
Andrew Sikowitz
3a21c27f06
feat(ingest): Turn on browse path v2 creation (#8342)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-07-06 16:43:42 -04:00
Andrew Sikowitz
fdbc4de695
refactor(ingest): Call source_helpers via new WorkUnitProcessors in base Source (#8101) 2023-05-24 13:36:19 -07:00
Harshal Sheth
4e9c652707
feat(ingest): add env to container properties (#8027) 2023-05-22 12:07:16 -07:00
Shirshanka Das
17e85979dd
refactor(ingest): subtypes - standardize (#7437) 2023-02-28 13:11:07 -08:00
Daniel Messias
0d67e188ef
feat(glue): Use table name as human-readable name for Glue ingestion (#7213)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-02 18:04:35 +01:00
Harshal Sheth
db1a0f13f3
fix(ingest): fix issue in glue tests (#7185) 2023-01-30 21:51:21 -08:00
Harshal Sheth
09616ee2b3
feat(ingest): include instance in container dataPlatform when provided (#6083) 2022-10-13 11:29:54 -07:00
Shirshanka Das
e9c4c823d8
fix(ingest): bigquery-beta - ensure that status aspect is emitted for… (#6154)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-10-08 16:00:45 -07:00
skrydal
a026c84691
feat: qualifiedName support + populating glue ARN (#5952) 2022-09-15 21:15:03 -07:00
skrydal
f61a040555
feat(ingestion) Add more info to glue entities (#5874)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2022-09-14 12:25:09 -07:00
Amanda Hernando
337087cac0
feat(ingest): glue - add stateful ingestion (#5553) 2022-08-15 20:50:45 -07:00
Ravindra Lanka
108b492ed1
feat(ingestion): Add Iceberg source (#5010)
Co-authored-by: cccs-eric <eric.ladouceur@cyber.gc.ca>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2022-05-26 08:05:57 -07:00
Jordan Wolinsky
bbac4a7a11
feat(ingestion): glue/s3 - Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job (#4689) 2022-04-29 10:09:06 +02:00
Sergio Gómez Villamor
bdf17f551e
feat(ingest): glue - adds platform instance capability (#4130) 2022-03-30 18:50:26 -07:00
Tamas Nemeth
b2664916e3
feat(ingest): Glue - Support for domains and containers (#4110)
* Add container and domain support for Glue.
Adding option to set aws profile for Glue.

* Adding domain doc for Glue

* Making get_workunits less complex

* Updating golden file

* Addressing pr review comments

* Remove unneeded empty line
2022-02-16 08:29:14 -08:00
Ravindra Lanka
1efe04f88a
feat(ingest): glue - support for nested structs (#3895) 2022-01-17 14:21:53 -08:00
Kevin Hu
de41134a33
fix(ingestion): fix incorrect glue job names (#3503) 2021-11-02 22:54:47 -07:00
Gabe Lyons
ff527f4bed
feat(foreign keys): add foreign key models (#3275) 2021-09-22 10:29:27 -07:00
aseembansal-gogo
903348e1db
fix(ingest): add missing partition keys in schema for glue sources (#3238) 2021-09-15 21:39:49 -07:00
rslanka
8844240328
feat: Adding support for nested schemas in ingestion and visualization (#3079) 2021-08-11 15:47:18 -07:00
Gabe Lyons
aa253f5b3b
feat(deletes): add run commands (list, show, rollback) to datahub ingest (#2960) 2021-07-29 20:04:40 -07:00
Kevin Hu
736249f0c7
feat(ingest): extract SageMaker metrics, hyperparameters, and external URLs (#2910) 2021-07-21 21:30:07 -07:00
Kevin Hu
a2106ca9e8
feat(ingest): SageMaker jobs and models (#2830) 2021-07-08 16:16:16 -07:00
Harshal Sheth
2f921d15e8
fix(ingest): avoid setting timestamps unless source system provides it (#2843) 2021-07-08 12:11:06 -07:00
Kevin Hu
a89094da5b
feat(ingest): add support for Glue ETL jobs (#2687) 2021-06-22 11:33:22 -07:00