34 Commits

Author SHA1 Message Date
Harshal Sheth
f3b17d4d4c
feat(ingest): improve error messages for unknown metadata objects (#12745) 2025-02-28 12:36:58 -08:00
Harshal Sheth
407b188759
fix(ingest): prevent interleaved writes in dead letter queue (#12688) 2025-02-21 16:10:36 -08:00
Mayuri Nehate
84c677629d
feat(ingest): add stateful ingestion support for file source (#11804) 2024-11-08 16:11:30 +05:30
Oleksandr Simonchuk
8b4e302881
feat(ingest): add and use file system abstraction in file source (#8415)
Co-authored-by: oleksandrsimonchuk <oleksandr.si@appsflyer.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-07-01 10:47:07 -07:00
Andrew Sikowitz
46dbb10940
docs(ingest): Rename csv / s3 / file source and sink (#10675) 2024-06-11 11:44:13 -07:00
Harshal Sheth
3d5735cbc5
chore(ingest): run pyupgrade for python 3.8 (#10513) 2024-05-15 22:31:05 -07:00
Harshal Sheth
3cede10ab3
feat(ingest/dbt): support use_compiled_code and test_warnings_are_errors (#8956) 2023-10-05 10:29:47 -07:00
Harshal Sheth
fd9121737d
fix(ingest/file): remove entity_type_counts and aspect_counts (#8586) 2023-08-09 13:01:12 -04:00
Andrew Sikowitz
fdbc4de695
refactor(ingest): Call source_helpers via new WorkUnitProcessors in base Source (#8101) 2023-05-24 13:36:19 -07:00
Andrew Sikowitz
2e1c3981aa
refactor(ingest): Move source_helpers.py from datahub/utilities -> datahub/api (#8052) 2023-05-17 20:51:06 -07:00
Andrew Sikowitz
7ba2d13087
refactor(ingest): Make get_workunits() return MetadataWorkUnits (#8051)
- Deprecates UsageAggregationClass, /usageStats?action=batchIngest, UsageStatsWorkUnit
- Removes parsing of UsageAggregationClass in file source, all sinks, and WorkUnitRecordExtractor
2023-05-17 00:01:57 -04:00
xiphl
af09034523
[bugfix] Fix remote file ingestion for Windows (#7888)
Co-authored-by: Shirshanka Das <shirshanka+github@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-04-27 10:28:10 -07:00
Andrew Sikowitz
de587b2bfe
refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources (#7718)
- Adds auto_workunit_reporter to each source
- Standardizes comments around remote paths
- Adds back AuditStamp to FileLineage source
- Some generic refactoring
2023-03-31 15:49:24 -07:00
xiphl
7d240c600a
feat(ingestion) Allow for ingestion to read files remotely (#7552)
Co-authored-by: xiphl <xiphlerl9@gmail.com>
Allows the CsvEnricher, BusinessGlossary, File, and LineageFile sources to read from URLs.
2023-03-29 18:10:46 -07:00
Harshal Sheth
667ca8632d
feat(ingest): avoid embedding serialized json in metadata files (#6742) 2022-12-28 19:28:38 -05:00
Aseem Bansal
43c566ee4f
feat(ingest): add dummy data source for automated testing (#6550) 2022-12-06 16:57:12 +05:30
Harshal Sheth
817406eadb
refactor(ingest): simplify stateful ingestion config (#6454)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2022-11-18 00:09:24 -05:00
Shirshanka Das
3106e42e89
fix(ingest): file - add configurability for counting all elements before starting ingestion (#6136) 2022-10-06 10:24:23 -07:00
Harshal Sheth
f227bd982b
refactor(ingest): streamline pydantic configs (#6011) 2022-09-25 23:37:48 -07:00
Harshal Sheth
e23523a781
fix(ingest): fix type annotations on some pydantic fields (#5795) 2022-09-14 11:05:31 -07:00
Harshal Sheth
a1e1d2fd0a
feat(ingest): add ConfigEnum type (#5734) 2022-09-14 09:57:42 -07:00
Shirshanka Das
84b279a933
feat(ingest):looker - reduce mem usage, misc reporting improvements (#5823) 2022-09-04 15:43:57 -07:00
Aseem Bansal
83d15dd86e
feat(ingest): file - allow filter by aspect and get stats (#5738) 2022-08-26 18:54:26 +05:30
Aseem Bansal
3da9941521
feat(ingest): round reported time to 2 decimal places (#5721) 2022-08-24 23:08:33 -07:00
Shirshanka Das
505cefef13
feat(ingest): better reporting for file source, friendlier stats names (#5710) 2022-08-23 10:21:24 -07:00
Shirshanka Das
647de906f2
feat(ingest): rest-sink - stability improvements to handle large inpu… (#5693) 2022-08-21 12:30:19 -07:00
Shirshanka Das
bb788ac317
feat(ingest): file - add support for folders, large files, improve co… (#5692) 2022-08-21 14:18:22 +05:30
Shirshanka Das
7ed9cd2838
feat(ingest): snowflake - basic test connection capability (#5464) 2022-07-22 09:14:37 +02:00
Shirshanka Das
76133c6f37
feat(ingest): add test source connection feature, structured report file (#5442) 2022-07-19 20:40:59 -07:00
Shirshanka Das
a9ad138172
feat(ingest): docs - overhaul source connector docs to make it code driven (#4798)
Co-authored-by: MugdhaHardikar-GSLab <mugdha.hardikar@gslab.com>
2022-05-02 00:18:15 -07:00
John Joyce
352a0abf8d
Introducing TimeSeries Aspects + Dataset Profile (Stats) Aspect (#2983)
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-07-30 17:41:03 -07:00
Harshal Sheth
7ab6355b1c
feat(ingest): stricter deserialization for MCE JSONs (#2976) 2021-07-28 14:50:21 -07:00
Harshal Sheth
79f60d8b8a
refactor(ingest): remove deprecated methods and warn on deprecated import (#2797) 2021-06-29 11:43:43 -07:00
Harshal Sheth
937f02c6bc
feat: usage stats (part 1) (#2750)
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
2021-06-24 17:11:00 -07:00