Harshal Sheth
f3b17d4d4c
feat(ingest): improve error messages for unknown metadata objects ( #12745 )
2025-02-28 12:36:58 -08:00
Harshal Sheth
407b188759
fix(ingest): prevent interleaved writes in dead letter queue ( #12688 )
2025-02-21 16:10:36 -08:00
Mayuri Nehate
84c677629d
feat(ingest): add stateful ingestion support for file source ( #11804 )
2024-11-08 16:11:30 +05:30
Oleksandr Simonchuk
8b4e302881
feat(ingest): add and use file system abstraction in file source ( #8415 )
...
Co-authored-by: oleksandrsimonchuk <oleksandr.si@appsflyer.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-07-01 10:47:07 -07:00
Andrew Sikowitz
46dbb10940
docs(ingest): Rename csv / s3 / file source and sink ( #10675 )
2024-06-11 11:44:13 -07:00
Harshal Sheth
3d5735cbc5
chore(ingest): run pyupgrade for python 3.8 ( #10513 )
2024-05-15 22:31:05 -07:00
Harshal Sheth
3cede10ab3
feat(ingest/dbt): support use_compiled_code and test_warnings_are_errors ( #8956 )
2023-10-05 10:29:47 -07:00
Harshal Sheth
fd9121737d
fix(ingest/file): remove entity_type_counts and aspect_counts ( #8586 )
2023-08-09 13:01:12 -04:00
Andrew Sikowitz
fdbc4de695
refactor(ingest): Call source_helpers via new WorkUnitProcessors in base Source ( #8101 )
2023-05-24 13:36:19 -07:00
Andrew Sikowitz
2e1c3981aa
refactor(ingest): Move source_helpers.py from datahub/utilities -> datahub/api ( #8052 )
2023-05-17 20:51:06 -07:00
Andrew Sikowitz
7ba2d13087
refactor(ingest): Make get_workunits() return MetadataWorkUnits ( #8051 )
...
- Deprecates UsageAggregationClass, /usageStats?action=batchIngest, UsageStatsWorkUnit
- Removes parsing of UsageAggregationClass in file source, all sinks, and WorkUnitRecordExtractor
2023-05-17 00:01:57 -04:00
xiphl
af09034523
[bugfix] Fix remote file ingestion for Windows ( #7888 )
...
Co-authored-by: Shirshanka Das <shirshanka+github@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-04-27 10:28:10 -07:00
Andrew Sikowitz
de587b2bfe
refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources ( #7718 )
...
- Adds auto_workunit_reporter to each source
- Standardizes comments around remote paths
- Adds back AuditStamp to FileLineage source
- Some generic refactoring
2023-03-31 15:49:24 -07:00
xiphl
7d240c600a
feat(ingestion) Allow for ingestion to read files remotely ( #7552 )
...
Co-authored-by: xiphl <xiphlerl9@gmail.com>
Allows the CsvEnricher, BusinessGlossary, File, and LineageFile sources to read from URLs.
2023-03-29 18:10:46 -07:00
Harshal Sheth
667ca8632d
feat(ingest): avoid embedding serialized json in metadata files ( #6742 )
2022-12-28 19:28:38 -05:00
Aseem Bansal
43c566ee4f
feat(ingest): add dummy data source for automated testing ( #6550 )
2022-12-06 16:57:12 +05:30
Harshal Sheth
817406eadb
refactor(ingest): simplify stateful ingestion config ( #6454 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2022-11-18 00:09:24 -05:00
Shirshanka Das
3106e42e89
fix(ingest): file - add configurability for counting all elements before starting ingestion ( #6136 )
2022-10-06 10:24:23 -07:00
Harshal Sheth
f227bd982b
refactor(ingest): streamline pydantic configs ( #6011 )
2022-09-25 23:37:48 -07:00
Harshal Sheth
e23523a781
fix(ingest): fix type annotations on some pydantic fields ( #5795 )
2022-09-14 11:05:31 -07:00
Harshal Sheth
a1e1d2fd0a
feat(ingest): add ConfigEnum type ( #5734 )
2022-09-14 09:57:42 -07:00
Shirshanka Das
84b279a933
feat(ingest):looker - reduce mem usage, misc reporting improvements ( #5823 )
2022-09-04 15:43:57 -07:00
Aseem Bansal
83d15dd86e
feat(ingest): file - allow filter by aspect and get stats ( #5738 )
2022-08-26 18:54:26 +05:30
Aseem Bansal
3da9941521
feat(ingest): round reported time to 2 decimal places ( #5721 )
2022-08-24 23:08:33 -07:00
Shirshanka Das
505cefef13
feat(ingest): better reporting for file source, friendlier stats names ( #5710 )
2022-08-23 10:21:24 -07:00
Shirshanka Das
647de906f2
feat(ingest): rest-sink - stability improvements to handle large inpu… ( #5693 )
2022-08-21 12:30:19 -07:00
Shirshanka Das
bb788ac317
feat(ingest): file - add support for folders, large files, improve co… ( #5692 )
2022-08-21 14:18:22 +05:30
Shirshanka Das
7ed9cd2838
feat(ingest): snowflake - basic test connection capability ( #5464 )
2022-07-22 09:14:37 +02:00
Shirshanka Das
76133c6f37
feat(ingest): add test source connection feature, structured report file ( #5442 )
2022-07-19 20:40:59 -07:00
Shirshanka Das
a9ad138172
feat(ingest): docs - overhaul source connector docs to make it code driven ( #4798 )
...
Co-authored-by: MugdhaHardikar-GSLab <mugdha.hardikar@gslab.com>
2022-05-02 00:18:15 -07:00
John Joyce
352a0abf8d
Introducing TimeSeries Aspects + Dataset Profile (Stats) Aspect ( #2983 )
...
Co-authored-by: Dexter Lee <dexter@acryl.io>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2021-07-30 17:41:03 -07:00
Harshal Sheth
7ab6355b1c
feat(ingest): stricter deserialization for MCE JSONs ( #2976 )
2021-07-28 14:50:21 -07:00
Harshal Sheth
79f60d8b8a
refactor(ingest): remove deprecated methods and warn on deprecated import ( #2797 )
2021-06-29 11:43:43 -07:00
Harshal Sheth
937f02c6bc
feat: usage stats (part 1) ( #2750 )
...
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
2021-06-24 17:11:00 -07:00