935 Commits

Author SHA1 Message Date
Harshal Sheth
f860ce95c0
feat(ingest): emit state payloads as soft-deleted (#7714) 2023-04-04 17:06:21 +00:00
Andrew Sikowitz
de587b2bfe
refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources (#7718)
- Adds auto_workunit_reporter to each source
- Standardizes comments around remote paths
- Adds back AuditStamp to FileLineage source
- Some generic refactoring
2023-03-31 15:49:24 -07:00
Harshal Sheth
f6d7e1a325
feat(ingest/snowflake): hide host_port from snowflake docs (#7717) 2023-03-31 15:58:52 +05:30
Harshal Sheth
94fa62d431
chore(ingest): formatting + cleanup MCPW usages (#7706) 2023-03-29 11:43:25 -07:00
Harshal Sheth
2eb9fe408a
docs(): generate docs for our Python SDK (#7612) 2023-03-28 20:23:20 -07:00
Mayuri Nehate
fc238c2513
feat(ingest/postgres): support extracting metadata from all databases in single recipe (#7581)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-03-28 14:16:12 -07:00
Andrew Sikowitz
c7d35ffd66
perf(ingest): Improve FileBackedDict iteration performance; minor refactoring (#7689)
- Adds dirty bit to cache, only writes data if dirty
- Refactors __iter__
- Adds sql_query_iterator
- Adds items_snapshot, more performant `items()` that allows for filtering
- Renames connection -> shared_connection
- Removes unnecessary flush during close if connection is not shared
- Adds Closeable mixin
2023-03-27 17:20:34 -04:00
Andrew Sikowitz
419bee8614
fix(ingest/bigquery): Fix BigQueryTableType enum accesses (#7685) 2023-03-25 00:08:11 +00:00
Mayuri Nehate
301c8616ed
refactor(ingest/bigquery): add inline comments + refactor in table name parsing (#7609) 2023-03-24 14:44:30 -04:00
Shirshanka Das
3d81539c7e
fix(ingest): json-schema - nullability handling (#7667) 2023-03-23 23:07:30 +00:00
Andrew Sikowitz
95f99198af
fix(ingest/bigquery): Pass whether view is materialized; pass last_altered correctly (#7660) 2023-03-22 13:40:57 -04:00
david-leifker
697e8e2647
fix(misc): misc fixes (#7633) 2023-03-21 19:42:50 +05:30
Harshal Sheth
482431bcf4
fix(ingest/superset): support superset v2 (#7588)
Co-authored-by: John Joyce <john@acryl.io>
2023-03-20 19:49:32 -07:00
alex-magno
6ab606b748
fix(ingest/dbt): introduce lowercase column urn option (#7418)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-03-20 10:37:19 -07:00
Shirshanka Das
104c9811f5
fix(ingest/docs): improve matcher to include types with spaces in them (#7631) 2023-03-18 12:59:43 -07:00
Shirshanka Das
41d4c0b074
feat(ingest/docs): json-schema fixes, improvements to ingestion doc generation (#7615) 2023-03-17 15:58:14 +01:00
Harshal Sheth
89734587f7
feat(ingest): add urn modification helper (#7440)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-03-16 13:27:08 -07:00
Andrew Sikowitz
8c1fa04c87
fix(ingest/snowflake): Allow SnowflakeObjectAccessEntry.objectId to be None (#7601)
Co-authored-by: Pedro Silva <pedro@acryl.io>
2023-03-16 12:55:52 +01:00
Andrew Sikowitz
8dd7a85533
refactor(ingest): Use shared connection wrapper over connection cache (#7570) 2023-03-14 15:09:37 -07:00
Harshal Sheth
fbfe43b1cb
feat(ingest): fix edge cases + interface cleanup for file-system APIs (#7533) 2023-03-13 13:14:53 -07:00
Harshal Sheth
b82afa89f1
feat(ingest): enable joins across FileBackedDicts + add FileBackedList (#7506) 2023-03-09 15:22:03 -08:00
Harshal Sheth
01ee351c4c
fix(ingest): prevent logging from blowing up on TypeErrors (#7497) 2023-03-03 14:36:55 -08:00
Aseem Bansal
1adbc2cab0
chore(ci): upgrade GE version (#7290) 2023-03-02 10:47:38 -08:00
Andrew Sikowitz
8101f0d47a
feat(ingest): Introduce FileBackedDict for offloading data to disk (#7461)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Also includes minor refactoring to the bigquery connector
2023-03-01 19:09:51 -05:00
Shirshanka Das
17e85979dd
refactor(ingest): subtypes - standardize (#7437) 2023-02-28 13:11:07 -08:00
Harshal Sheth
639bbcfa86
chore(ingest/glue): cleanup deprecated underlying_platform config (#7449) 2023-02-28 10:41:54 -08:00
Harshal Sheth
3b8b5e8aa4
chore(ingest): cleanup unused files/vars in tests (#7450) 2023-02-28 08:07:34 +01:00
Tamas Nemeth
14a660428e
fix(ingest/bigquery): Querying table metadata details in batch properly (#7429) 2023-02-27 11:10:24 +01:00
Andrew Sikowitz
0532cc9056
fix(ingest/bigquery) Filter upstream lineage by list of existing tables (#7415)
Co-authored-by: mayurinehate <mayuri.nehate@gslab.com>
- Creates global stores table_refs and view_upstream_tables when extracting lineage
- Moves lineage processing to the end, after schema processing
- Adds `project_ids` config option to specify multiple projects to ingest; adds corresponding tests
- Changes `created` timestamps to `auditStamp` on `UpstreamClass`; uses VIEW type for lineage identified through view ddl parsing
2023-02-23 19:40:00 -05:00
Tamas Nemeth
4c1bf18f9a
feat(ingest/bigquery) - Emit cross-project usage from gcp logs (#7364) 2023-02-22 18:53:35 -05:00
Andrew Sikowitz
e82e284982
fix(ingest/kafka): Remove topic from kafka browse path (#7398) 2023-02-22 18:38:08 -05:00
Andrew Sikowitz
2764c44977
fix(ingest): Do not require platform_instance for stateful ingestion (#7397) 2023-02-21 21:27:44 -05:00
Aseem Bansal
986086ae00
test(cli): add check for missing init files (#7378) 2023-02-20 18:41:12 +05:30
Shirshanka Das
07e4d0696f
feat(ingest): json-schema - add json schema support for files and kaf… (#7361) 2023-02-19 08:43:13 -08:00
Mayuri Nehate
2cffec9452
fix(check upgrade): update logic to compare server and client version (#7238)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-13 13:09:38 -08:00
Andrew Sikowitz
8901498582
fix(transformers): pattern add domain transformer - enable replace_existing (#7317) 2023-02-13 12:52:44 -08:00
Tamas Nemeth
f10d622e47
fix(ingest/bigquery): Improve memory usage of lineage extraction (#7326) 2023-02-13 19:59:11 +01:00
Tamas Nemeth
b34e4fe1f1
fix(ingest/bigquery): Fix for table cache was not cleared (#7323) 2023-02-13 19:04:19 +01:00
Harshal Sheth
55442042ff
feat(cli): improve startup time (#7292) 2023-02-10 21:36:01 +05:30
Tamas Nemeth
1402071e48
fix(ingest/bigquery) - Fix for Bigquery parser quoted semicolon in the FROM table name as well (#7277) 2023-02-08 10:18:55 +01:00
Daniel Messias
0d67e188ef
feat(glue): Use table name as human-readable name for Glue ingestion (#7213)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-02 18:04:35 +01:00
Harshal Sheth
db1a0f13f3
fix(ingest): fix issue in glue tests (#7185) 2023-01-30 21:51:21 -08:00
Harshal Sheth
927d45dda9
feat(ingest): add --log-file option and show CLI logs in UI report (#7118)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-01-26 09:25:02 -08:00
Harshal Sheth
45f50d2614
test(ingest): fix kafka admin client mocking (#7098)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-01-23 16:22:20 +01:00
Tamas Nemeth
0cdb5e4b4b
refactor(ingest/containers): Refactoring container creation to common place (#6877) 2023-01-21 00:14:31 +01:00
Harshal Sheth
e23eb7108f
feat(ingest): reporting revamp, part 1 (#7031) 2023-01-18 13:34:32 -08:00
Harshal Sheth
35bd73a28b
feat(ingest): fix handling of unions with aliases in post restli conversion (#7058) 2023-01-18 09:29:46 -08:00
Tim
e2ad881d79
refactor(ingest/athena): Replace s3_staging_dir parameter in Athena source with query_result_location (#7044)
Co-authored-by: John Joyce <john@acryl.io>
2023-01-18 09:25:37 -08:00
Harshal Sheth
cb12910b6b
feat(ingest): add entity registry in codegen (#6984)
Co-authored-by: Pedro Silva <pedro@acryl.io>
2023-01-17 19:41:43 -08:00
Harshal Sheth
432feaa16d
feat(ingest): mark database_alias and env as deprecated (#6901) 2023-01-09 19:58:19 +05:30