Reilman79
b6e2cc549a
fix(ldap): properly handle escaped characters in LDAP DNs ( #7928 )
2023-05-03 13:57:52 -07:00
Felipe Ribeiro
d504cbd1b6
docs(ingest): update max_threads default value ( #7947 )
...
Co-authored-by: Felipe Ribeiro <fribeiro@fanatics.com>
2023-05-02 22:54:15 -07:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference ( #7910 )
...
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Creates UsageAggregator class to usage_common, as I've seen this same logic multiple times.
- Allows customizable user_urn_builder in usage_common as not all unity users are emails. We create emails with a default email_domain config in other connectors like redshift and snowflake, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source ( #7903 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Mayuri Nehate
031aee4298
fix(ingest/bigquery): fix handling of time decorator offset queries ( #7843 )
...
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-04-25 13:51:20 -07:00
Harshal Sheth
19d7c392d6
feat(sdk): support entity types filter in get_urns_by_filter ( #7902 )
2023-04-25 13:31:55 -07:00
Yusuf Mahtab
fa10256c47
feat(glue): allow resource links to be ignored ( #7639 )
...
Co-authored-by: Justas Cernas <justas.cernas@fundingcircle.com>
2023-04-21 10:42:32 -07:00
Harshal Sheth
af566e1184
feat(model): fully populate the entity registry ( #7818 )
2023-04-15 13:33:05 -07:00
Andrew Sikowitz
1ac1ccf26e
perf(ingest/bigquery): Improve bigquery usage disk usage and speed ( #7825 )
2023-04-14 18:09:43 -07:00
Andrew Sikowitz
e839ac4c40
fix(ingest/bigquery): Handle null values from usage aggregation ( #7827 )
2023-04-14 16:54:22 -07:00
Harshal Sheth
3079f0a7e1
feat(sdk): support executing graphql via DataHubGraph ( #7753 )
...
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
2023-04-12 11:30:05 -07:00
Andrew Sikowitz
73016ebff9
test(ingest/bigquery): Add sql parser xfail test to fix later ( #7792 )
2023-04-12 10:51:29 -07:00
Andrew Sikowitz
54f047e1a8
test(ingest/snowflake): fix tests around host_port ( #7791 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-11 16:06:35 -07:00
Harshal Sheth
e99875cac6
chore(ingest): enable flake8 bugbear linting ( #7763 )
2023-04-10 14:14:42 -07:00
Andrew Sikowitz
087855f374
fix(ingest/bigquery): Support cross project usage using FileBackedDict ( #7663 )
...
Includes major refactor of bigquery usage ingestion, minor refactor of the source as a whole, and reporting cleanup.
Includes bigquery performance testing changes.
2023-04-07 12:18:26 -07:00
Andrew Sikowitz
44663fa035
fix(ingest/bigquery): Raise report_failure threshold; add robustness around table parsing ( #7772 )
...
- Converted getting views and tables to iterators
- Catches exception around table expiration time being impossible to represent in python because it's too far in the future
2023-04-06 13:24:22 -07:00
Tamas Nemeth
96bacfc5d7
fix(ingest/redshift): Fixing adding back db name in redshift urn ( #7765 )
2023-04-06 11:45:10 +02:00
Tamas Nemeth
29d2492667
fix(ingest/bigquery): Lineage edges use datetime with timezone; correctly parse last_altered ( #7762 )
2023-04-06 02:46:50 +00:00
Aseem Bansal
a11a7fa9d0
feat(snowflake): better error message on key pair authentication ( #7734 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-05 00:46:07 +00:00
Andrew Sikowitz
ce1ac7fa12
refactor(ingest): Use sqlite.Row row_factory for FileBackedCollections ( #7739 )
2023-04-04 11:53:56 -07:00
Harshal Sheth
f860ce95c0
feat(ingest): emit state payloads as soft-deleted ( #7714 )
2023-04-04 17:06:21 +00:00
Andrew Sikowitz
de587b2bfe
refactor(ingest): Minor cleanup of File, CsvEnricher, BusinessGlossary, and FileLineage sources ( #7718 )
...
- Adds auto_workunit_reporter to each source
- Standardizes comments around remote paths
- Adds back AuditStamp to FileLineage source
- Some generic refactoring
2023-03-31 15:49:24 -07:00
Harshal Sheth
f6d7e1a325
feat(ingest/snowflake): hide host_port from snowflake docs ( #7717 )
2023-03-31 15:58:52 +05:30
Harshal Sheth
94fa62d431
chore(ingest): formatting + cleanup MCPW usages ( #7706 )
2023-03-29 11:43:25 -07:00
Harshal Sheth
2eb9fe408a
docs(): generate docs for our Python SDK ( #7612 )
2023-03-28 20:23:20 -07:00
Mayuri Nehate
fc238c2513
feat(ingest/postgres): support extracting metadata from all databases in single recipe ( #7581 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-03-28 14:16:12 -07:00
Andrew Sikowitz
c7d35ffd66
perf(ingest): Improve FileBackedDict iteration performance; minor refactoring ( #7689 )
...
- Adds dirty bit to cache, only writes data if dirty
- Refactors __iter__
- Adds sql_query_iterator
- Adds items_snapshot, more performant `items()` that allows for filtering
- Renames connection -> shared_connection
- Removes unnecessary flush during close if connection is not shared
- Adds Closeable mixin
2023-03-27 17:20:34 -04:00
Andrew Sikowitz
419bee8614
fix(ingest/bigquery): Fix BigQueryTableType enum accesses ( #7685 )
2023-03-25 00:08:11 +00:00
Mayuri Nehate
301c8616ed
refactor(ingest/bigquery): add inline comments + refactor in table name parsing ( #7609 )
2023-03-24 14:44:30 -04:00
Shirshanka Das
3d81539c7e
fix(ingest): json-schema - nullability handling ( #7667 )
2023-03-23 23:07:30 +00:00
Andrew Sikowitz
95f99198af
fix(ingest/bigquery): Pass whether view is materialized; pass last_altered correctly ( #7660 )
2023-03-22 13:40:57 -04:00
david-leifker
697e8e2647
fix(misc): misc fixes ( #7633 )
2023-03-21 19:42:50 +05:30
Harshal Sheth
482431bcf4
fix(ingest/superset): support superset v2 ( #7588 )
...
Co-authored-by: John Joyce <john@acryl.io>
2023-03-20 19:49:32 -07:00
alex-magno
6ab606b748
fix(ingest/dbt): introduce lowercase column urn option ( #7418 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-03-20 10:37:19 -07:00
Shirshanka Das
104c9811f5
fix(ingest/docs): improve matcher to include types with spaces in them ( #7631 )
2023-03-18 12:59:43 -07:00
Shirshanka Das
41d4c0b074
feat(ingest/docs): json-schema fixes, improvements to ingestion doc generation ( #7615 )
2023-03-17 15:58:14 +01:00
Harshal Sheth
89734587f7
feat(ingest): add urn modification helper ( #7440 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-03-16 13:27:08 -07:00
Andrew Sikowitz
8c1fa04c87
fix(ingest/snowflake): Allow SnowflakeObjectAccessEntry.objectId to be None ( #7601 )
...
Co-authored-by: Pedro Silva <pedro@acryl.io>
2023-03-16 12:55:52 +01:00
Andrew Sikowitz
8dd7a85533
refactor(ingest): Use shared connection wrapper over connection cache ( #7570 )
2023-03-14 15:09:37 -07:00
Harshal Sheth
fbfe43b1cb
feat(ingest): fix edge cases + interface cleanup for file-system APIs ( #7533 )
2023-03-13 13:14:53 -07:00
Harshal Sheth
b82afa89f1
feat(ingest): enable joins across FileBackedDicts + add FileBackedList ( #7506 )
2023-03-09 15:22:03 -08:00
Harshal Sheth
01ee351c4c
fix(ingest): prevent logging from blowing up on TypeErrors ( #7497 )
2023-03-03 14:36:55 -08:00
Aseem Bansal
1adbc2cab0
chore(ci): upgrade GE version ( #7290 )
2023-03-02 10:47:38 -08:00
Andrew Sikowitz
8101f0d47a
feat(ingest): Introduce FileBackedDict for offloading data to disk ( #7461 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Also includes minor refactoring to the bigquery connector
2023-03-01 19:09:51 -05:00
Shirshanka Das
17e85979dd
refactor(ingest): subtypes - standardize ( #7437 )
2023-02-28 13:11:07 -08:00
Harshal Sheth
639bbcfa86
chore(ingest/glue): cleanup deprecated underlying_platform config ( #7449 )
2023-02-28 10:41:54 -08:00
Harshal Sheth
3b8b5e8aa4
chore(ingest): cleanup unused files/vars in tests ( #7450 )
2023-02-28 08:07:34 +01:00
Tamas Nemeth
14a660428e
fix(ingest/bigquery): Querying table metadata details in batch properly ( #7429 )
2023-02-27 11:10:24 +01:00
Andrew Sikowitz
0532cc9056
fix(ingest/bigquery) Filter upstream lineage by list of existing tables ( #7415 )
...
Co-authored-by: mayurinehate <mayuri.nehate@gslab.com>
- Creates global stores table_refs and view_upstream_tables when extracting lineage
- Moves lineage processing to the end, after schema processing
- Adds `project_ids` config option to specify multiple projects to ingest; adds corresponding tests
- Changes `created` timestamps to `auditStamp` on `UpstreamClass`; uses VIEW type for lineage identified through view ddl parsing
2023-02-23 19:40:00 -05:00
Tamas Nemeth
4c1bf18f9a
feat(ingest/bigquery) - Emit cross-project usage from gcp logs ( #7364 )
2023-02-22 18:53:35 -05:00