935 Commits

Author SHA1 Message Date
Harshal Sheth
a29b576daa
fix(ingest/json-schema): handle property inheritance in unions (#8121) 2023-05-30 22:59:28 -07:00
Tamas Nemeth
d50a99935b
fix(ingest/s3): Path spec aware folder traversal (#8095) 2023-05-30 16:20:49 +02:00
Aseem Bansal
96f364802b
feat(lineage source): add fine grained lineage support (#7904) 2023-05-26 17:09:32 +05:30
Harshal Sheth
2d442161c4
ci(ingest/kafka): improve kafka integration test reliability (#8085) 2023-05-25 15:40:56 -07:00
Andrew Sikowitz
d3cd4dbb0c
feat(ingest/unity): Allow ingestion without metastore admin role (#8091)
- Adds more detailed docs and connection test
- Fixes empty username queries
2023-05-24 15:36:22 -07:00
Mayuri Nehate
84270bcac8
feat(ingest/nifi): kerberos authentication (#8097)
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Indy Prentice <iprentic@users.noreply.github.com>
2023-05-24 15:09:01 -07:00
Andrew Sikowitz
fdbc4de695
refactor(ingest): Call source_helpers via new WorkUnitProcessors in base Source (#8101) 2023-05-24 13:36:19 -07:00
Amanda Hernando
0e0d8934ea
feat(ingest): Add GenericAspectTransformer (#7994)
Co-authored-by: Adrián Pertíñez <khurzak92@gmail.com>
2023-05-24 13:31:33 -07:00
Mayuri Nehate
b3d80e57e8
feat(ingest/bigquery): usage for views (#8046)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-05-24 09:48:58 -07:00
Andrew Sikowitz
8357fc8d64
feat(ingest): Browse Path v2 helper (#8012) 2023-05-23 23:46:46 -07:00
Harshal Sheth
b0f8c3de1e
refactor(ingest): simplify stateful ingestion provider interface (#8104)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-05-23 12:57:57 -07:00
Harshal Sheth
afd65e16fb
feat(cli): delete cli v2 (#8068) 2023-05-23 14:43:44 -05:00
Harshal Sheth
4873a32e4a
fix(ingest): emitter bug fixes (#8093)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-05-23 12:04:16 -07:00
Tamas Nemeth
f8be9f6aee
feat(ingest/s3): type aware directory sorting (#8089) 2023-05-23 08:59:46 +02:00
Harshal Sheth
4e9c652707
feat(ingest): add env to container properties (#8027) 2023-05-22 12:07:16 -07:00
Harshal Sheth
98bba52c20
test(sdk): move cli tests into the unit dir (#8028) 2023-05-19 16:13:39 +02:00
Harshal Sheth
00470acc02
test(sdk): better error messages in registry codegen test (#8081) 2023-05-19 11:18:50 +02:00
Andrew Sikowitz
2e1c3981aa
refactor(ingest): Move source_helpers.py from datahub/utilities -> datahub/api (#8052) 2023-05-17 20:51:06 -07:00
Shubham Jagtap
8cc6606e68
feat(ingestion/kafka): add description in dataset properties (#7974)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: mohdsiddique <mohdsiddiquebagwan@gmail.com>
2023-05-17 11:03:08 -07:00
Shirshanka Das
b3c790aab6
feat: Add support for Data Products (#8039)
Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
2023-05-17 07:17:25 +00:00
Andrew Sikowitz
7ba2d13087
refactor(ingest): Make get_workunits() return MetadataWorkUnits (#8051)
- Deprecates UsageAggregationClass, /usageStats?action=batchIngest, UsageStatsWorkUnit
- Removes parsing of UsageAggregationClass in file source, all sinks, and WorkUnitRecordExtractor
2023-05-17 00:01:57 -04:00
Mayuri Nehate
a06c5aee2c
fix(ingest/bigquery): update usage audit log query to include create/drop operations (#7995) 2023-05-16 11:58:20 -07:00
Andrew Sikowitz
afcf462cb1
feat(ingest/unity): Add profiling support (#7976)
- Also adds a new databricks sdk
2023-05-11 10:00:50 -07:00
Andrew Sikowitz
44406f7adf
fix(ingest/postgres): Allow specification of initial engine database; set default database to postgres (#7915)
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
2023-05-09 11:11:43 -07:00
Mayuri Nehate
c845c75a2d
feat(ingest/snowflake): add config option to specify deny patterns for upstreams (#7962)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-05-08 14:13:57 -07:00
Mayuri Nehate
13b1d66170
fix(ingest/bigquery): remove incorrectly used table_pattern filter (#7810)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-05-08 10:33:42 -07:00
Mayuri Nehate
0131aeefb1
fix(ingest/unity): improve error message if no scheme in workspace_url (#7951)
Co-authored-by: John Joyce <john@acryl.io>
2023-05-08 10:13:53 -07:00
Tamas Nemeth
0e69e5a810
fix(ingest/redshift): Enabling autocommit for Redshift connection (#7983) 2023-05-08 10:24:40 +02:00
Andrew Sikowitz
8019d17aa6
fix(ingest/bigquery): Filter projects for lineage and usage (#7954) 2023-05-04 18:14:48 +02:00
Harshal Sheth
ca5dffa54d
refactor(ingest/biz-glossary): simplify business glossary source (#7912) 2023-05-03 17:01:58 -07:00
Reilman79
b6e2cc549a
fix(ldap): properly handle escaped characters in LDAP DNs (#7928) 2023-05-03 13:57:52 -07:00
Felipe Ribeiro
d504cbd1b6
docs(ingest): update max_threads default value (#7947)
Co-authored-by: Felipe Ribeiro <fribeiro@fanatics.com>
2023-05-02 22:54:15 -07:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference (#7910)
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Adds a UsageAggregator class to usage_common, as I've seen this same logic multiple times.
- Allows customizable user_urn_builder in usage_common as not all unity users are emails. We create emails with a default email_domain config in other connectors like redshift and snowflake, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
- Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source (#7903)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Mayuri Nehate
031aee4298
fix(ingest/bigquery): fix handling of time decorator offset queries (#7843)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-04-25 13:51:20 -07:00
Harshal Sheth
19d7c392d6
feat(sdk): support entity types filter in get_urns_by_filter (#7902) 2023-04-25 13:31:55 -07:00
Yusuf Mahtab
fa10256c47
feat(glue): allow resource links to be ignored (#7639)
Co-authored-by: Justas Cernas <justas.cernas@fundingcircle.com>
2023-04-21 10:42:32 -07:00
Harshal Sheth
af566e1184
feat(model): fully populate the entity registry (#7818) 2023-04-15 13:33:05 -07:00
Andrew Sikowitz
1ac1ccf26e
perf(ingest/bigquery): Improve bigquery usage disk usage and speed (#7825) 2023-04-14 18:09:43 -07:00
Andrew Sikowitz
e839ac4c40
fix(ingest/bigquery): Handle null values from usage aggregation (#7827) 2023-04-14 16:54:22 -07:00
Harshal Sheth
3079f0a7e1
feat(sdk): support executing graphql via DataHubGraph (#7753)
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
2023-04-12 11:30:05 -07:00
Andrew Sikowitz
73016ebff9
test(ingest/bigquery): Add sql parser xfail test to fix later (#7792) 2023-04-12 10:51:29 -07:00
Andrew Sikowitz
54f047e1a8
test(ingest/snowflake): fix tests around host_port (#7791)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-11 16:06:35 -07:00
Harshal Sheth
e99875cac6
chore(ingest): enable flake8 bugbear linting (#7763) 2023-04-10 14:14:42 -07:00
Andrew Sikowitz
087855f374
fix(ingest/bigquery): Support cross project usage using FileBackedDict (#7663)
Includes major refactor of bigquery usage ingestion, minor refactor of the source as a whole, and reporting cleanup.
Includes bigquery performance testing changes.
2023-04-07 12:18:26 -07:00
Andrew Sikowitz
44663fa035
fix(ingest/bigquery): Raise report_failure threshold; add robustness around table parsing (#7772)
- Converted getting views and tables to iterators
- Catches exception around table expiration time being impossible to represent in python because it's too far in the future
2023-04-06 13:24:22 -07:00
Tamas Nemeth
96bacfc5d7
fix(ingest/redshift): Fixing adding back db name in redshift urn (#7765) 2023-04-06 11:45:10 +02:00
Tamas Nemeth
29d2492667
fix(ingest/bigquery): Lineage edges use datetime with timezone; correctly parse last_altered (#7762) 2023-04-06 02:46:50 +00:00
Aseem Bansal
a11a7fa9d0
feat(snowflake): better error message on key pair authentication (#7734)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-05 00:46:07 +00:00
Andrew Sikowitz
ce1ac7fa12
refactor(ingest): Use sqlite.Row row_factory for FileBackedCollections (#7739) 2023-04-04 11:53:56 -07:00