3089 Commits

Author SHA1 Message Date
matthew-piatkus-cko
bfde4662c7
fix(ingest/salesforce): support JSON web token auth (#7963) 2023-05-05 18:17:43 +00:00
Andrew Sikowitz
8019d17aa6
fix(ingest/bigquery): Filter projects for lineage and usage (#7954) 2023-05-04 18:14:48 +02:00
Harshal Sheth
ca5dffa54d
refactor(ingest/biz-glossary): simplify business glossary source (#7912) 2023-05-03 17:01:58 -07:00
Reilman79
b6e2cc549a
fix(ldap): properly handle escaped characters in LDAP DNs (#7928) 2023-05-03 13:57:52 -07:00
Harshal Sheth
b12c2b8327
fix(ingest): improve error message when graph connection fails (#7946) 2023-05-02 16:30:58 -07:00
Harshal Sheth
6833494347
feat(airflow): respect port parameter if provided (#7945) 2023-05-02 16:28:22 -07:00
Harshal Sheth
bf86235e26
fix(ingest/unity): use fully qualified catalog/schema patterns (#7900) 2023-05-02 16:27:17 -07:00
Mayuri Nehate
3c04b1bb17
docs(ingest): add note about path_specs configuration in data lake sources (#7941) 2023-05-02 15:08:54 +02:00
Mayuri Nehate
a711baa131
fix(ingest/hive): fix containers generation for hive (#7926) 2023-05-02 15:07:51 +02:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference (#7910)
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Adds a UsageAggregator class to usage_common, as I've seen this same logic multiple times.
- Allows customizable user_urn_builder in usage_common as not all unity users are emails. We create emails with a default email_domain config in other connectors like redshift and snowflake, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
- Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
david-leifker
cd05f5b174
feat(schema-registry): replace confluent schema registry (#7930)
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Ryan Holstien <ryan@acryl.io>
2023-05-01 13:18:41 -05:00
Andrew Sikowitz
ca3cab4e23
refactor(ingest): report soft deleted stale entities with LossyList (#7907) 2023-04-27 15:40:19 -07:00
xiphl
af09034523
[bugfix] Fix remote file ingestion for Windows (#7888)
Co-authored-by: Shirshanka Das <shirshanka+github@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-04-27 10:28:10 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source (#7903)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Harshal Sheth
a33153c1f6
feat(sdk): add DataHubGraph.get_entity_semityped method (#7905) 2023-04-26 13:44:13 -07:00
Pedro Silva
967260634c
Revert "feat(cli): Modifies ingest-sample-data command to use DataHub… (#7899) 2023-04-26 16:56:22 +01:00
Harshal Sheth
29e5cfd643
fix(ingest): fix minor bug + protective dep requirements (#7861) 2023-04-25 14:35:01 -07:00
Mayuri Nehate
031aee4298
fix(ingest/bigquery): fix handling of time decorator offset queries (#7843)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-04-25 13:51:20 -07:00
Mayuri Nehate
ca1f1903ea
fix(ingest/snowflake): fix optimised lineage query, filter temporary tables (#7894)
With this change, the following Snowflake query errors for larger lineage time windows are fixed:

error 1 - 100099 (22000): Result array of ARRAYAGG is too large.
error 2 - max LOB size (16777216) exceeded, actual size of parsed column is xxxxxxxxxx
2023-04-25 13:51:04 -07:00
Harshal Sheth
19d7c392d6
feat(sdk): support entity types filter in get_urns_by_filter (#7902) 2023-04-25 13:31:55 -07:00
Harshal Sheth
71ecbd6060
fix(ingest/dbt): ensure dbt shows view properties (#7872) 2023-04-25 12:25:07 -07:00
Mayuri Nehate
28986d8081
fix(ingestion/tableau): backward compatibility with version 2021.1 and above (#7864)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-24 11:08:56 -07:00
Mayuri Nehate
3212e74969
feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern (#7842) 2023-04-24 11:01:15 -07:00
Pedro Silva
a5fa933fb0
feat(cli): Modifies ingest-sample-data command to use DataHub url & token based on config (#7896) 2023-04-24 15:52:10 +01:00
Andrew Sikowitz
e9c2f9afcc
feat(ingest/unity): Ingest ownership for containers; lookup service principal display names (#7869) 2023-04-21 11:02:39 -07:00
mohdsiddique
f21eeed6e7
feat(ingestion): lookml refinement support (#7781)
Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-21 10:55:31 -07:00
Yusuf Mahtab
fa10256c47
feat(glue): allow resource links to be ignored (#7639)
Co-authored-by: Justas Cernas <justas.cernas@fundingcircle.com>
2023-04-21 10:42:32 -07:00
Aezo
1a5c716b87
feat(ingest/powerbi): support modified_since, extract_dataset_schema and many more (#7519)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-20 22:58:45 -07:00
Harshal Sheth
66f44945e3
docs(ingest): update dbt and aws docs (#7870) 2023-04-20 21:08:22 -07:00
Andrew Sikowitz
1ff6949e36
refactor(ingest): Add helper DataHubGraph methods (#7851)
Adds:
- get_urns_by_filter(), using scroll by entities
- get_latest_pipeline_checkpoint()
- soft_delete_urn()
2023-04-20 10:16:33 -07:00
Harshal Sheth
6802142f6e
fix(ingest/salesforce): use report timestamp for operations (#7838)
Co-authored-by: John Joyce <john@acryl.io>
2023-04-19 20:39:07 -07:00
Harshal Sheth
399e3333ad
feat(cli): improve quickstart stability (#7839) 2023-04-17 21:19:19 -07:00
Harshal Sheth
e461d03d94
feat(ingest/unity): capture create/lastModified timestamps (#7819)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-17 12:18:21 -07:00
Mayuri Nehate
a8681dae75
fix(ingest/snowflake): fix column name in snowflake optimised lineage (#7834) 2023-04-17 11:44:53 -07:00
Andrew Sikowitz
1ac1ccf26e
perf(ingest/bigquery): Improve bigquery usage disk usage and speed (#7825) 2023-04-14 18:09:43 -07:00
Andrew Sikowitz
e839ac4c40
fix(ingest/bigquery): Handle null values from usage aggregation (#7827) 2023-04-14 16:54:22 -07:00
Mayuri Nehate
8ec74ce41c
fix(ingest/bigquery): update usage query, remove erroneous init (#7811) 2023-04-14 13:38:50 -07:00
Andrew Sikowitz
37e7485184
fix(ingest/bigquery): Do not query columns when not ingesting tables or views (#7823) 2023-04-14 09:08:22 -07:00
Andrew Sikowitz
408cd7db2a
fix(ingest/bigquery): Enable lineage and usage ingestion without tables (#7820) 2023-04-14 01:41:00 -07:00
Andrew Sikowitz
d8d8176b1a
fix(ingest/bigquery): Add to lineage, not overwrite, when using sql parser (#7814) 2023-04-14 08:46:10 +02:00
Tamas Nemeth
4ec280ee20
fix(ingest/redshift): Remove pg_user table from metadata queries (#7815) 2023-04-13 15:35:26 -07:00
Andrew Sikowitz
ce795406b9
feat(ingest): Track disk usage in report (#7812) 2023-04-13 14:43:25 -07:00
RyanHolstien
0d5873db2a
feat(patch): patch support for flow info and job info and refactor patchbuilders for java sdk (#7495)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: david-leifker <114954101+david-leifker@users.noreply.github.com>
Co-authored-by: David Leifker <david.leifker@acryl.io>
2023-04-13 15:46:35 -05:00
Harshal Sheth
4f59169566
feat(ingest/lookml): correctly handle include directives from imported projects (#7798) 2023-04-13 13:28:58 -07:00
Harshal Sheth
204727a6ee
feat(ingest/unity): support extracting ownership (#7801) 2023-04-12 19:45:41 -07:00
Harshal Sheth
3079f0a7e1
feat(sdk): support executing graphql via DataHubGraph (#7753)
Co-authored-by: Hyejin Yoon <0327jane@gmail.com>
2023-04-12 11:30:05 -07:00
Tamas Nemeth
0cc12bcce7
feat(ingest): redshift - Redshift rework (#6906) 2023-04-12 19:15:43 +02:00
Andrew Sikowitz
b7feb2a671
config(ingest/bigquery): Default lineage_use_sql_parser to true; update description (#7797) 2023-04-11 23:00:41 -07:00
Andrew Sikowitz
156d9df6b5
fix(ingest/bigquery): Fix lineage / usage table ref checks (#7800) 2023-04-11 23:00:27 -07:00
Andrew Sikowitz
54f047e1a8
test(ingest/snowflake): fix tests around host_port (#7791)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-11 16:06:35 -07:00