2155 Commits

Author SHA1 Message Date
Harshal Sheth
9a2e990bed
fix(sdk): throw errors on empty gms server urls (#8017) 2023-05-11 21:42:22 -07:00
Harshal Sheth
82afdb2c78
feat(cli): move registry delete to separate subcommand (#7968) 2023-05-11 12:55:46 -07:00
Tamas Nemeth
c0d50d0b2c
fix(ingest/s3) Adding missing more-itertools dependency (#8023) 2023-05-11 12:14:25 -07:00
Andrew Sikowitz
9c7742b1d7
fix(ingest/unity): Update databricks-cli pin (#8024) 2023-05-11 12:14:10 -07:00
Andrew Sikowitz
afcf462cb1
feat(ingest/unity): Add profiling support (#7976)
- Also adds a new databricks sdk
2023-05-11 10:00:50 -07:00
Mayuri Nehate
294f65fdd7
fix(ingest/snowflake): fix lineage query aggregation for optimised lineage path (#8011)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-05-11 09:58:34 -07:00
Mayuri Nehate
eb99012c86
feat(ingest/classification): add classification report (#7925)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-05-10 13:01:33 -07:00
Tamas Nemeth
dec54bf098
feat(ingest/s3): Inferring schema from the alphabetically last folder (#8005) 2023-05-10 21:55:05 +02:00
Andrew Sikowitz
a68833769e
refactor(ingest/unity): Use databricks-sdk over databricks-cli for usage query (#7981) 2023-05-09 13:30:11 -07:00
Andrew Sikowitz
44406f7adf
fix(ingest/postgres): Allow specification of initial engine database; set default database to postgres (#7915)
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
2023-05-09 11:11:43 -07:00
Hyejin Yoon
b04d59e13d
docs: add tips on language switchable tab on docs (#7984) 2023-05-08 18:05:22 -07:00
Mayuri Nehate
c845c75a2d
feat(ingest/snowflake): add config option to specify deny patterns for upstreams (#7962)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-05-08 14:13:57 -07:00
Mayuri Nehate
13b1d66170
fix(ingest/bigquery): remove incorrectly used table_pattern filter (#7810)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-05-08 10:33:42 -07:00
Mayuri Nehate
0131aeefb1
fix(ingest/unity): improve error message if no scheme in workspace_url (#7951)
Co-authored-by: John Joyce <john@acryl.io>
2023-05-08 10:13:53 -07:00
Mayuri Nehate
fe097f116e
fix(ingest): use with for opened connections (#7908) 2023-05-08 10:12:06 -07:00
Tamas Nemeth
0e69e5a810
fix(ingest/redshift): Enabling autocommit for Redshift connection (#7983) 2023-05-08 10:24:40 +02:00
Tamas Nemeth
75c03d7229
fix(ingestion/redshift) - Fixing schema query (#7975) 2023-05-06 11:20:01 +02:00
Harshal Sheth
721ab5da37
fix(ingest): use certs correctly in rest emitter (#7978) 2023-05-06 11:17:54 +02:00
Harshal Sheth
b074387185
fix(ingest/salesforce): fix lint (#7980) 2023-05-06 11:16:52 +02:00
David Sanchez
42999df06f
fix(ingest/tableau): Add a try catch to LineageRunner parser (#7965) 2023-05-05 12:54:09 -07:00
matthew-piatkus-cko
bfde4662c7
fix(ingest/salesforce): support JSON web token auth (#7963) 2023-05-05 18:17:43 +00:00
Harshal Sheth
1ebc88caf2
fix: build vercel python from source (#7972) 2023-05-05 17:10:05 +02:00
Andrew Sikowitz
8019d17aa6
fix(ingest/bigquery): Filter projects for lineage and usage (#7954) 2023-05-04 18:14:48 +02:00
Harshal Sheth
ca5dffa54d
refactor(ingest/biz-glossary): simplify business glossary source (#7912) 2023-05-03 17:01:58 -07:00
Harshal Sheth
a9e0038199
docs(ingest/postgres): add example with ssl configuration (#7916)
Co-authored-by: John Joyce <john@acryl.io>
2023-05-03 15:22:24 -07:00
Reilman79
b6e2cc549a
fix(ldap): properly handle escaped characters in LDAP DNs (#7928) 2023-05-03 13:57:52 -07:00
Felipe Ribeiro
d504cbd1b6
docs(ingest): update max_threads default value (#7947)
Co-authored-by: Felipe Ribeiro <fribeiro@fanatics.com>
2023-05-02 22:54:15 -07:00
Harshal Sheth
b12c2b8327
fix(ingest): improve error message when graph connection fails (#7946) 2023-05-02 16:30:58 -07:00
Harshal Sheth
6833494347
feat(airflow): respect port parameter if provided (#7945) 2023-05-02 16:28:22 -07:00
Harshal Sheth
bf86235e26
fix(ingest/unity): use fully qualified catalog/schema patterns (#7900) 2023-05-02 16:27:17 -07:00
Hyejin Yoon
8a7aeac9d9
feat: add missing python sdk guides based on DatahubGraph (#7875)
Co-authored-by: socar-dini <dini@socar.kr>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-05-03 07:32:23 +09:00
Mayuri Nehate
3c04b1bb17
docs(ingest): add note about path_specs configuration in data lake sources (#7941) 2023-05-02 15:08:54 +02:00
Mayuri Nehate
a711baa131
fix(ingest/hive): fix containers generation for hive (#7926) 2023-05-02 15:07:51 +02:00
Andrew Sikowitz
4e9c398e1d
fix(ingest/unity): Add sqllineage dependency (#7938) 2023-05-01 23:26:49 -04:00
Andrew Sikowitz
eb1674ffdb
fix(ingest/unity-catalog): Add usage_common dependency to unity catalog plugin (#7935) 2023-05-01 14:47:44 -07:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference (#7910)
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Creates UsageAggregator class to usage_common, as I've seen this same logic multiple times.
- Allows customizable user_urn_builder in usage_common as not all unity users are emails. We create emails with a default email_domain config in other connectors like redshift and snowflake, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
- Breaks up proxy.py into implementation and types
2023-05-01 11:30:09 -07:00
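
The commit message above describes replacing ad hoc string references to tables with a TableReference object. A minimal sketch of that idea follows, assuming a three-part `catalog.schema.table` name; the field names, `from_qualified_name`, and `qualified_name` are hypothetical illustrations and are not taken from the actual DataHub implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableReference:
    """Hypothetical structured reference to a table (illustration only)."""

    catalog: str
    schema: str
    table: str

    @classmethod
    def from_qualified_name(cls, name: str) -> "TableReference":
        # Assumption: a plain "catalog.schema.table" string with no quoted dots.
        parts = name.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected catalog.schema.table, got {name!r}")
        return cls(*parts)

    @property
    def qualified_name(self) -> str:
        # Rebuild the dotted name from the structured fields.
        return f"{self.catalog}.{self.schema}.{self.table}"


ref = TableReference.from_qualified_name("main.sales.orders")
print(ref.schema, ref.qualified_name)
```

Because the dataclass is frozen, instances are hashable and can serve as dictionary keys or set members, which is convenient when grouping lineage or usage records by table.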
david-leifker
cd05f5b174
feat(schema-registry): replace confluent schema registry (#7930)
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Ryan Holstien <ryan@acryl.io>
2023-05-01 13:18:41 -05:00
Andrew Sikowitz
ca3cab4e23
refactor(ingest): report soft deleted stale entities with LossyList (#7907) 2023-04-27 15:40:19 -07:00
xiphl
af09034523
[bugfix] Fix remote file ingestion for Windows (#7888)
Co-authored-by: Shirshanka Das <shirshanka+github@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-04-27 10:28:10 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source (#7903)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Harshal Sheth
916cb21454
test(ingest/biz-glossary): add test for enable_auto_id (#7911) 2023-04-26 19:48:52 -07:00
Harshal Sheth
a33153c1f6
feat(sdk): add DataHubGraph.get_entity_semityped method (#7905) 2023-04-26 13:44:13 -07:00
Pedro Silva
967260634c
Revert "feat(cli): Modifies ingest-sample-data command to use DataHub… (#7899) 2023-04-26 16:56:22 +01:00
Harshal Sheth
29e5cfd643
fix(ingest): fix minor bug + protective dep requirements (#7861) 2023-04-25 14:35:01 -07:00
Mayuri Nehate
031aee4298
fix(ingest/bigquery): fix handling of time decorator offset queries (#7843)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-04-25 13:51:20 -07:00
Mayuri Nehate
ca1f1903ea
fix(ingest/snowflake): fix optimised lineage query, filter temporary tables (#7894)
This change fixes the following Snowflake query errors, which occurred with larger lineage time windows:

error 1 - 100099 (22000): Result array of ARRAYAGG is too large.
error 2 - max LOB size (16777216) exceeded, actual size of parsed column is xxxxxxxxxx
2023-04-25 13:51:04 -07:00
Harshal Sheth
19d7c392d6
feat(sdk): support entity types filter in get_urns_by_filter (#7902) 2023-04-25 13:31:55 -07:00
Harshal Sheth
71ecbd6060
fix(ingest/dbt): ensure dbt shows view properties (#7872) 2023-04-25 12:25:07 -07:00
Mayuri Nehate
28986d8081
fix(ingestion/tableau): backward compatibility with version 2021.1 and above (#7864)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-24 11:08:56 -07:00
Mayuri Nehate
3212e74969
feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern (#7842) 2023-04-24 11:01:15 -07:00