3089 Commits

Author SHA1 Message Date
Andrew Sikowitz
7a71b84296
refactor(ingest): Convert FileBackedDict to dataclass for cleaner init (#7469)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-03-02 19:53:05 -08:00
mohdsiddique
29d171106b
feat(ingest/tableau): project path and container support (#7426)
Co-authored-by: mayurinehate <mayuri.nehate@gslab.com>
Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: John Joyce <john@acryl.io>
2023-03-02 16:53:19 -08:00
Kevin G
622688916c
fix(ingest/dbt): check for nodes key before accessing (#7462)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-03-02 11:40:13 -08:00
Tamas Nemeth
3f88fb7d16
feat(ingest/bigquery) - Capture dataset labels in bigquery (#7460) 2023-03-02 11:41:19 +01:00
Harshal Sheth
49029943f9
fix(ingest): remove extraneous platform configs (#7454) 2023-03-02 01:10:35 -08:00
Thomas Memenga
18dd7298ad
fix(ingest/s3): propagate s3 endpoint to profiling (#7431) 2023-03-02 01:05:17 -08:00
Tony Ouyang
4f651b0d3d
fix(ingest/bigquery): update bigquery platform_instance capability (#7467) 2023-03-02 00:52:40 -08:00
Harshal Sheth
c648f7376a
refactor(ingest): use auto_stale_entity_removal in json schema source (#7465) 2023-03-02 08:25:41 +01:00
Harshal Sheth
619fad0ae1
fix(ingest/dbt): remove deprecated backcompat_skip_source_on_lineage_edge option (#7466) 2023-03-02 08:24:50 +01:00
Andrew Sikowitz
8101f0d47a
feat(ingest): Introduce FileBackedDict for offloading data to disk (#7461)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Also includes minor refactoring to the bigquery connector
2023-03-01 19:09:51 -05:00
Harshal Sheth
45feb01e3b
fix(ingest/bigquery): simplify type annotations for bigquery usage (#7457)
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-03-01 08:38:36 -08:00
Harshal Sheth
2c3e3c203f
docs(ingest): add details about backwards compatibility guarantees (#7439) 2023-02-28 13:33:58 -08:00
Shirshanka Das
17e85979dd
refactor(ingest): subtypes - standardize (#7437) 2023-02-28 13:11:07 -08:00
Harshal Sheth
73493c577b
refactor(ingest): avoid allowing extras for all DataHubGraphConfig (#7448) 2023-02-28 10:42:31 -08:00
Harshal Sheth
639bbcfa86
chore(ingest/glue): cleanup deprecated underlying_platform config (#7449) 2023-02-28 10:41:54 -08:00
nachiket-juneja
e07cd2090b
Feat/s3 ingestion enhancement to update schema from latest partition (#7410)
Co-authored-by: Prashant Singh Thakur <prashant.thakur@nucleusteq.com>
2023-02-28 08:58:28 +01:00
Tamas Nemeth
62e33e03a3
fix(ingest/unity): Use assigned metastore if no metastore listed in unity catalog (#7446) 2023-02-28 08:06:28 +01:00
Tamas Nemeth
77d072b522
fix(ingest/athena): Fix athena source if dbname is not specified in the connection string (#7417) 2023-02-27 22:15:29 +01:00
Tamas Nemeth
1b53c03794
fix(ingest/snowflake): fixing Snowflake state issue (#7443) 2023-02-27 13:59:30 +01:00
Tamas Nemeth
14a660428e
fix(ingest/bigquery): Querying table metadata details in batch properly (#7429) 2023-02-27 11:10:24 +01:00
Harshal Sheth
d02701d91c
docs(ingest): add ingestion configs guide (#7438) 2023-02-26 16:04:23 -08:00
Shirshanka Das
221b1ae801
fix(ingest): lookml - add support for includes, extends, view_name i… (#7428) 2023-02-24 12:05:21 -08:00
Tamas Nemeth
3a4c9a69f6
fix(ingest/bigquery): Fixing double quoting in profiling approx count query (#7416) 2023-02-24 09:39:52 +01:00
Andrew Sikowitz
0532cc9056
fix(ingest/bigquery) Filter upstream lineage by list of existing tables (#7415)
Co-authored-by: mayurinehate <mayuri.nehate@gslab.com>
- Creates global stores table_refs and view_upstream_tables when extracting lineage
- Moves lineage processing to the end, after schema processing
- Adds `project_ids` config option to specify multiple projects to ingest; adds corresponding tests
- Changes `created` timestamps to `auditStamp` on `UpstreamClass`; uses VIEW type for lineage identified through view ddl parsing
2023-02-23 19:40:00 -05:00
Tamas Nemeth
4c1bf18f9a
feat(ingest/bigquery) - Emit cross-project usage from gcp logs (#7364) 2023-02-22 18:53:35 -05:00
Andrew Sikowitz
e82e284982
fix(ingest/kafka): Remove topic from kafka browse path (#7398) 2023-02-22 18:38:08 -05:00
Mayuri Nehate
d436ab9f9b
feat(ingest/kafka-connect): add config to lowercase urns, do not emit… (#7393)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-22 11:42:44 -08:00
Mayuri Nehate
5db133619f
fix(ingest/bigquery): Prefer parsed lineage for view over lineage from audit logs (#7408) 2023-02-22 11:51:04 -05:00
Andrew Sikowitz
c5c2bdb983
fix(ingest/bigquery): Correctly upsert lineage_map when parsing view ddl (#7403) 2023-02-22 11:57:01 +01:00
Andrew Sikowitz
2764c44977
fix(ingest): Do not require platform_instance for stateful ingestion (#7397) 2023-02-21 21:27:44 -05:00
서재권(Data Platform)
3068e7f0b1
fix(ingest/oracle) add database name to oracle urn name (#7016) 2023-02-21 13:50:24 -05:00
Andrew Sikowitz
8fd2cc5f20
fix(ingest/snowflake): Improve memory usage of metadata extraction (#7349) 2023-02-20 14:46:10 +01:00
Aseem Bansal
986086ae00
test(cli): add check for missing init files (#7378) 2023-02-20 18:41:12 +05:30
Shirshanka Das
07e4d0696f
feat(ingest): json-schema - add json schema support for files and kaf… (#7361) 2023-02-19 08:43:13 -08:00
Andrew Sikowitz
632f730803
fix(ingest/looker): do not instantiate LookerDashboardSource on test_connection (#7369) 2023-02-18 09:32:28 +01:00
Teppo Naakka
702221089d
feat(powerbi): add chart entities to similar browsepath as dashboards (#7293)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-17 13:38:48 -08:00
mohdsiddique
79f576e2e1
fix(ingestion): powerbi # continue ingestion if m-query parsing fail (#7360)
Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>
2023-02-17 13:24:10 -08:00
Tamas Nemeth
32ab7949de
fix(ingestion/snowflake): Fixing stateful ingestion commit at Snowflake source (#7363) 2023-02-17 13:23:16 -08:00
Pedro Silva
50f7935d5b
fix(cli): Corrects search filter for delete (#7367) 2023-02-17 13:23:01 -08:00
Aseem Bansal
5690560b6e
feat(cli): make deprecations, renames easier to notice (#7310) 2023-02-17 22:50:41 +05:30
Tamas Nemeth
aa388f04c2
fix(ingest/bigquery): Increase batch size in metadata extraction if no partitioned table involved (#7252) 2023-02-17 11:49:47 +01:00
skrydal
8207e4637a
fix(ingest/tableau): make Tableau ingestor resilient to timeout exceptions (#7333) 2023-02-15 11:21:31 +01:00
Shirshanka Das
46810e0df9
logging(cli): dropping neo4j message to debug to avoid confusion (#7340) 2023-02-14 11:32:03 -08:00
mohdsiddique
3a095f960f
feat(ingestion): powerbi # Configurable Admin API (#7055)
Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>
2023-02-14 09:58:34 -08:00
Mayuri Nehate
2cffec9452
fix(check upgrade): update logic to compare server and client version (#7238)
Co-authored-by: John Joyce <john@acryl.io>
2023-02-13 13:09:38 -08:00
Andrew Sikowitz
8901498582
fix(transformers): pattern add domain transformer - enable replace_existing (#7317) 2023-02-13 12:52:44 -08:00
Felix Lüdin
da2b0c9e1b
fix(docs): sort sources by display name in doc's sidebar (#7322) 2023-02-13 12:39:54 -08:00
Tamas Nemeth
f10d622e47
fix(ingest/bigquery): Improve memory usage of lineage extraction (#7326) 2023-02-13 19:59:11 +01:00
Tamas Nemeth
b34e4fe1f1
fix(ingest/bigquery): Fix for table cache was not cleared (#7323) 2023-02-13 19:04:19 +01:00
Harshal Sheth
76846b4175
perf(ingest): speed up MCPW.validate() (#7319) 2023-02-11 23:42:28 +01:00