Hyejin Yoon
8a7aeac9d9
feat: add missing python sdk guides based on DatahubGraph ( #7875 )
...
Co-authored-by: socar-dini <dini@socar.kr>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-05-03 07:32:23 +09:00
Mayuri Nehate
3c04b1bb17
docs(ingest): add note about path_specs configuration in data lake sources ( #7941 )
2023-05-02 15:08:54 +02:00
Mayuri Nehate
a711baa131
fix(ingest/hive): fix containers generation for hive ( #7926 )
2023-05-02 15:07:51 +02:00
Andrew Sikowitz
4e9c398e1d
fix(ingest/unity): Add sqllineage dependency ( #7938 )
2023-05-01 23:26:49 -04:00
Andrew Sikowitz
eb1674ffdb
fix(ingest/unity-catalog): Add usage_common dependency to unity catalog plugin ( #7935 )
2023-05-01 14:47:44 -07:00
Andrew Sikowitz
5b290c9bc5
feat(ingest/unity): Add usage extraction; add TableReference ( #7910 )
...
- Adds usage extraction to the unity catalog source and a TableReference object to handle references to tables
Also makes the following refactors:
- Creates a UsageAggregator class in usage_common, as I've seen this same logic multiple times.
- Allows a customizable user_urn_builder in usage_common, as not all Unity users are identified by email. Other connectors like Redshift and Snowflake create emails with a default email_domain config, which seems unnecessary now?
- Creates TableReference for unity catalog and adds it to the Table dataclass, for managing string references to tables. Replaces logic, especially in lineage extraction, with these references
- Creates gen_dataset_urn and gen_user_urn on unity source to reduce duplicate code
- Breaks up proxy.py into implementation and types
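The refactors described above (a structured table reference plus a pluggable user URN builder) might look roughly like the sketch below. This is not the code from the commit: the field names on TableReference, the from_qualified_name helper, the "databricks" platform name, and the exact signatures of gen_dataset_urn and gen_user_urn are all assumptions made for illustration. Only the general URN shapes (urn:li:corpuser:..., urn:li:dataset:(...)) follow DataHub's conventions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TableReference:
    """Hypothetical structured reference to a table, replacing ad hoc strings."""

    metastore: str
    catalog: str
    schema: str
    table: str

    @classmethod
    def from_qualified_name(cls, metastore: str, name: str) -> Optional["TableReference"]:
        # Expect "catalog.schema.table"; return None for anything else
        # rather than guessing at the missing parts.
        parts = name.split(".")
        if len(parts) != 3:
            return None
        return cls(metastore, *parts)

    @property
    def qualified_table_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def default_user_urn_builder(user: str) -> str:
    # Use the identifier as-is: not every user identifier is an email.
    return f"urn:li:corpuser:{user}"


def gen_user_urn(
    user: str, user_urn_builder: Callable[[str], str] = default_user_urn_builder
) -> str:
    # The builder is injectable so each connector can decide how users map to URNs.
    return user_urn_builder(user)


def gen_dataset_urn(ref: TableReference, env: str = "PROD") -> str:
    # Assumed platform name and naming scheme, for illustration only.
    name = f"{ref.metastore}.{ref.qualified_table_name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:databricks,{name},{env})"
```

A connector that still wants email-style identifiers could pass its own builder, for example gen_user_urn("jdoe", lambda u: f"urn:li:corpuser:{u}@example.com"), while service principals keep their raw identifiers under the default.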
2023-05-01 11:30:09 -07:00
david-leifker
cd05f5b174
feat(schema-registry): replace confluent schema registry ( #7930 )
...
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Ryan Holstien <ryan@acryl.io>
2023-05-01 13:18:41 -05:00
Andrew Sikowitz
ca3cab4e23
refactor(ingest): report soft deleted stale entities with LossyList ( #7907 )
2023-04-27 15:40:19 -07:00
xiphl
af09034523
[bugfix] Fix remote file ingestion for Windows ( #7888 )
...
Co-authored-by: Shirshanka Das <shirshanka+github@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
2023-04-27 10:28:10 -07:00
Mayuri Nehate
a0c4e0dd46
feat(ingest): add GCS ingestion source ( #7903 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-27 19:03:41 +02:00
Harshal Sheth
916cb21454
test(ingest/biz-glossary): add test for enable_auto_id ( #7911 )
2023-04-26 19:48:52 -07:00
Harshal Sheth
a33153c1f6
feat(sdk): add DataHubGraph.get_entity_semityped method ( #7905 )
2023-04-26 13:44:13 -07:00
Pedro Silva
967260634c
Revert "feat(cli): Modifies ingest-sample-data command to use DataHub… ( #7899 )
2023-04-26 16:56:22 +01:00
Harshal Sheth
29e5cfd643
fix(ingest): fix minor bug + protective dep requirements ( #7861 )
2023-04-25 14:35:01 -07:00
Mayuri Nehate
031aee4298
fix(ingest/bigquery): fix handling of time decorator offset queries ( #7843 )
...
Co-authored-by: Andrew Sikowitz <andrew.sikowitz@acryl.io>
2023-04-25 13:51:20 -07:00
Mayuri Nehate
ca1f1903ea
fix(ingest/snowflake): fix optimised lineage query, filter temporary tables ( #7894 )
...
This change fixes the following Snowflake query errors, which occurred with larger lineage time windows:
error 1 - 100099 (22000): Result array of ARRAYAGG is too large.
error 2 - max LOB size (16777216) exceeded, actual size of parsed column is xxxxxxxxxx
2023-04-25 13:51:04 -07:00
Harshal Sheth
19d7c392d6
feat(sdk): support entity types filter in get_urns_by_filter ( #7902 )
2023-04-25 13:31:55 -07:00
Harshal Sheth
71ecbd6060
fix(ingest/dbt): ensure dbt shows view properties ( #7872 )
2023-04-25 12:25:07 -07:00
Mayuri Nehate
28986d8081
fix(ingestion/tableau): backward compatibility with version 2021.1 and above ( #7864 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-24 11:08:56 -07:00
Mayuri Nehate
3212e74969
feat(ingest/snowflake): optionally emit all upstreams irrespective of recipe pattern ( #7842 )
2023-04-24 11:01:15 -07:00
Pedro Silva
a5fa933fb0
feat(cli): Modifies ingest-sample-data command to use DataHub url & token based on config ( #7896 )
2023-04-24 15:52:10 +01:00
Hyejin Yoon
2bc0a781a6
fix: refactor toc ( #7862 )
2023-04-21 18:36:10 -07:00
Andrew Sikowitz
e9c2f9afcc
feat(ingest/unity): Ingest ownership for containers; lookup service principal display names ( #7869 )
2023-04-21 11:02:39 -07:00
mohdsiddique
f21eeed6e7
feat(ingestion): lookml refinement support ( #7781 )
...
Co-authored-by: MohdSiddiqueBagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-21 10:55:31 -07:00
Yusuf Mahtab
fa10256c47
feat(glue): allow resource links to be ignored ( #7639 )
...
Co-authored-by: Justas Cernas <justas.cernas@fundingcircle.com>
2023-04-21 10:42:32 -07:00
Aezo
1a5c716b87
feat(ingest/powerbi): support modified_since, extract_dataset_schema and many more ( #7519 )
...
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2023-04-20 22:58:45 -07:00
eeepmb
2616a16ec8
docs(ingest/powerbi): update workspace concept mapping ( #7835 )
...
Co-authored-by: John Joyce <john@acryl.io>
2023-04-20 22:03:36 -07:00
Harshal Sheth
f37ca4e49c
docs(ingest): fix CorpGroup example ( #7816 )
2023-04-20 21:09:12 -07:00
Harshal Sheth
66f44945e3
docs(ingest): update dbt and aws docs ( #7870 )
2023-04-20 21:08:22 -07:00
Andrew Sikowitz
1ff6949e36
refactor(ingest): Add helper DataHubGraph methods ( #7851 )
...
Adds:
- get_urns_by_filter(), using scroll by entities
- get_latest_pipeline_checkpoint()
- soft_delete_urn()
2023-04-20 10:16:33 -07:00
Aseem Bansal
535e1abe44
chore(ci): fix CI failing due to lint ( #7863 )
2023-04-20 16:53:36 +05:30
Harshal Sheth
6802142f6e
fix(ingest/salesforce): use report timestamp for operations ( #7838 )
...
Co-authored-by: John Joyce <john@acryl.io>
2023-04-19 20:39:07 -07:00
Hyejin Yoon
e5d06733f2
feat(docs): consolidate api guides ( #7857 )
...
Co-authored-by: socar-dini <dini@socar.kr>
2023-04-20 12:17:11 +09:00
Hyejin Yoon
ea4036c1c8
feat: enriching guide on creating dataset ( #7777 )
...
Co-authored-by: Hyejin Yoon <hyejin.yoon@acryl.io>
Co-authored-by: socar-dini <dini@socar.kr>
2023-04-19 12:58:03 +09:00
Harshal Sheth
f0ea79060b
chore(ingest): bug fix in sqlparse pin ( #7848 )
2023-04-18 16:05:23 -07:00
Harshal Sheth
cf7eb570a0
fix(ingest): pin sqlparse version ( #7847 )
2023-04-18 14:25:42 -07:00
John Joyce
b46822399c
feat(timeseries): Support sorting timeseries aspects by non-timestampMillis field + fix operations resolver ( #7840 )
2023-04-18 09:10:04 -07:00
Harshal Sheth
399e3333ad
feat(cli): improve quickstart stability ( #7839 )
2023-04-17 21:19:19 -07:00
Harshal Sheth
e461d03d94
feat(ingest/unity): capture create/lastModified timestamps ( #7819 )
...
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
2023-04-17 12:18:21 -07:00
Mayuri Nehate
a8681dae75
fix(ingest/snowflake): fix column name in snowflake optimised lineage ( #7834 )
2023-04-17 11:44:53 -07:00
Harshal Sheth
af566e1184
feat(model): fully populate the entity registry ( #7818 )
2023-04-15 13:33:05 -07:00
Harshal Sheth
342830c68c
fix(cli): use correct ingestion image in script ( #7826 )
2023-04-14 23:47:08 -07:00
Andrew Sikowitz
1ac1ccf26e
perf(ingest/bigquery): Improve bigquery usage disk usage and speed ( #7825 )
2023-04-14 18:09:43 -07:00
Andrew Sikowitz
e839ac4c40
fix(ingest/bigquery): Handle null values from usage aggregation ( #7827 )
2023-04-14 16:54:22 -07:00
Mayuri Nehate
8ec74ce41c
fix(ingest/bigquery): update usage query, remove erroneous init ( #7811 )
2023-04-14 13:38:50 -07:00
Andrew Sikowitz
37e7485184
fix(ingest/bigquery): Do not query columns when not ingesting tables or views ( #7823 )
2023-04-14 09:08:22 -07:00
Andrew Sikowitz
408cd7db2a
fix(ingest/bigquery): Enable lineage and usage ingestion without tables ( #7820 )
2023-04-14 01:41:00 -07:00
Andrew Sikowitz
d8d8176b1a
fix(ingest/bigquery): Add to lineage, not overwrite, when using sql parser ( #7814 )
2023-04-14 08:46:10 +02:00
Tamas Nemeth
4ec280ee20
fix(ingest/redshift): Remove pg_user table from metadata queries ( #7815 )
2023-04-13 15:35:26 -07:00
Andrew Sikowitz
ce795406b9
feat(ingest): Track disk usage in report ( #7812 )
2023-04-13 14:43:25 -07:00