3975 Commits

Author SHA1 Message Date
Hyejin Yoon
72aab9fe63
feat(sdk): add sdk lineage client (#13244)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2025-04-17 17:44:07 +09:00
Hyejin Yoon
4e37202373
docs(mlflow): add docs on version requirement for mlflow (#13251) 2025-04-17 15:13:13 +09:00
Sergio Gómez Villamor
f3dfd86680
fix(looker): missing Looker Explore relationship when Look references multiple models (#13198) 2025-04-17 07:53:13 +02:00
Andrew Sikowitz
d138a64a6a
ci(graphql,workflows): Format .md, .graphql, and workflow .yml files via prettier (#13220) 2025-04-16 16:55:51 -07:00
Tamas Nemeth
5edd41c4bf
doc(ingestion/gc): Add doc for GC source (#12296) 2025-04-16 11:01:44 +02:00
Hyejin Yoon
6cc3fff57f
fix(ingest/mlflow): pin mlflow-skinny version (#13208) 2025-04-16 08:27:31 +09:00
ryota-cloud
e79445e469
fix(metadata-ingestion): update vertexAI source doc with permissions detail (#13219) 2025-04-15 13:37:26 -07:00
Hyejin Yoon
bafd93d38f
feat(sdk): add mlmodel and mlmodelgroup (#13150) 2025-04-15 16:12:38 +09:00
Sergio Gómez Villamor
60b769fbf6
feat(airflow-plugin): ability to disable datajob lineage (#13187) 2025-04-14 16:15:19 +02:00
Andrew Sikowitz
3e37f76428
feat(ingest/tableau): Allow specifying asset types for ingest_hidden_assets (#13190) 2025-04-11 20:07:37 -07:00
Gabe Lyons
1bcdda740d
feat(data contracts): supporting structured properties on data contracts (#13176) 2025-04-11 08:30:20 -07:00
Tamas Nemeth
e048cf7ce7
fix(ingest/sigma): Fix missing key in workspace_counts (#13182) 2025-04-11 15:28:43 +02:00
ryota-cloud
ca4eab52e4
VertexAI Connector (v3 - pipeline and pipeline task) (#12960) 2025-04-10 22:54:17 -07:00
Andrew Sikowitz
ca51df880f
fix(ingest/snowflake): Use CREATE change type when creating structured properties; support MCP headers (#13158) 2025-04-10 15:13:13 -05:00
Michael Minichino
0b105395e9
feat(ingest/powerbi): Support ODBC Data Source (#13090) 2025-04-10 08:57:37 -05:00
Hyejin Yoon
443134ca96
fix(ingest/mlflow): skip experiment/run ingestion for older version of mlflow (#13122) 2025-04-10 14:09:09 +09:00
Harshal Sheth
9f0c0aa3dd
refactor(ingest/sigma): make some error cases more clear (#13110)
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
2025-04-09 12:03:17 -07:00
skrydal
7b6ab3ba15
fix(ingest): Make workunit processor ensuring schema size more aggressive (#13153) 2025-04-09 12:02:32 -07:00
Harshal Sheth
e4a8c77344
fix(ingest): quote db name in streams query (#13131) 2025-04-09 09:03:36 -07:00
Sergio Gómez Villamor
e7d8f2913c
fix(snowflake): fixes deduplication and fingerprint requirements for Hex (#13121) 2025-04-09 10:17:43 +02:00
Sergio Gómez Villamor
75894399f0
fix(hex): fixes AccessType model (#13123) 2025-04-09 09:04:05 +02:00
Aseem Bansal
2a75a981ca
chore(ci): upgrade ruff version (#13125) 2025-04-09 11:24:07 +05:30
Harshal Sheth
1fca9855ee
fix(ingest/snowflake): fix error on stored procs in non-SQL languages (#13127) 2025-04-08 17:00:56 -07:00
Sergio Gómez Villamor
5c7b8e10ce
fix(hex): filter out queries if non scheduled runs (#13126) 2025-04-08 20:55:28 +02:00
Sergio Gómez Villamor
7dd4f06e71
docs(hex): additional limitations (#13103) 2025-04-08 07:59:21 +02:00
Sergio Gómez Villamor
4e48e098dc
fix(ingestion): fixes missing platform instance aspect for DataFlow entity (#13080) 2025-04-06 08:19:47 +02:00
skrydal
38f1553315
feat(ingestion): Refactoring timestamping logic for WorkUnits + custom logic for Iceberg (#13030)
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
2025-04-04 22:30:27 +02:00
Pedro Silva
a4b343cc82
fix(ingest/delta-lake): Bump delta-lake dependency (#12766)
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
2025-04-04 12:19:13 -07:00
Tamas Nemeth
df119cea1a
fix(ingest/mlflow): Fix stateful ingestion setup (#13084) 2025-04-04 18:47:19 +02:00
Tamas Nemeth
250b100a93
doc(ingestion/s3): Document permissions requirements for s3 source (#12816)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2025-04-04 18:28:42 +02:00
Sergio Gómez Villamor
b37fa03846
chore: fixes SIM118 ruff rule (#13069) 2025-04-04 11:59:43 +02:00
Rasnar
38e240e916
feat(ingest/airflow): platform_instance support in Airflow plugin (#12751)
Co-authored-by: rasnar <11248833+Rasnar@users.noreply.github.com>
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
2025-04-04 09:26:58 +02:00
Harshal Sheth
4d53df63a2
fix(ingest/sigma): include workspace names in report (#13055) 2025-04-03 09:09:44 -07:00
Sergio Gómez Villamor
d2bb33f7c5
feat(ingest): new hex connector - part 2 (#12985) 2025-04-03 12:44:37 +02:00
Peter Wang
719cc67cac
feat(ingest/superset): leverage threads for superset API calls (#13006)
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
2025-04-02 10:27:57 -07:00
skrydal
ff799c9370
feat(ingestion/iceberg): source lastModified from table metadata field (#13052) 2025-04-02 12:05:41 +02:00
Sergio Gómez Villamor
e072a42d03
feat(ingest): adds get_entities_v3 method to DataHubGraph (#13045) 2025-04-02 10:22:14 +02:00
Harshal Sheth
18aa1f076d
fix(ingest/trino): always use table properties fallback (#13048) 2025-04-01 15:35:23 -07:00
Kevin Karch
d75de77d6b
docs(ingest): clarify snowflake key language (#13050) 2025-04-01 14:59:51 -04:00
Hugo Hobson
b394ae6350
docs(ingest): make fail_safe_threshold config visible in docs (#13017) 2025-04-01 12:49:13 +01:00
Hugo Hobson
acc84c2459
fix(cli): stop deployment config being overwritten by cli defaults (#13036)
`executor_id` and `time_zone` values set in the `deployment` block of a recipe are not being used by the `datahub ingest deploy` CLI command when deploying recipes. This is because the [CLI values take precedence over deployment config](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/utilities/ingest_utils.py#L63), so wherever the CLI has default values, those defaults are always used.

Default values should be set in [`DeployOptions`](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/utilities/ingest_utils.py#L23), not in the CLI options.
2025-04-01 10:21:30 +01:00
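A minimal sketch of the precedence order described in commit #13036, written in Python. The `DeployOptions` fields and default values below, and the helper `merge_deploy_options`, are hypothetical illustrations rather than DataHub's actual code. The idea: defaults live on the options object, the recipe's `deployment` block overrides them, and a CLI flag overrides both only when it was explicitly passed (modeled here as a non-`None` value).

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass
class DeployOptions:
    # Hypothetical stand-in: defaults live here, not on the CLI options.
    name: Optional[str] = None
    executor_id: str = "default"
    time_zone: str = "UTC"


def merge_deploy_options(
    recipe_deployment: Dict[str, Any], cli_values: Dict[str, Optional[Any]]
) -> DeployOptions:
    """Layer config: DeployOptions defaults < recipe `deployment` block < explicit CLI flags."""
    # Recipe values replace the dataclass defaults.
    options = DeployOptions(**recipe_deployment)
    # A CLI value of None means the flag was not passed, so it must not
    # overwrite what the recipe set.
    overrides = {k: v for k, v in cli_values.items() if v is not None}
    return replace(options, **overrides)


if __name__ == "__main__":
    opts = merge_deploy_options(
        {"name": "my-recipe", "executor_id": "remote-executor"},
        {"executor_id": None, "time_zone": None},
    )
    print(opts)
```

Because the CLI flags carry no defaults of their own in this sketch, an unset flag can never mask a value from the recipe, which is the behavior the commit message asks for.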
Sergio Gómez Villamor
c6acce9906
feat(powerbi): capture dataset report lineage (#12993) 2025-04-01 11:12:47 +02:00
ryota-cloud
ebea3b7ca3
fix(ingestion): create ExperimentKey instead of containerKeyId used in MLflow and Vertex AI (#12995) 2025-04-01 00:00:09 -07:00
Hyejin Yoon
9e28c1af63
docs(mlflow): add docs for the mlflow dataset config (#12973) 2025-04-01 12:20:32 +09:00
Harshal Sheth
c79192090d
feat(ingest): propagate backpressure in ThreadedIteratorExecutor (#13027) 2025-03-31 10:23:00 -07:00
Harshal Sheth
58169ad7cc
fix(sdk): fix bugs in v2 sdk search client (#13026) 2025-03-31 08:59:32 -07:00
Harshal Sheth
2ef0086394
feat(ingest): allow sources to produce sdk entities (#13028) 2025-03-31 08:33:17 -07:00
Gabe Lyons
ee4827e1b2
fix(oracle): fixing oracle CLL for view parsing. (#13029) 2025-03-30 07:42:55 -07:00
Chakru
d2dd54acf1
feat(dataset_cli): add dry-run support (#12814) 2025-03-29 18:08:52 +05:30
Harshal Sheth
3e4d14734b
feat(ingest/sigma): add reporting on filtered workspaces (#12998) 2025-03-28 16:58:34 -07:00