1128 Commits

Author SHA1 Message Date
IceS2
41dfdff43e MINOR: Add logic to handle WorkflowContext on Ingestion (#21425)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes

(cherry picked from commit 5b20b845462cfcb568b92dbf22e160226433fae5)
2025-06-03 15:36:28 +00:00
Pere Menal-Ferrer
65b0e9d9ee FIX #21464 (#21463)
* Reproduce failing behaviour with non-date-time data

* Add a presidio patch for DateTimes

* Fix type-check error

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
(cherry picked from commit 6683c632f42b55a16db9df8acf5fd96b586ce301)
2025-05-30 06:20:12 +00:00
harshsoni2024
a8aa1d004f MINOR: PBI dataset expressions empty value fix (#21409)
(cherry picked from commit 8bbc4d8c3d7792d0d36a163f119efdccf61840e5)
2025-05-27 11:22:08 +00:00
Pere Menal-Ferrer
6c5c9088ea fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Update presidio version to fix wrong regex for Indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
(cherry picked from commit 3c6c762d9c0d7036124aae3a4dc90f51d6a674c0)
2025-05-20 13:33:37 +00:00
Pere Menal-Ferrer
e1b7e93fe0 feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-20 09:28:30 +02:00
Pere Menal-Ferrer
601c56a3cf feature/pii-column-classifier (#21200)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add column name split

* Move pii algorithms to dedicated package

* Add tests for PAN, NIF, SSN entities

* Fix linting

* Add comment on why we need to set a specific language for Presidio recognizers, as per PR suggestion.

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Fix wrong import

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 09:28:16 +02:00
harshsoni2024
1e2e2e078b bugfix: powerbi lineage source parameter usecase fix, last active user ingestion fix (#21272)
(cherry picked from commit a414e93163a9f6001fd90d298128cecc3a6f60b8)
2025-05-19 13:52:13 +00:00
Teddy
e032b0497c ISSUE #21146 - Properly handle connection on sampler (#21186)
* fix: properly close connection on sampler ingestion

* fix: dangling connection test

* style: ran python linting

* fix: revert to 9

(cherry picked from commit cd6434dd73cd7c60ef22a11972740f12686b3558)
2025-05-19 09:04:03 +02:00
Mohit Tilala
5f8e2d94b0 Fix airbyte pipeline lineage extraction (#21151)
(cherry picked from commit 4c0ce77756f07a7addce489f3cae9d3c9ff788d5)
2025-05-19 04:45:46 +00:00
Suman Maharana
3e68352936 Chore Ingestion Tableau library change (#21076)
(cherry picked from commit f81ee52ec4942a206ed3ada581a67826384b548a)
2025-05-15 12:19:49 +00:00
harshsoni2024
a0195a82a5 issue-20737: datalake parquet different extensions (#21048) 2025-05-15 10:43:23 +05:30
Teddy
83d2811c07 MINOR: data sample ingestion bigquery (#21074)
* fix: data sample ingestion bigquery

* style: ran python linting

* fix: flaky test in topology

(cherry picked from commit a853561d30fcab34796a06c0e75ec0bb4f20e1f4)
2025-05-06 14:00:00 +00:00
Mayur Singal
d7bdc1bdc4 Fix #20902: Fix duplicate constraints error (#21037)
(cherry picked from commit 9755662240a59a94a169cc02405042dee4fa7b88)
2025-04-30 06:05:45 +00:00
Keshav Mohta
1063e019ba Fixes: Bigquery E2E (#20863) 2025-04-17 11:43:14 +05:30
Keshav Mohta
1a6224824b Fixes: BQ Multiple Project E2E (#20797)
* fix: bq e2e lineage and counts

* fix: bigquery multiple project classify

* fix: tests count from 19 to 17
2025-04-15 17:35:22 +05:30
Teddy
1edeb0baf8 MINOR: classification + test workflow for BQ multiproject (#20779)
* fix: classification + test workflow for BQ multiproject

* fix: deleted e2e test as handled from the UI

* fix: failing test case
2025-04-15 10:37:29 +02:00
Mayur Singal
40ab1814c0 MINOR: Always Include DDL for Views (#20784) 2025-04-15 12:59:50 +05:30
chrisrayrayne
b14f83940a Fixes Issue 20189: REST connector checks updated (#20736) 2025-04-15 10:24:57 +05:30
Pere Miquel Brull
c38209c63b FIX CL-#1427 - PATCH applies inherited owners (#20759)
* FIX CL-#1427 - PATCH applies inherited owners

* FIX CL-#1427 - PATCH applies inherited owners

* format
2025-04-13 06:56:33 +02:00
Mayur Singal
4a407f6d0d MINOR: Implement column validation in lineage patch api (#20545) 2025-04-07 21:24:46 +05:30
Pere Miquel Brull
3186937cc2 MINOR - Update Auto Classification defaults for sample data & classif… (#20587)
* MINOR - Update Auto Classification defaults for sample data & classification

* fix tests
2025-04-07 15:56:57 +02:00
Mayur Singal
ee5d8eee8b Revert "MINOR: Implement Column Validation in Lineage (#20544)" (#20658) 2025-04-07 17:13:35 +05:30
Keshav Mohta
0796c6274b Fixes: Databricks httpPath Required (#20611)
* fix: made databricks httpPath required and added a migration file for the same

* fix: added sql migration in postDataMigration file and fix databricks tests

* fix: added httpPath in test_source_connection.py and test_source_parsing.py files

* fix: added httpPath in test_databricks_lineage.py

* fix: table name in postgres migration
2025-04-07 13:33:55 +05:30
harshsoni2024
7953f98097 issue-20546: REST connector enhancements (#20634) 2025-04-07 10:22:45 +05:30
Imri Paran
f6441ad404 fix: trino data diff paths (#20457)
requires https://github.com/open-metadata/collate-data-diff/pull/6
2025-04-03 15:48:10 +02:00
Suman Maharana
5275975d31 Fix: dbt cloud latest run execution (#20573)
* Fix: dbt cloud latest run execution

* update latest run id

* set default to 100
2025-04-03 11:13:17 +05:30
Mayur Singal
7760663b22 MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Mayur Singal
7991715135 MINOR: Implement Column Validation in Lineage (#20544) 2025-04-02 17:40:40 +05:30
harshsoni2024
f267d4ef01 issue-20519: Support PowerBI Owners ingestion (#20525) 2025-04-02 16:11:27 +05:30
Imri Paran
663839bd85 test: assert dangling db connections (#20458)
added dangling connection assertions for mysql integration test
2025-04-02 08:38:17 +02:00
Mohit Tilala
06ab82170b Fixes #19534: Snowflake stream ingestion support (#20278) 2025-04-01 13:02:37 +05:30
Mohit Tilala
7ad97afa62 Fixes #19690: Add QlikCloud dashboard filter by space name type (#20315) 2025-04-01 13:00:50 +05:30
Pere Miquel Brull
c08273b4ad MINOR: Allow loading ometa from env (#20511) 2025-03-31 12:06:33 +02:00
Mayur Singal
e6b7b89f86 Fix #20236: Handle Sample Data with non-utf8 characters (#20380) 2025-03-27 14:20:26 +05:30
Ayush Shah
7a3990f350 Fixes 19119: Enhance TableCustomSQLQueryValidator to support threshold operation (#20307) 2025-03-27 13:11:56 +05:30
Ayush Shah
653c878497 MINOR: Transform Reserved keywords like quotes to OM compatible (#20459) 2025-03-27 13:02:07 +05:30
Ayush Shah
60974e4ea1 Revert "Fixes #17660: Oracle handle quotes for lowercase columns in workflow agents (#20309)" (#20364) 2025-03-20 21:02:58 +05:30
Mayur Singal
fb3ba391ff MINOR: Fix failing pytest (#20332) 2025-03-19 12:35:37 +05:30
Sriharsha Chintalapani
706cebd97a Opensearch connector (#19698)
* Fix #19667: OpenSearch Connector

* Fix #19667: OpenSearch Connector

* do not ingest any system level indexes

* fix pyformat

* Add AWS auth

* Use common schema and fix ssl config in client

* Add OpenSearch connector docs and update schema

* Remove api key auth type and complete docs checklist

* Remove unnecessary httpx dependency and pyformat

* Add compatible version of httpx for elasticsearch

* Fix pylint fails and py-tests validation error

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
2025-03-18 18:45:25 +05:30
Ayush Shah
20ab64d1f1 Fixes #17660: Oracle handle quotes for lowercase columns in workflow agents (#20309) 2025-03-18 15:48:58 +05:30
fuzmish
7fa3e53403 Fix: Pass raw value of extraHeaders to ClientConfig (#19989) 2025-03-18 13:55:51 +05:30
harshsoni2024
dba37820d7 MINOR: e2e fixes (#20301) 2025-03-17 21:00:26 +05:30
Akash Verma
cf7a442e32 Fixes #19891 : Added measures in powerbi (#19990) 2025-03-17 14:43:22 +05:30
Mayur Singal
d30fd90096 Minor: Query Cost Table Aggregation Endpoint (#20270) 2025-03-17 11:33:50 +05:30
Mayur Singal
581ab6ce71 MINOR: Fix pytests - usage count (#20247) 2025-03-14 09:07:40 +01:00
harshsoni2024
9bf1ce53ec MINOR: fix-e2e-tests (#20233) 2025-03-13 20:32:06 +05:30
harshsoni2024
826279608f issue-19892: parse powerbi table source (#20141) 2025-03-12 12:59:29 +05:30
harshsoni2024
aedbe8be2d fix pbi, vertica, metabase tests (#20190) 2025-03-11 16:40:56 +01:00
Pere Miquel Brull
2e7c9a0875 FIX #19765 - Improve Column Name Scanner (#20136) 2025-03-07 14:32:59 +01:00
harshsoni2024
40a9c67875 Day 1 - Dashboard service lineage without db_service_name (#19911) 2025-03-07 11:16:58 +05:30