1138 Commits

Author SHA1 Message Date
IceS2
5b20b84546
MINOR: Add logic to handle WorkflowContext on Ingestion (#21425)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes
2025-06-03 17:35:08 +02:00
Suman Maharana
720c6d3f9f
Add: Looker explore to view Column Lineage (#21504)
* Add: explore to view Column Lineage

* Add tags ingestion and fix cll warnings

* lint

* Addressed comments

* fixed tests
2025-06-03 20:23:43 +05:30
Pere Menal-Ferrer
6683c632f4
FIX #21464 (#21463)
* Reproduce failing behaviour with non-date-time data

* Add a presidio patch for DateTimes

* Fix type-check error

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-30 08:18:50 +02:00
Teddy
2a120c166a
MINOR: Py failing test cases (#21437)
* fix: failing test cases

* fix: skip test for now
2025-05-28 17:52:32 +02:00
harshsoni2024
8bbc4d8c3d
MINOR: PBI dataset expressions empty value fix (#21409)
2025-05-27 16:50:55 +05:30
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest use code from src rather than from the installed package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* Fix PowerBI integration test to use pathlib for resource paths instead of os.getcwd, to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix strategy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Teddy
7ab6755beb
ISSUE #21101 - Implement BQ Partitioned Tests (#21348)
* feat: add query logger as an event listener in debug mode

* fix: added ingestion.src plugin to pylint

* minor: add partition sampled table

* test: added test for partitioned BQ table

* Remove log_query function from logger.py

* style: ran python linting
2025-05-22 17:22:05 +02:00
Pere Menal-Ferrer
3c6c762d9c
fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Update presidio version to fix wrong regex for Indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 15:32:21 +02:00
Pere Menal-Ferrer
5d2dfa712a
feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-19 17:52:17 +00:00
harshsoni2024
a414e93163
bugfix: powerbi lineage source parameter usecase fix, last active user ingestion fix (#21272)
2025-05-19 19:21:02 +05:30
Suman Maharana
eb371bca12
Fix: Tableau e2e extra params (#21237)
2025-05-19 18:35:40 +05:30
Mohit Tilala
4c0ce77756
Fix airbyte pipeline lineage extraction (#21151)
2025-05-19 10:14:33 +05:30
Mayur Singal
703118f2b5
MINOR: Disable Flaky superset tests (#21242)
2025-05-18 23:12:42 +05:30
Pere Menal-Ferrer
a7e2f33adc
feature/pii-column-classifier (#21200)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add column name split

* Move pii algorithms to dedicated package

* Add tests for PAN, NIF, SSN entities

* Fix linting

* Add comment on why we need to set a specific language for Presidio recognizers, as per PR suggestion.

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Fix wrong import

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-16 14:03:49 +02:00
Suman Maharana
f81ee52ec4
Chore Ingestion Tableau library change (#21076)
2025-05-15 17:48:39 +05:30
Teddy
cd6434dd73
ISSUE #21146 - Properly handle connection on sampler (#21186)
* fix: properly close connection on sampler ingestion

* fix: dangling connection test

* style: ran python linting

* fix: revert to 9
2025-05-15 12:21:01 +02:00
harshsoni2024
3b382c1bd9
issue-20737: datalake parquet different extensions (#21048)
2025-05-13 11:23:46 +05:30
Teddy
a853561d30
MINOR: data sample ingestion bigquery (#21074)
* fix: data sample ingestion bigquery

* style: ran python linting

* fix: flaky test in topology
2025-05-06 15:58:37 +02:00
Mayur Singal
9755662240
Fix #20902: Fix duplicate constraints error (#21037)
2025-04-30 11:34:35 +05:30
Teddy
63a55437ae
GEN-1412: Implement load test logic (#19155)
* feat: implemented load test logic

* style: ran python linting

* fix: added locust dependency in test

* fix: skip locust in 3.8 as not supported

* fix: update gcsfs version

* fix: revert gcsfs versioning

* fix: fix gcsfs version to 2023.10

* fix: dagster graphql and gx versions

* fix: dagster version to 1.8 for py8 compatibility

* fix: fix clickhouse to 0.2 as 0.3 requires SQA 2+

* fix: revert changes from main

* fix: revert changes compared to main
2025-04-24 16:08:38 +02:00
Teddy
209793f315
MINOR - Add support for GX 1.4 (#20934)
* fix: add support for GX 0.18.22 and GX 1.4.x

* style: ran python linting

* fix: skip test if GX version is not installed
2025-04-24 11:55:04 +02:00
harshsoni2024
17dd182cbb
e2e fix (#20952)
2025-04-24 15:23:43 +05:30
Keshav Mohta
1063e019ba
Fixes: Bigquery E2E (#20863)
2025-04-17 11:43:14 +05:30
Keshav Mohta
1a6224824b
Fixes: BQ Multiple Project E2E (#20797)
* fix: bq e2e lineage and counts

* fix: bigquery multiple project classify

* fix: tests count from 19 to 17
2025-04-15 17:35:22 +05:30
Teddy
1edeb0baf8
MINOR: classification + test workflow for BQ multiproject (#20779)
* fix: classification + test workflow for BQ multiproject

* fix: deleted e2e test as handled from the UI

* fix: failing test case
2025-04-15 10:37:29 +02:00
Mayur Singal
40ab1814c0
MINOR: Always Include DDL for Views (#20784)
2025-04-15 12:59:50 +05:30
chrisrayrayne
b14f83940a
Fixes Issue 20189: REST connector checks updated (#20736)
2025-04-15 10:24:57 +05:30
Pere Miquel Brull
c38209c63b
FIX CL-#1427 - PATCH applies inherited owners (#20759)
* FIX CL-#1427 - PATCH applies inherited owners

* format
2025-04-13 06:56:33 +02:00
Mayur Singal
4a407f6d0d
MINOR: Implement column validation in lineage patch api (#20545)
2025-04-07 21:24:46 +05:30
Pere Miquel Brull
3186937cc2
MINOR - Update Auto Classification defaults for sample data & classif… (#20587)
* MINOR - Update Auto Classification defaults for sample data & classification

* fix tests
2025-04-07 15:56:57 +02:00
Mayur Singal
ee5d8eee8b
Revert "MINOR: Implement Column Validation in Lineage (#20544)" (#20658)
2025-04-07 17:13:35 +05:30
Keshav Mohta
0796c6274b
Fixes: Databricks httpPath Required (#20611)
* fix: made databricks httpPath required and added a migration file for the same

* fix: added sql migration in postDataMigration file and fix databricks tests

* fix: added httpPath in test_source_connection.py and test_source_parsing.py files

* fix: added httpPath in test_databricks_lineage.py

* fix: table name in postgres migration
2025-04-07 13:33:55 +05:30
harshsoni2024
7953f98097
issue-20546: REST connector enhancements (#20634)
2025-04-07 10:22:45 +05:30
Imri Paran
f6441ad404
fix: trino data diff paths (#20457)
requires https://github.com/open-metadata/collate-data-diff/pull/6
2025-04-03 15:48:10 +02:00
Suman Maharana
5275975d31
Fix: dbt cloud latest run execution (#20573)
* Fix: dbt cloud latest run execution

* update latest run id

* set default to 100
2025-04-03 11:13:17 +05:30
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549)
2025-04-03 10:39:47 +05:30
Mayur Singal
7991715135
MINOR: Implement Column Validation in Lineage (#20544)
2025-04-02 17:40:40 +05:30
harshsoni2024
f267d4ef01
issue-20519: Support PowerBI Owners ingestion (#20525)
2025-04-02 16:11:27 +05:30
Imri Paran
663839bd85
test: assert dangling db connections (#20458)
added dangling connection assertions for mysql integration test
2025-04-02 08:38:17 +02:00
Mohit Tilala
06ab82170b
Fixes #19534: Snowflake stream ingestion support (#20278)
2025-04-01 13:02:37 +05:30
Mohit Tilala
7ad97afa62
Fixes #19690: Add QlikCloud dashboard filter by space name type (#20315)
2025-04-01 13:00:50 +05:30
Pere Miquel Brull
c08273b4ad
MINOR: Allow loading ometa from env (#20511)
2025-03-31 12:06:33 +02:00
Mayur Singal
e6b7b89f86
Fix #20236: Handle Sample Data with non-utf8 characters (#20380)
2025-03-27 14:20:26 +05:30
Ayush Shah
7a3990f350
Fixes 19119: Enhance TableCustomSQLQueryValidator to support threshold operation (#20307)
2025-03-27 13:11:56 +05:30
Ayush Shah
653c878497
MINOR: Transform Reserved keywords like quotes to OM compatible (#20459)
2025-03-27 13:02:07 +05:30
Ayush Shah
60974e4ea1
Revert "Fixes #17660: Oracle handle quotes for lowercase columns in workflow agents (#20309)" (#20364)
2025-03-20 21:02:58 +05:30
Mayur Singal
fb3ba391ff
MINOR: Fix failing pytest (#20332)
2025-03-19 12:35:37 +05:30
Sriharsha Chintalapani
706cebd97a
Opensearch connector (#19698)
* Fix #19667: OpenSearch Connector

* do not ingest any system level indexes

* fix pyformat

* Add AWS auth

* Use common schema and fix ssl config in client

* Add OpenSearch connector docs and update schema

* Remove api key auth type and complete docs checklist

* Remove unnecessary httpx dependency and pyformat

* Add compatible version of httpx for elasticsearch

* Fix pylint fails and py-tests validation error

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
2025-03-18 18:45:25 +05:30
Ayush Shah
20ab64d1f1
Fixes #17660: Oracle handle quotes for lowercase columns in workflow agents (#20309)
2025-03-18 15:48:58 +05:30