1157 Commits

Author SHA1 Message Date
ulixius9
b960f87814 MINOR: Postgres Implement schema fallback 2025-06-19 15:38:06 +05:30
Suman Maharana
7be62f3ed9
Add: Tableau Hierarchy project filter (#21811) 2025-06-19 11:18:52 +05:30
IceS2
040a33117c
MINOR: Fix Profiler Infinite Loop (#21843) 2025-06-19 10:33:45 +05:30
harshsoni2024
d38ee0ed52
feat-21712: PowerBI internal entities & cross workspace lineage (#21837) 2025-06-18 20:46:17 +05:30
Keshav Mohta
7c0eeef049
Fixes #19692: Implemented Nifi Pipeline Lineage (#21802)
* feat: implemented nifi pipeline lineage

* test: implemented tests for nifi pipeline lineage

* fix: yield_pipeline_bulk_lineage_details output type hinting

* fix: component check in connections

---------

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2025-06-18 07:31:04 +00:00
harshsoni2024
a09a696358
MINOR: Tableau proxy url for sourceurl (#21799) 2025-06-18 10:52:08 +05:30
Mayur Singal
34c43eaea0
MINOR: Fix pytests (#21807) 2025-06-17 23:44:29 +05:30
IceS2
e79c54e6a5
MINOR: Add injection to profiler (#21738)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Add Injection to Metrics in Profiler

* Add Injection to the Profiler

* Fix UnitTests

* Fix Pytests

* Fix Tests

* Fix types
2025-06-17 19:01:00 +02:00
harshsoni2024
0f79d8ea1d
MINOR: pytest opt out flaky test (#21800)
* remove mlflow test until fixed

* alationsink test count fixed

* pylint fix gx
2025-06-17 14:23:28 +05:30
IceS2
49df5fc9de
MINOR: Implement dependency injection on ingestion (#21719)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Fix test, making the injection test run last

* Update connections.py

* Changed NewType to an AbstractClass to avoid linting issues

* remove comment

* Fix bug in service spec

* Update PyTest version to avoid importlib.reader wrong import
2025-06-16 08:03:38 +02:00
Pere Menal-Ferrer
44e09e41a2
Revert "FIX #1464 (#21520)" (#21726)
This reverts commit 1e86f9870fd663122b9bbb64f3cf17cf32619c7f.
2025-06-13 17:27:32 +02:00
IceS2
891ff4184d
MINOR: Initial implementation for our Connection Class (#21581)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Fix Test

* Fix Profile Test Connection

* Remove unit test

* Remove comment

* Fix tests and missing changes
2025-06-13 14:52:29 +02:00
harshsoni2024
6a6180b2e3
powerbi change owner condition (#21724) 2025-06-12 16:11:43 +05:30
Suman Maharana
18f9f2cdb6
Fix: Tableau project id should always be a string (#21700) 2025-06-12 11:21:53 +05:30
Teddy
c09a8b27ae
ISSUE #16676 - Add Tag to CreateTestCase (#21366)
* refactor: removed testSuite field from CreateTestCase

BREAKING CHANGE: when creating a test case, testsuite is now derived from entityLink (fetch or created)

* feat: allow setting tags when creating a test case

* style: ran linters

* fix: compiling error

* fix: failing test case

* fix: failing tests

* removed testSuite from required filed

* fixed ui side

* style: ran java linting

* deprecation: remove testSuite param from ingestion

* fix: remove test suite filed

* fix: remove test_suite field

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2025-06-11 09:59:08 +02:00
Suman Maharana
0df058a53d
Fix: dbtcloud CL errors (#21685) 2025-06-10 21:45:07 +05:30
Pere Menal-Ferrer
1e86f9870f
FIX #1464 (#21520)
* Add PIICategoryTags and some utilities on top of them.

* Fix static-check

* Add test for fqn representation

* Add NEREntityGeneralTags.json from Collate

* Add test to check PIICategoryTags agree with the ones used by OM server

* Add LabelExtractor

* Fix style

* Add ignore superflous-parens for pylint

* Ass comment as per PR review

* Fix not-updated PII-IT

* Remove duplicated IT test for PII

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-06-09 16:05:35 -07:00
Keshav Mohta
b7a7023890
Fix #20665: BigQuery - Adding billing project (#21231) 2025-06-09 13:09:40 +05:30
Teddy
5078a2fbb9
DEPRECATION: Remove testCaseResults endpoint from testCaseResource (#21527)
* deprecation: remove testCaseResults endpoint from testCaseResource

* fix: path in test e2e test

* fix: endpoint name to testCaseResults

* style: fix java linting
2025-06-07 21:02:54 +02:00
IceS2
5b20b84546
MINOR: Add logic to handle WorkflowContext on Ingestion (#21425)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes
2025-06-03 17:35:08 +02:00
Suman Maharana
720c6d3f9f
Add: Looker explore to view Column Lineage (#21504)
* Add: explore to view Column Lineage

* Add tags ingestion and fix cll warnings

* lint

* Addressed comments

* fixed tests
2025-06-03 20:23:43 +05:30
Pere Menal-Ferrer
6683c632f4
FIX #21464 (#21463)
* Reproduce failing behaviour with non-date-time data

* Add a presidio patch for DateTimes

* Fix type-check error

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-30 08:18:50 +02:00
Teddy
2a120c166a
MINOR: Py failing test cases (#21437)
* fix: failing test cases

* fix: skip test for now
2025-05-28 17:52:32 +02:00
harshsoni2024
8bbc4d8c3d
MINOR: PBI dataset expressions empty value fix (#21409) 2025-05-27 16:50:55 +05:30
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest to user code from src rather than from install package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild rather as replacement for forward reference updates as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* FIx PowerBI integration test to use pathlib for resource paths and not os.getcwd to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix startegy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild rather as replacement for forward reference updates as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Teddy
7ab6755beb
ISSUE #21101 - Implement BQ Partitioned Tests (#21348)
* feat: add query logger as an event listent in debug mode

* fix: added ingestion.src plugin to pylint

* minor: add partition sampled table

* test: added test for partitioned BQ table

* Remove log_query function from logger.py

* style: ran python linting
2025-05-22 17:22:05 +02:00
Pere Menal-Ferrer
3c6c762d9c
fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Updata presidio version to fix wrong regex for indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 15:32:21 +02:00
Pere Menal-Ferrer
5d2dfa712a
feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexs for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-19 17:52:17 +00:00
harshsoni2024
a414e93163
bugfix: powerbi lineage source parameter usecase fix, last active user ingestion fix (#21272) 2025-05-19 19:21:02 +05:30
Suman Maharana
eb371bca12
Fix : Tableau e2e extra params (#21237) 2025-05-19 18:35:40 +05:30
Mohit Tilala
4c0ce77756
Fix airbyte pipeline lineage extraction (#21151) 2025-05-19 10:14:33 +05:30
Mayur Singal
703118f2b5
MINOR: Disable Flaky superset tests (#21242) 2025-05-18 23:12:42 +05:30
Pere Menal-Ferrer
a7e2f33adc
feature/pii-column-classifier (#21200)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexs for PII

* Add colum name split

* Move pii algorithms to dedicated package

* Add tests for PAN, NIF, SSN entities

* Fix linting

* Add comment on why we need to set specific lanaguage to Presidio recognizers, as per PR suggestion.

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Fix wrong import

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-16 14:03:49 +02:00
Suman Maharana
f81ee52ec4
Chore Ingestion Tableau library change (#21076) 2025-05-15 17:48:39 +05:30
Teddy
cd6434dd73
ISSUE #21146 - Properly handle connection on sampler (#21186)
* fix: properly close connection on sampler ingestion

* fix: dangling connection test

* style: ran python linting

* fix: revert to 9
2025-05-15 12:21:01 +02:00
harshsoni2024
3b382c1bd9
issue-20737: datalake parquet different extensions (#21048) 2025-05-13 11:23:46 +05:30
Teddy
a853561d30
MINOR: data sample ingestion bigquery (#21074)
* fix: data sample ingestion bigquery

* style: ran python linting

* fix: flaky test in topology
2025-05-06 15:58:37 +02:00
Mayur Singal
9755662240
Fix #20902: Fix duplicate constraints error (#21037) 2025-04-30 11:34:35 +05:30
Teddy
63a55437ae
GEN-1412: Implement load test logic (#19155)
* feat: implemented load test logic

* style: ran python linting

* fix: added locust dependency in test

* fix: skip locust in 3.8 as not supported

* fix: update gcsfs version

* fix: revert gcsfs versionning

* fix: fix gcsf version to 2023.10

* fix: dagster graphql and gx versions

* fix: dagster version to 1.8 for py8 compatibility

* fix: fix clickhouse to 0.2 as 0.3 requires SQA 2+

* fix: revert changes from main

* fix: revert changes compared to main
2025-04-24 16:08:38 +02:00
Teddy
209793f315
MINOR - Add support for GX 1.4 (#20934)
* fix: add support for GX 0.18.22 and GX 1.4.x

* fix: add  support for GX 0.18.22 and GX 1.4.x

* style: ran python linting

* fix: skip test if GX version is not installed
2025-04-24 11:55:04 +02:00
harshsoni2024
17dd182cbb
e2e fix (#20952) 2025-04-24 15:23:43 +05:30
Keshav Mohta
1063e019ba
Fixes: Bigquery E2E (#20863) 2025-04-17 11:43:14 +05:30
Keshav Mohta
1a6224824b
Fixes: BQ Multiple Project E2E (#20797)
* fix: bq e2e lineage and counts

* fix: bigquery multiple project classify

* fix: tests count from 19 to 17
2025-04-15 17:35:22 +05:30
Teddy
1edeb0baf8
MINOR: classification + test workflow for BQ multiproject (#20779)
* fix: classification + test workflow for BQ multiproject

* fix: deleted e2e test as handled from the UI

* fix: failing test case
2025-04-15 10:37:29 +02:00
Mayur Singal
40ab1814c0
MINOR: Always Include DDL for Views (#20784) 2025-04-15 12:59:50 +05:30
chrisrayrayne
b14f83940a
Fixes Issue 20189: REST connector checks updated (#20736) 2025-04-15 10:24:57 +05:30
Pere Miquel Brull
c38209c63b
FIX CL-#1427 - PATCH applies inherited owners (#20759)
* FIX CL-#1427 - PATCH applies inherited owners

* FIX CL-#1427 - PATCH applies inherited owners

* format
2025-04-13 06:56:33 +02:00
Mayur Singal
4a407f6d0d
MINOR: Implement column validation in lineage patch api (#20545) 2025-04-07 21:24:46 +05:30
Pere Miquel Brull
3186937cc2
MINOR - Update Auto Classification defaults for sample data & classif… (#20587)
* MINOR - Update Auto Classification defaults for sample data & classification

* fix tests
2025-04-07 15:56:57 +02:00