3385 Commits

Author SHA1 Message Date
Elay Gelbart
dec346a84b
Fixes ISSUE 20899: upgrade google-cloud-secret-manager python requirement version (#20900)
* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3

* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3 with ~

* Bump up `mlflow` and `databricks-sdk` for protobuf 5.x.x, pin down google-cloud-secret-manager to 2.22.1 for airflow deps sync

* Pin down databricks-sdk to 0.20.0

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2025-06-06 03:14:25 +05:30
IceS2
8540884ab1
MINOR: Add method to filter ingestion pipeline based on metadata (#21449)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes

* Only dump non-null keys

* Add Method to Filter Ingestion Pipeline based on Metadata

* Reduce the scope to filter only specifically on metadata->workflow->serviceName
2025-06-04 16:13:39 +02:00
Mohit Tilala
44c90557b7
Fix missing __pydantic_fields__ exceptions (#21521)
2025-06-04 16:44:31 +05:30
IceS2
5b20b84546
MINOR: Add logic to handle WorkflowContext on Ingestion (#21425)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes
2025-06-03 17:35:08 +02:00
Suman Maharana
720c6d3f9f
Add: Looker explore to view Column Lineage (#21504)
* Add: explore to view Column Lineage

* Add tags ingestion and fix cll warnings

* lint

* Addressed comments

* fixed tests
2025-06-03 20:23:43 +05:30
Suman Maharana
c00ed22866
Fix: Tableau Validation Errors (#21530)
2025-06-03 11:03:45 +05:30
Teddy
3c5fbffeaa
feat: add regex support for dbx (#21514)
2025-06-02 17:55:48 +02:00
Teddy
859f24aba7
MINOR: row sampling error (#21488)
* fix: row sampling error

* fix: return sample query
2025-06-02 09:02:17 +02:00
harshsoni2024
841cc5753d
issue-21439: dashboard lineage override (#21440)
2025-06-02 11:36:44 +05:30
Suman Maharana
7e3c732919
Fix: Databricks Schema Description (#21367)
2025-06-02 11:34:07 +05:30
Suman Maharana
1c4500b119
Fix: looker CLL errors (#21493)
* Fix: looker CLL errors

* Addressed comments- added exception handling

* addressed comments

* linting
2025-05-31 17:29:23 +05:30
Pere Menal-Ferrer
6683c632f4
FIX #21464 (#21463)
* Reproduce failing behaviour with non-date-time data

* Add a presidio patch for DateTimes

* Fix type-check error

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-30 08:18:50 +02:00
Suman Maharana
21f3c4be3c
Add: Looker column level lineage (#21458)
* Add: Looker column level lineage

* Fix broken lineage

* add exception handling

---------

Co-authored-by: ulixius9 <mayursingal9@gmail.com>
2025-05-29 17:26:55 +05:30
Teddy
2a120c166a
MINOR: Py failing test cases (#21437)
* fix: failing test cases

* fix: skip test for now
2025-05-28 17:52:32 +02:00
Albert Franzi
859e38057e
Fixes 21327: Update Lightdash connector (#21328)
* fix: [21327] - Update Lightdash connector

* fix: [21327] - Solve style checks

* fix: [21327] - Report the right chart type field

* fix: [21327] - Apply changes
2025-05-28 17:43:15 +05:30
Mayur Singal
85e8776a10
Fix #17799: Doris ingestion failed (#21420)
2025-05-28 16:10:59 +05:30
Pere Menal-Ferrer
8eb0b25c19
fix/nox-ci-missing-steps (#21426)
* Fix nox-ci

* Fix wrong path

* Fix wrong path

* Use working-directory for gha

* Fix wrong section in gha yml

* Disable some lint to diagnose failures

* Rm version matrix for debugging

* Fix type in nox invocation

* Fix style

* Add version and update checkout version

* Add required system dependencies

* WIP

* Add python code generation

* Remove version extraction from nox, as it is not needed

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-28 11:12:44 +02:00
Pere Menal-Ferrer
ac9f803b46
Make presidio_analyzer a lazy import in the PII processor (#21408)
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 14:24:28 +02:00
harshsoni2024
8bbc4d8c3d
MINOR: PBI dataset expressions empty value fix (#21409)
2025-05-27 16:50:55 +05:30
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest use code from src rather than from the installed package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as a replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* Fix PowerBI integration test to use pathlib for resource paths and not os.getcwd, to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix strategy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as a replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Teddy
7ab6755beb
ISSUE #21101 - Implement BQ Partitioned Tests (#21348)
* feat: add query logger as an event listener in debug mode

* fix: added ingestion.src plugin to pylint

* minor: add partition sampled table

* test: added test for partitioned BQ table

* Remove log_query function from logger.py

* style: ran python linting
2025-05-22 17:22:05 +02:00
gpby
342eaee092
Fixes #20956: Teradata profiler (#21292)
* add teradata functions

* fix teradata schema

* reformat code

* change random approach for teradata

---------

Co-authored-by: Teddy <teddy.crepineau@gmail.com>
2025-05-21 09:12:15 +02:00
Pere Menal-Ferrer
3c6c762d9c
fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Update presidio version to fix wrong regex for indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 15:32:21 +02:00
Teddy
8caaa9bda0
fix: like test (#21307)
2025-05-20 15:02:37 +02:00
Teddy
2904c94700
fix: clean up import (#21308)
2025-05-20 15:01:59 +02:00
harshsoni2024
176b731337
MINOR: presidio sample data lib fix (#21295)
2025-05-20 17:40:44 +05:30
Mayur Singal
2fd0606cdd
MINOR: Snowflake View Definition Fallback (#21296)
2025-05-20 15:18:34 +05:30
Mayur Singal
509bc0d826
MINOR: Use slow query log for mysql lineage (#21291)
2025-05-20 11:10:06 +05:30
Mayur Singal
35d8c2a25c
Fix #20746: DB columns in Vertica (#21288)
2025-05-20 11:06:29 +05:30
Pere Menal-Ferrer
5d2dfa712a
feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-19 17:52:17 +00:00
harshsoni2024
a414e93163
bugfix: powerbi lineage source parameter usecase fix, last active user ingestion fix (#21272)
2025-05-19 19:21:02 +05:30
Suman Maharana
eb371bca12
Fix : Tableau e2e extra params (#21237)
2025-05-19 18:35:40 +05:30
Mayur Singal
698956783b
Fix #1532: Fix Error ingesting using Datalake adls connector (#21243)
2025-05-19 12:30:56 +05:30
Pere Miquel Brull
6444ea3750
FIX - Ingestion workaround for Services with null secrets (#21260)
* FIX - Ingestion workaround for Services with null secrets

* linting
2025-05-19 08:53:37 +02:00
Suman Maharana
5a3d40f643
Fix: dbt multi owner support from manifest (#21233)
2025-05-19 12:04:22 +05:30
Mayur Singal
9ec424a3fa
Fix #1550: Metadata ingestion errors from Azure Data Lake (#21261)
2025-05-19 11:44:19 +05:30
Mayur Singal
7efa5e650b
MINOR: Add athena schema comment support (#21262)
2025-05-19 10:31:15 +05:30
Pere Miquel Brull
aa96019ab1
Rel to #1575 - LabelType Generated (#21244)
* Rel to #1575 - LabelType Generated

* migration

* format

* tests

* generate types for taglabel

---------

Co-authored-by: karanh37 <karanh37@gmail.com>
2025-05-19 06:59:13 +02:00
Mayur Singal
2157337847
MINOR: Configurable account usage for incremental metadata extraction (#21182)
2025-05-19 10:15:29 +05:30
Mohit Tilala
4c0ce77756
Fix airbyte pipeline lineage extraction (#21151)
2025-05-19 10:14:33 +05:30
Mayur Singal
703118f2b5
MINOR: Disable Flaky superset tests (#21242)
2025-05-18 23:12:42 +05:30
Teddy
2e8e79ff0a
ISSUE #17170: handle oracle unique count (#21225)
* fix: handle oracle unique count

* fix: failing test case
2025-05-16 17:44:28 +05:30
Pere Menal-Ferrer
a7e2f33adc
feature/pii-column-classifier (#21200)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add column name split

* Move pii algorithms to dedicated package

* Add tests for PAN, NIF, SSN entities

* Fix linting

* Add comment on why we need to set a specific language for Presidio recognizers, as per PR suggestion.

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Fix wrong import

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-16 14:03:49 +02:00
harshsoni2024
9c9e885d77
issue-20074: s3 objects get paginated response (#21208)
2025-05-15 18:20:10 +05:30
harshsoni2024
35c1f5aead
issue-19890: PBI dataflow support (#21207)
2025-05-15 18:17:49 +05:30
Suman Maharana
2864e0f09d
Minor: Add sql query for dbt lineage with nodes (#21214)
2025-05-15 17:49:47 +05:30
Suman Maharana
f81ee52ec4
Chore Ingestion Tableau library change (#21076)
2025-05-15 17:48:39 +05:30
Teddy
cd6434dd73
ISSUE #21146 - Properly handle connection on sampler (#21186)
* fix: properly close connection on sampler ingestion

* fix: dangling connection test

* style: ran python linting

* fix: revert to 9
2025-05-15 12:21:01 +02:00
IceS2
87463df51d
Fixes #21095: Handle Conn Retry and implement is_disconnect for MSSQL (#21185)
* Handle Conn Retry and implement is_disconnect for MSSQL

* Change log to debug
2025-05-15 12:19:58 +02:00