3353 Commits

Author SHA1 Message Date
harshsoni2024
6a6180b2e3
powerbi change owner condition (#21724) 2025-06-12 16:11:43 +05:30
Suman Maharana
18f9f2cdb6
Fix: Tableau project id should always be a string (#21700) 2025-06-12 11:21:53 +05:30
Teddy
a680e2c802
fix: added profiler config when executing bundle suite (#21714) 2025-06-11 17:03:22 +02:00
Teddy
c09a8b27ae
ISSUE #16676 - Add Tag to CreateTestCase (#21366)
* refactor: removed testSuite field from CreateTestCase

BREAKING CHANGE: when creating a test case, testsuite is now derived from entityLink (fetch or created)

* feat: allow setting tags when creating a test case

* style: ran linters

* fix: compiling error

* fix: failing test case

* fix: failing tests

* removed testSuite from required field

* fixed ui side

* style: ran java linting

* deprecation: remove testSuite param from ingestion

* fix: remove test suite field

* fix: remove test_suite field

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2025-06-11 09:59:08 +02:00
Suman Maharana
0df058a53d
Fix: dbtcloud CL errors (#21685) 2025-06-10 21:45:07 +05:30
Mayur Singal
06ae2df2c3
MINOR: Fix bigquery import issue (#21444)
* MINOR: Fix bigquery import issue

* fix checkstyle
2025-06-09 16:08:16 -07:00
Mohit Tilala
8bda216a72
Fixes #21472: Add mention of why snowflake owners are not supported (#21519)
* Add mention of why snowflake owners are not supported

* Remove owners from docs as not supported
2025-06-09 16:05:54 -07:00
Pere Menal-Ferrer
1e86f9870f
FIX #1464 (#21520)
* Add PIICategoryTags and some utilities on top of them.

* Fix static-check

* Add test for fqn representation

* Add NEREntityGeneralTags.json from Collate

* Add test to check PIICategoryTags agree with the ones used by OM server

* Add LabelExtractor

* Fix style

* Add ignore superfluous-parens for pylint

* Add comment as per PR review

* Fix not-updated PII-IT

* Remove duplicated IT test for PII

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-06-09 16:05:35 -07:00
Mayur Singal
53817c2182
Minor: Make trino query table configurable (#21665) 2025-06-09 16:03:57 -07:00
harshsoni2024
4a3b6f4934
issue-21370: db2 custom driver installation (#21638)
* db2 custom driver installation

* pylint changes

* typo fix
2025-06-09 19:52:35 +05:30
Ayush Shah
05e6a56b41
Add Databricks Sampler, Refactor Unity Catalog Sampler (#21612) 2025-06-09 14:18:35 +05:30
Keshav Mohta
b7a7023890
Fix #20665: BigQuery - Adding billing project (#21231) 2025-06-09 13:09:40 +05:30
Mohit Tilala
9e36cfe012
Remove existing entity source hash presence check (#21621) 2025-06-08 12:37:51 +05:30
Teddy
5078a2fbb9
DEPRECATION: Remove testCaseResults endpoint from testCaseResource (#21527)
* deprecation: remove testCaseResults endpoint from testCaseResource

* fix: path in test e2e test

* fix: endpoint name to testCaseResults

* style: fix java linting
2025-06-07 21:02:54 +02:00
Suman Maharana
161b4a8b2a
Chore: Tableau Improvements (#21620)
* Chore: Tableau Improvements

* Added apiVersion

* linting

* Addressed Comments
2025-06-07 21:38:48 +05:30
Suman Maharana
fd88a6d449
Add: dbt tags Filter (#21628) 2025-06-07 12:25:23 +05:30
Suman Maharana
2c657d6034
Fix: Looker cll parsing issue (#21630)
* Fix: Looker cll parsing issue

* Added checks
2025-06-07 12:21:32 +05:30
Mohit Tilala
ea63db993a
Add lineage dialect for Exasol, Trino and Vertica (#21604) 2025-06-06 11:48:52 +05:30
Elay Gelbart
dec346a84b
Fixes ISSUE 20899: upgrade google-cloud-secret-manager python requirement version (#20900)
* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3

* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3 with ~

* Bump up `mlflow` and `databricks-sdk` for protobuf 5.x.x, pin down google-cloud-secret-manager to 2.22.1 for airflow deps sync

* Pin down databricks-sdk to 0.20.0

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2025-06-06 03:14:25 +05:30
IceS2
8540884ab1
MINOR: Add method to filter ingestion pipeline based on metadata (#21449)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes

* Only dump non-null keys

* Add Method to Filter Ingestion Pipeline based on Metadata

* Reduce the scope to filter only specifically on metadata->workflow->serviceName
2025-06-04 16:13:39 +02:00
Mohit Tilala
44c90557b7
Fix missing __pydantic_fields__ exceptions (#21521) 2025-06-04 16:44:31 +05:30
IceS2
5b20b84546
MINOR: Add logic to handle WorkflowContext on Ingestion (#21425)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes
2025-06-03 17:35:08 +02:00
Suman Maharana
720c6d3f9f
Add: Looker explore to view Column Lineage (#21504)
* Add: explore to view Column Lineage

* Add tags ingestion and fix cll warnings

* lint

* Addressed comments

* fixed tests
2025-06-03 20:23:43 +05:30
Suman Maharana
c00ed22866
Fix: Tableau Validation Errors (#21530) 2025-06-03 11:03:45 +05:30
Teddy
3c5fbffeaa
feat: add regex support for dbx (#21514) 2025-06-02 17:55:48 +02:00
Teddy
859f24aba7
MINOR: row sampling error (#21488)
* fix: row sampling error

* fix: return sample query
2025-06-02 09:02:17 +02:00
harshsoni2024
841cc5753d
issue-21439: dashboard lineage override (#21440) 2025-06-02 11:36:44 +05:30
Suman Maharana
7e3c732919
Fix: Databricks Schema Description (#21367) 2025-06-02 11:34:07 +05:30
Suman Maharana
1c4500b119
Fix: looker CLL errors (#21493)
* Fix: looker CLL errors

* Addressed comments- added exception handling

* addressed comments

* linting
2025-05-31 17:29:23 +05:30
Pere Menal-Ferrer
6683c632f4
FIX #21464 (#21463)
* Reproduce failing behaviour with non-date-time data

* Add a presidio patch for DateTimes

* Fix type-check error

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-30 08:18:50 +02:00
Suman Maharana
21f3c4be3c
Add: Looker column level lineage (#21458)
* Add: Looker column level lineage

* Fix broken lineage

* add exception handling

---------

Co-authored-by: ulixius9 <mayursingal9@gmail.com>
2025-05-29 17:26:55 +05:30
Teddy
2a120c166a
MINOR: Py failing test cases (#21437)
* fix: failing test cases

* fix: skip test for now
2025-05-28 17:52:32 +02:00
Albert Franzi
859e38057e
Fixes 21327: Update Lightdash connector (#21328)
* fix: [21327] - Update Lightdash connector

* fix: [21327] - Solve style checks

* fix: [21327] - Report the right chart type field

* fix: [21327] - Apply changes
2025-05-28 17:43:15 +05:30
Mayur Singal
85e8776a10
Fix #17799: Doris ingestion failed (#21420) 2025-05-28 16:10:59 +05:30
Pere Menal-Ferrer
8eb0b25c19
fix/nox-ci-missing-steps (#21426)
* Fix nox-ci

* Fix wrong path

* Fix wrong path

* Use working-directory for gha

* Fix wrong section in gha yml

* Disable some lint to diagnose failures

* Rm version matrix for debugging

* Fix type in nox invocation

* Fix style

* Add version and update checkout version

* Add required system dependencies

* WIP

* Add python code generation

* Remove version extraction from nox, as it is not needed

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-28 11:12:44 +02:00
Pere Menal-Ferrer
ac9f803b46
Make presidio_analyzer a lazy import in the PII processor (#21408)
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 14:24:28 +02:00
harshsoni2024
8bbc4d8c3d
MINOR: PBI dataset expressions empty value fix (#21409) 2025-05-27 16:50:55 +05:30
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest use code from src rather than from the installed package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* Fix PowerBI integration test to use pathlib for resource paths and not os.getcwd to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix strategy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Teddy
7ab6755beb
ISSUE #21101 - Implement BQ Partitioned Tests (#21348)
* feat: add query logger as an event listener in debug mode

* fix: added ingestion.src plugin to pylint

* minor: add partition sampled table

* test: added test for partitioned BQ table

* Remove log_query function from logger.py

* style: ran python linting
2025-05-22 17:22:05 +02:00
gpby
342eaee092
Fixes #20956: Teradata profiler (#21292)
* add teradata functions

* fix teradata schema

* reformat code

* change random approach for teradata

---------

Co-authored-by: Teddy <teddy.crepineau@gmail.com>
2025-05-21 09:12:15 +02:00
Pere Menal-Ferrer
3c6c762d9c
fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Update presidio version to fix wrong regex for Indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 15:32:21 +02:00
Teddy
8caaa9bda0
fix: like test (#21307) 2025-05-20 15:02:37 +02:00
Teddy
2904c94700
fix: clean up import (#21308) 2025-05-20 15:01:59 +02:00
harshsoni2024
176b731337
MINOR: presidio sample data lib fix (#21295) 2025-05-20 17:40:44 +05:30
Mayur Singal
2fd0606cdd
MINOR: Snowflake View Definition Fallback (#21296) 2025-05-20 15:18:34 +05:30
Mayur Singal
509bc0d826
MINOR: Use slow query log for mysql lineage (#21291) 2025-05-20 11:10:06 +05:30
Mayur Singal
35d8c2a25c
Fix #20746: DB columns in Vertica (#21288) 2025-05-20 11:06:29 +05:30
Pere Menal-Ferrer
5d2dfa712a
feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-19 17:52:17 +00:00
harshsoni2024
a414e93163
bugfix: powerbi lineage source parameter usecase fix, last active user ingestion fix (#21272) 2025-05-19 19:21:02 +05:30