439 Commits

Author SHA1 Message Date
Teddy
1ef191a2aa
ISSUE #1534 - Profiler Refactor for Metadata Extraction Application (#23200)
* feat: added exporter app config

* refactor: added entityprofile resource & added backward compatibility to existing API

* feat: added tests to get_profile_data_by_type

* feat: remove non supported event types

* chore: added migrations to 1.9.7

* chore: added application creation readme

* chore: move migrations to 1.9.8

* fix: failing java test

* style: ran java linting
2025-09-05 13:07:04 +02:00
Ram Narayan Balaji
5cb33ce78a
Implementation of Adding Entity Status and Reviewers to assets (#22904)
* Initial Implementation of Adding Status and Reviewers to assets for workflows

* Update generated TypeScript types

* Copilot Review Comments Addressed

* Removed DataProduct Reviewer Inheritance as it is irrelevant

* Commit: Classification has status and reviewers, DataContract uses the same status enums, changed the logic to be APPROVED instead of Active, DataContract can have null status as seen in tests, Changed Workflow to use workflowStatus instead of status as it is contradicting with the approval status, Fixed Tests

* Default for reviewers is null

* Default for reviewers is createSchema

* Addressed Copilot's comments

* Update generated TypeScript types

* Workflow status to workflowStatus in db and migrations

* Revert "Workflow status to workflowStatus in db and migrations"

This reverts commit 676e8789358654bc6f980f855c372f33c22fc40b.

* Changed status to entityStatus in the schema files

* Java Implementation of Default Status, Search Client improvements and Test fixes and new tests

* Adding entityStatus and reviewers in the searchIndex mappings and common attributes

* Data Migration scripts to change the glossaryTerm and dataContract structure

* Update generated TypeScript types

* Fixed zh/spreadsheet index json error

* Fix Postgres migration script

* Changed the entityStatus.json to status.json
Removed the duplicates of entityStatus in the indexMapping
Modified the sample data to take in EntityStatus.Approved instead of ContractStatus.Active

* Update generated TypeScript types

* dummy commit

* Fix UI Build Issues with the New EntityStatus
Fix py tests

* Migrations for all the entities that need entityStatus

* Update generated TypeScript types

* Removed Post Migration scripts

* Fix UI and py for entityStatus

* Update generated TypeScript types

* Fix: DataContractResourceTest

* Fix UI and py for importing entityStatus

* UI to show and fetch Reviewers

* cleanup

* Removed Overridden SetDefaultStatus in GlossaryTermRepository

* Removed unnecessary validation

* Added entityStatus in search_entity_index_mapping.json

* Fixed DataContractResourceTest

* mvn spotless apply and fix migration scripts

* fix tests

* fix type error

* fix advanced search tests

* Status comparison using enums and supportsStatus to supportsEntityStatus

* mvn spotless apply

* fix merge conflict

* update entity status

* fix tests

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
Co-authored-by: karanh37 <karanh37@gmail.com>
2025-09-03 12:49:45 +05:30
Pere Miquel Brull
abcdc4e3d6
MINOR - Domain Independent DP Rule (#23067)
* MINOR - Domain Independent DP Rule

* handle DP

* Handle DP

* add migration

* improve rule mgmt

* improve rule mgmt

* add test for bulk op

* fix test

* handle in bulk

---------

Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
2025-08-29 17:28:29 +02:00
Pere Miquel Brull
dfe3fd6357
MINOR - Data Contract Validation (#22541) 2025-07-30 23:01:27 +02:00
Ayush Shah
1e8e38f2ca
MINOR: Custom properties Data types fix (#22342) 2025-07-25 18:39:53 +05:30
Mayur Singal
a94c1bef47
MINOR: Fix mysql pytest version (#22535) 2025-07-24 09:45:37 +05:30
Chirag Madlani
b098395602
Data contracts support for tables & Multi Domain Migration (#22108)
* WIP - MINOR - Rule Engine

* WIP - MINOR - Rule Engine

* WIP - MINOR - Rule Engine

* WIP - MINOR - Rule Engine

* rules

* rules

* rules

* fix retrieval by entity

* test dc

* test dc

* WIP: Data contract feature

* destructure component to its own files

* WIP contract tab

* update local

* fix test

* First iteration for multi domain support

* fix inheritance fields

* fix inheritance fields

* fix create interface

* fix few more tests

* fix indexing updates

* fix domain rel

* update domain --> domains

* merge

* fix merge

* fix csv tests and createEntity interface

* Update generated TypeScript types

* Trigger Build

* migrations

* fix tests

* fix tests

* fix tests

* Update generated TypeScript types

* Trigger Build

* handle drive service

* fix pg migration

* fix domains ref after merge and clean python tests

* Update generated TypeScript types

* fix merge domains

* format

* add missing migrations

* Update generated TypeScript types

* tests

* Update generated TypeScript types

* Trigger Build

* tests

* tests

* fix py test

* migrate domain to domains and fix compilation errors

* fix domain assignment

* fix domain spec

* fix py tests

* fix data product creation issue

* fix domain tests

* fix bulk import

* fix tests

* fix tests

* fix query and domain migration

* fix py test

* fix playwrights

* fix getEntitiesWithDisplayName indexing quotes

* fix domain propagation tests

* fix domain propagation

* Fix patch api

* fix domain schema build edit playwright

* fix test

* fix test

* fix domain selection issue and console errors

* quick fix landing page changes

* fix remaining tests

* fix ui tests

* Fix adding data products

* format

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-07-22 09:34:50 +02:00
Ayush Shah
fe2caf7a5d
MINOR: Enhance patch request handling by adding 'skip_on_failure' parameter (#22142)
* Enhance patch request handling by adding 'skip_on_failure' parameter

* Introduced 'skip_on_failure' option in build_patch and OMetaPatchMixin methods to control behavior on patch operation failures.
* Updated documentation to reflect the new parameter and its default value.
* Improved error handling to log warnings instead of raising exceptions when 'skip_on_failure' is set to True.

* fix: add tests for patch request with skip on failure

* refactor: streamline mock patching and improve test readability in patch request tests

* Consolidated import statements for unittest mock.
* Enhanced readability by reducing line breaks and simplifying mock patching syntax.
* Ensured consistent use of commas in function calls for clarity.
* Updated tests to maintain functionality while improving code style.

* fix: improve error handling in patch operations

* Enhanced logging for patch operation failures in both build_patch and OMetaPatchMixin methods.
* Added detailed entity information in warning and error messages to aid in debugging.
* Ensured consistent behavior when 'skip_on_failure' is set, providing clearer feedback on operation outcomes.

* fix: clean up whitespace in patch request error handling

* Removed unnecessary whitespace in the build_patch function to improve code readability.
* Ensured consistent formatting in warning and error messages for better clarity during logging.

* fix: enhance error handling and improve test assertions in patch request

* Updated the condition for checking 'changeDescription' in the _remove_change_description function for better clarity.
* Modified exception handling in tests to raise RuntimeError instead of a generic Exception, providing more specific error feedback.
* Improved assertions in tests to check for the presence of error messages, enhancing the robustness of error handling verification.
* Adjusted test cases to reflect changes in expected patch operation counts and ensure accurate validation of patch operations.

* fix: enhance patch operation with skip_on_failure handling

* Added 'skip_on_failure' parameter to OMetaPatchMixin methods to control behavior on patch failures.
* Improved error handling to log warnings and provide detailed feedback when patch operations are skipped.
* Updated tests to verify the new behavior of skipping failures and improved assertions for clarity.
2025-07-14 12:33:17 +05:30
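The `skip_on_failure` behaviour described in the commit above (log a warning instead of raising when a patch operation fails) can be illustrated with a minimal sketch. This is not the actual `build_patch` or `OMetaPatchMixin` code: `apply_patch`, `patch_fn`, and `entity_name` are hypothetical names, and the flag is made a required keyword here because the commit message does not state its default.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def apply_patch(
    patch_fn: Callable[[], dict],
    entity_name: str,
    *,
    skip_on_failure: bool,
) -> Optional[dict]:
    """Run a patch callable (hypothetical helper, not the OpenMetadata API).

    When skip_on_failure is True, a failure is logged as a warning and
    None is returned; otherwise the error is logged and re-raised.
    """
    try:
        return patch_fn()
    except Exception as exc:
        if skip_on_failure:
            # Include the entity name so the skipped operation can be traced.
            logger.warning("Skipping failed patch for %s: %s", entity_name, exc)
            return None
        logger.error("Patch failed for %s: %s", entity_name, exc)
        raise
```

A caller that must not lose updates would pass `skip_on_failure=False` and handle the exception itself; a bulk job that prefers to continue past a bad entity would pass `True` and check for `None`.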
Ethan
99486a5006
Fixes #18151: replace copy with model_copy (#18153)
* feat: replace copy

* fix native python copy

---------

Co-authored-by: IceS2 <pjt1991@gmail.com>
2025-07-08 16:08:20 +02:00
Mayur Singal
c2a3027962
MINOR: Fix pytest 3.10 (#22192) 2025-07-08 10:09:00 +05:30
Ayush Shah
11ac56356b
MINOR: Modify Sample data (#21599) 2025-06-24 17:16:13 +05:30
Mayur Singal
34c43eaea0
MINOR: Fix pytests (#21807) 2025-06-17 23:44:29 +05:30
harshsoni2024
0f79d8ea1d
MINOR: pytest opt out flaky test (#21800)
* remove mlflow test until fixed

* alationsink test count fixed

* pylint fix gx
2025-06-17 14:23:28 +05:30
Pere Menal-Ferrer
44e09e41a2
Revert "FIX #1464 (#21520)" (#21726)
This reverts commit 1e86f9870fd663122b9bbb64f3cf17cf32619c7f.
2025-06-13 17:27:32 +02:00
IceS2
891ff4184d
MINOR: Initial implementation for our Connection Class (#21581)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Fix Test

* Fix Profile Test Connection

* Remove unit test

* Remove comment

* Fix tests and missing changes
2025-06-13 14:52:29 +02:00
Teddy
c09a8b27ae
ISSUE #16676 - Add Tag to CreateTestCase (#21366)
* refactor: removed testSuite field from CreateTestCase

BREAKING CHANGE: when creating a test case, testsuite is now derived from entityLink (fetch or created)

* feat: allow setting tags when creating a test case

* style: ran linters

* fix: compiling error

* fix: failing test case

* fix: failing tests

* removed testSuite from required field

* fixed ui side

* style: ran java linting

* deprecation: remove testSuite param from ingestion

* fix: remove test suite field

* fix: remove test_suite field

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2025-06-11 09:59:08 +02:00
Pere Menal-Ferrer
1e86f9870f
FIX #1464 (#21520)
* Add PIICategoryTags and some utilities on top of them.

* Fix static-check

* Add test for fqn representation

* Add NEREntityGeneralTags.json from Collate

* Add test to check PIICategoryTags agree with the ones used by OM server

* Add LabelExtractor

* Fix style

* Add ignore superfluous-parens for pylint

* Add comment as per PR review

* Fix not-updated PII-IT

* Remove duplicated IT test for PII

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-06-09 16:05:35 -07:00
Teddy
5078a2fbb9
DEPRECATION: Remove testCaseResults endpoint from testCaseResource (#21527)
* deprecation: remove testCaseResults endpoint from testCaseResource

* fix: path in test e2e test

* fix: endpoint name to testCaseResults

* style: fix java linting
2025-06-07 21:02:54 +02:00
Teddy
2a120c166a
MINOR: Py failing test cases (#21437)
* fix: failing test cases

* fix: skip test for now
2025-05-28 17:52:32 +02:00
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest use code from src rather than from the installed package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* Fix PowerBI integration test to use pathlib for resource paths instead of os.getcwd, to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix strategy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
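The PowerBI test fix in the commit above swaps `os.getcwd` for `pathlib` so that resource lookups no longer depend on where the test runner is started. A minimal sketch of that pattern follows; `resource_path` and the `resources/` folder name are assumptions for illustration, not the layout used in the repository.

```python
from pathlib import Path


def resource_path(test_file: str, name: str) -> Path:
    """Locate a resource next to the given test file (hypothetical helper).

    The path is anchored on the test module's own location, so it is the
    same whether pytest is launched from the repo root or a subfolder.
    """
    return Path(test_file).resolve().parent / "resources" / name


# Typical call from inside a test module:
#   dataset = resource_path(__file__, "dataset.json")
```

By contrast, a path built from `os.getcwd()` changes with the directory the command is run from, which is the failure mode the commit message describes.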
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Pere Menal-Ferrer
5d2dfa712a
feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-19 17:52:17 +00:00
Suman Maharana
f81ee52ec4
Chore Ingestion Tableau library change (#21076) 2025-05-15 17:48:39 +05:30
Teddy
cd6434dd73
ISSUE #21146 - Properly handle connection on sampler (#21186)
* fix: properly close connection on sampler ingestion

* fix: dangling connection test

* style: ran python linting

* fix: revert to 9
2025-05-15 12:21:01 +02:00
Teddy
209793f315
MINOR - Add support for GX 1.4 (#20934)
* fix: add support for GX 0.18.22 and GX 1.4.x

* fix: add support for GX 0.18.22 and GX 1.4.x

* style: ran python linting

* fix: skip test if GX version is not installed
2025-04-24 11:55:04 +02:00
Mayur Singal
40ab1814c0
MINOR: Always Include DDL for Views (#20784) 2025-04-15 12:59:50 +05:30
Pere Miquel Brull
c38209c63b
FIX CL-#1427 - PATCH applies inherited owners (#20759)
* FIX CL-#1427 - PATCH applies inherited owners

* FIX CL-#1427 - PATCH applies inherited owners

* format
2025-04-13 06:56:33 +02:00
Mayur Singal
4a407f6d0d
MINOR: Implement column validation in lineage patch api (#20545) 2025-04-07 21:24:46 +05:30
Pere Miquel Brull
3186937cc2
MINOR - Update Auto Classification defaults for sample data & classif… (#20587)
* MINOR - Update Auto Classification defaults for sample data & classification

* fix tests
2025-04-07 15:56:57 +02:00
Mayur Singal
ee5d8eee8b
Revert "MINOR: Implement Column Validation in Lineage (#20544)" (#20658) 2025-04-07 17:13:35 +05:30
Imri Paran
f6441ad404
fix: trino data diff paths (#20457)
requires https://github.com/open-metadata/collate-data-diff/pull/6
2025-04-03 15:48:10 +02:00
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Mayur Singal
7991715135
MINOR: Implement Column Validation in Lineage (#20544) 2025-04-02 17:40:40 +05:30
Imri Paran
663839bd85
test: assert dangling db connections (#20458)
added dangling connection assertions for mysql integration test
2025-04-02 08:38:17 +02:00
Pere Miquel Brull
c08273b4ad
MINOR: Allow loading ometa from env (#20511) 2025-03-31 12:06:33 +02:00
Mayur Singal
e6b7b89f86
Fix #20236: Handle Sample Data with non-utf8 characters (#20380) 2025-03-27 14:20:26 +05:30
Ayush Shah
7a3990f350
Fixes 19119: Enhance TableCustomSQLQueryValidator to support threshold operation (#20307) 2025-03-27 13:11:56 +05:30
Mayur Singal
fb3ba391ff
MINOR: Fix failing pytest (#20332) 2025-03-19 12:35:37 +05:30
fuzmish
7fa3e53403
Fix: Pass raw value of extraHeaders to ClientConfig (#19989) 2025-03-18 13:55:51 +05:30
Pere Miquel Brull
55d7e50441
MINOR - Add and remove data products Actions in Automator (#19948)
* MINOR - Add and remove Data Product assets in Automator config

* MINOR - Add and remove Data Product assets in Automator config

* domain mixin

* build ref

* build ref

* create types

* fix tests

* fix conflicts

---------

Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
Co-authored-by: karanh37 <karanh37@gmail.com>
2025-03-05 07:11:17 +01:00
Sriharsha Chintalapani
799e49e391
Search: improve relevancy for plural/singular words, partial matches,… (#20000)
* Search: improve relevancy for plural/singular words, partial matches, exact matches

* apply to all indexes

* Fix other query patterns

* Revert changes of database and databaseSchema fields in TableIndex.getFields() and table index mapping

* add missing boost query builder in es

* fix ci

* add max_ngram_diff setting in di-assets index

* fix TestCaseResourceTest mvn test failure

---------

Co-authored-by: sonikashah <sonikashah94@gmail.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2025-02-27 16:47:08 +01:00
Imri Paran
97fad806a2
Fixes 19755: Publish app config with status (#19754)
* feat(app): add config to status

add config to the reported status of the ingestion pipeline

* added separate pipeline service client call for external apps

* fix masking of pydantic model

* - overload model_dump to mask secrets instead of a separate method
- moved tests to test_custom_pydantic.py

* fix: execution time

* fix: mask secrets in dump json

* fix: for python3.8

* fix: for python3.8

* fix: use mask_secrets=False when dumping a model for create

* format

* fix: update mask_secrets=False for workflow configurations

* fix: use context directly when using model_dump_json

* fix: default behavior when dumping json

* format

* fixed tests
2025-02-25 16:51:49 +00:00
Sriharsha Chintalapani
a924064c09
Fix #17723: Generate Incremental Change Events even when consolidation of events applied (#19550)
* Fix #17723: Generate Incremental Change Events even when consolidation of events applied

* Fix #17723: Generate Incremental Change Events even when consolidation of events applied

* fix tests

* Fix tests

* clean policy tests

* update search methods to use incrementalChangeDescription part-1

* Fix the version page playwrights

* update search methods to use incrementalChangeDescription part-2

* introduce new field incrementalChangeDescription for search part-3

* fix mvn endpoint test

* fix followers and page search test

* fix following of assets

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Co-authored-by: sonikashah <sonikashah94@gmail.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
2025-02-20 10:23:08 +05:30
Pere Miquel Brull
91b62fdc32
FIX #19798 - Shortening SQA __tablename__ to avoid hitting errors in … (#19809)
* FIX #19798 - Shortening SQA __tablename__ to avoid hitting errors in postgres

* fix tests

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-02-17 09:37:06 +01:00
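The commit above shortens SQLAlchemy `__tablename__` values to avoid errors in PostgreSQL, which by default limits identifiers to 63 bytes. The commit message does not say how the names were shortened; the sketch below shows one common approach (truncate and append a short hash), offered only as an illustration. `shorten_identifier` and `PG_MAX_IDENTIFIER_BYTES` are hypothetical names.

```python
import hashlib

# Default PostgreSQL identifier limit (NAMEDATALEN - 1).
PG_MAX_IDENTIFIER_BYTES = 63


def shorten_identifier(name: str, max_bytes: int = PG_MAX_IDENTIFIER_BYTES) -> str:
    """Return name unchanged if it fits, else a truncated form with a hash suffix.

    Illustrative only: this is an assumed technique, not the one used in the commit.
    """
    raw = name.encode("utf-8")
    if len(raw) <= max_bytes:
        return name
    # An 8-character digest of the full name keeps distinct long names distinct
    # in practice even when they share the same prefix.
    digest = hashlib.sha1(raw).hexdigest()[:8]
    keep = max_bytes - len(digest) - 1  # one byte reserved for the underscore
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}_{digest}"
```

The result could then be assigned to `__tablename__` when the ORM class is built, so that the generated name stays within the database limit.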
sonika-shah
c0eb7d08de
GEN -19588 Sort Enum type Custom Property Values (#19637)
* GEN -19588 Sort Enum type Custom Property Values

* fix py-tests

* use streams for sorting
2025-02-11 14:29:01 +05:30
Teddy
28bd01c471
MINOR: Remove default 100 when profileSample is None (#19672)
* fix: remove default 100% percent

* fix: use get_dataset

* fix: orm_profiler tests
2025-02-05 19:14:31 +01:00
Ethan
48700ae9ea
Fixes #18075: Dockerfile lint warning (#18077)
* fix docker warning

* for running actions

---------

Co-authored-by: Akash Jain <15995028+akash-jain-10@users.noreply.github.com>
2025-02-04 15:28:36 +05:30
Teddy
ef131d7e20
MINOR: Wrong attribute name in SampleConfig model (#19641)
* fix: wrong attribute name in SampleConfig model

* fix: test attribute

* fix: failing tests

* fix: trino filter error + adjust test to take into account null value

* fix: mssql and azuresql tablesample on views
2025-02-04 10:40:40 +01:00
Imri Paran
41b1ec081d
tests(e2e): increase CI for sampling test (#19519)
based on experiment in https://gist.github.com/sushi30/3083e96c9081371fa55e55b0847b96d2
2025-01-27 09:31:43 +00:00
Akash Verma
9ecc8a8afe
Added integration testcontainer test for mongodb (#19282) 2025-01-10 10:10:11 +05:30