* chore: implement logger levels tests for deprecation
* fix: use METADATA_LOGGER instead of warnings
* use unit test syntax
* isort
* black
* fix test
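A minimal sketch of that pattern, assuming the metadata logger is looked up by name (the real `METADATA_LOGGER` constant and deprecation helpers live in the OpenMetadata codebase):

```python
import logging
import unittest

# Hypothetical logger name; OpenMetadata resolves it via its METADATA_LOGGER constant
METADATA_LOGGER = "metadata"
logger = logging.getLogger(METADATA_LOGGER)


def deprecated_option(name: str) -> None:
    # Emit deprecations through the metadata logger instead of warnings.warn()
    logger.warning("Option %s is deprecated and will be removed", name)


class DeprecationLogTest(unittest.TestCase):
    def test_deprecation_is_logged_at_warning_level(self):
        with self.assertLogs(METADATA_LOGGER, level=logging.WARNING) as ctx:
            deprecated_option("old_flag")
        self.assertIn("deprecated", ctx.output[0])


if __name__ == "__main__":
    unittest.main()
```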
---------
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
* Bump datamodel-code-generator to 0.34.0
* Pin down pydantic to <2.12
* Revert "Bump datamodel-code-generator to 0.34.0"
This reverts commit c69116d2935eea49e9c78b2607f2fea94bc44738.
* Update `TableDiffParamsSetter` to move data at table level
This means that `key_columns` and `extra_columns` will be defined per table instead of "globally", just like `data_diff` expects
* Update `TableDiffValidator` to use table's `key_columns`
Call `data_diff` and run validations using each table's `key_columns`
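A rough sketch of the per-table layout, assuming `data_diff`'s public Python API; the connection strings, table names, and columns below are illustrative:

```python
from data_diff import connect_to_table, diff_tables

# key_columns and extra_columns are now defined per table, which is the
# granularity data_diff expects, instead of one global setting
table_params = {
    "public.users": {"key_columns": ("id",), "extra_columns": ("email",)},
    "public.orders": {"key_columns": ("order_id",), "extra_columns": ()},
}

for table_name, params in table_params.items():
    source = connect_to_table(
        "postgresql://localhost/source_db", table_name, params["key_columns"]
    )
    target = connect_to_table(
        "postgresql://localhost/target_db", table_name, params["key_columns"]
    )
    # Each table is diffed with its own key columns
    for diff_row in diff_tables(source, target, extra_columns=params["extra_columns"]):
        print(table_name, *diff_row)
```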
* Create migration to update `tableDiff` test definition
* Fix Playwright test
* Add the migration classes and data for recognizers
This lets us run a migration that sets the `json->recognizers` field of the `PII.Sensitive` and `PII.NonSensitive` tags from JSON values.
The issue with normal migrations was that the recognizer values were too long to be persisted in the server migrations log.
Created a common `migration.utils.v1110.MigrationProcessBase`
* Ensure building automatically with the right parameters
* Update typescript types
* Ensure we take columns ordered from the sampler
This is to avoid analyzing a column with data that actually belongs to another column
* Remove expectation of address to have Sensitive tag
This is for a couple of reasons:
- First: per our internal definition it should actually be Non Sensitive.
- Second: presidio actually picks SOME of them up as PERSON (Sensitive) entities, but since we've raised the tolerance, they are no longer classified as Sensitive.
* Refactor presidio utils
Extract the spacy model functionality from the analyzer building function
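A sketch of the extraction, using presidio's public API; the function names are illustrative, not the actual utils:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider


def build_nlp_engine(model_name: str = "en_core_web_md"):
    # The spacy model loading now lives in its own function...
    configuration = {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": "en", "model_name": model_name}],
    }
    return NlpEngineProvider(nlp_configuration=configuration).create_engine()


def build_analyzer(nlp_engine) -> AnalyzerEngine:
    # ...so the analyzer builder only wires the pieces together
    return AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["en"])
```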
* Added a new `TagClassifier`
This classifier uses tags to dynamically build presidio `RecognizerRegistry`s
* Added a new `TagProcessor`
This processor uses `TagClassifier` to label a column based on the tags' recognizers
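A condensed sketch of how both pieces could fit together, using presidio's `PatternRecognizer`; the tag shape and names below are simplified stand-ins:

```python
from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer, RecognizerRegistry

# Hypothetical tag shape: each tag carries the recognizers stored in json->recognizers
tag_recognizers = {
    "PII.Sensitive": [("US_SSN_PATTERN", r"\b\d{3}-\d{2}-\d{4}\b", 0.6)],
}

# The TagClassifier part: build a RecognizerRegistry dynamically from the tags
registry = RecognizerRegistry()
for tag_fqn, patterns in tag_recognizers.items():
    for name, regex, score in patterns:
        registry.add_recognizer(
            PatternRecognizer(
                supported_entity=tag_fqn.replace(".", "_").upper(),
                patterns=[Pattern(name=name, regex=regex, score=score)],
            )
        )

# The TagProcessor part: run the analyzer over sampled column data
analyzer = AnalyzerEngine(registry=registry)
results = analyzer.analyze(text="SSN: 078-05-1120", language="en")
```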
* Create `TagProcessor` based on workflow configuration
* Create decorator to apply threshold to recognizers
This is so that we can apply thresholds on recognizer results without subclassing or having to keep a map between the presidio recognizer and the recognizer configuration
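A minimal sketch of such a decorator, assuming presidio's `EntityRecognizer.analyze` contract; the wiring of thresholds to configuration is illustrative:

```python
from presidio_analyzer import EntityRecognizer


def with_threshold(recognizer: EntityRecognizer, threshold: float) -> EntityRecognizer:
    """Filter a recognizer's results without subclassing it or keeping a
    separate map between the presidio recognizer and its configuration."""
    original_analyze = recognizer.analyze

    def analyze(text, entities, nlp_artifacts=None):
        results = original_analyze(text, entities, nlp_artifacts)
        # Drop results below the threshold configured for this recognizer
        return [result for result in results if result.score >= threshold]

    recognizer.analyze = analyze
    return recognizer
```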
* Fix broken test
* Add `reason` property to `TagLabel`
This is to understand what score was used for selecting the entity
* Build `TagLabel`s with `reason`
* Increase `PIIProcessor._tolerance`
This is so we correctly filter out low scores from classifiers while still keeping the normalization that filters out ambiguous outcomes.
E.g., an output with scores 0.3, 0.7 and 0.75 would first have the 0.3 filtered out, and the remaining two would then be discarded because they are both relatively high results.
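A worked example of that filtering; the numbers come from the commit message, while the helper and the 0.1 closeness window are a hypothetical reading of the logic:

```python
def filter_scores(scores: list[float], tolerance: float = 0.6) -> list[float]:
    # Step 1: drop anything below the tolerance (the 0.3 here)
    candidates = [score for score in scores if score >= tolerance]
    # Step 2: if the survivors are all relatively high, the outcome is
    # ambiguous, so discard them all rather than pick one arbitrarily
    if len(candidates) > 1 and max(candidates) - min(candidates) < 0.1:
        return []
    return candidates


print(filter_scores([0.3, 0.7, 0.75]))  # [] -> both high scores are discarded
```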
* Make database and DAO changes needed to persist `TagLabel.reason`
* Update generated TypeScript types
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add support for translations in multi lang
* Add Tag Feedback System
* Update generated TypeScript types
* Fix typing issues and add tests to recognizer factory
* Updated `TagResourceTest.assertFieldChange` to fix broken test
This is because the change description values had been serialized into strings and, for some reason, the keys ended up in a different order. So instead of performing a String comparison, we perform a JSON comparison
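The test itself is Java, but the idea maps directly; a Python sketch of comparing parsed JSON instead of raw strings:

```python
import json

# Same content, different key order after serialization
expected = '{"name": "tag", "reason": "score"}'
actual = '{"reason": "score", "name": "tag"}'

assert expected != actual                          # string comparison is order-sensitive
assert json.loads(expected) == json.loads(actual)  # JSON comparison is not
```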
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eugenio Doñaque <eugenio.donaque@getcollate.io>
* Update test data for `tests.integration.trino`
This is to create tables with complex data types.
Using raw SQL because creating the tables with pandas didn't produce the right types for the structs
* Update tests to reproduce the issue
Also included the new tables in the other tests to make sure complex data types do not break anything else
Reference: [issue 16983](https://github.com/open-metadata/OpenMetadata/issues/16983)
* Added `TypeDecorator`s to handle `trino.types.NamedRowTuple`
This is because Pydantic couldn't figure out how to create Python objects when receiving `NamedRowTuple`s, which broke the sampling process.
This makes sure the data we receive from the Trino interface is compatible with Pydantic
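A sketch of the decorator approach, assuming SQLAlchemy's `TypeDecorator` hook; how field names are read off `NamedRowTuple` depends on the trino client version, so that detail is illustrative:

```python
from sqlalchemy.types import String, TypeDecorator

try:
    from trino.types import NamedRowTuple
except ImportError:  # keep the sketch importable without the trino client
    NamedRowTuple = tuple


class PydanticSafeRow(TypeDecorator):
    """Convert NamedRowTuple results into plain Python structures."""

    impl = String
    cache_ok = True

    def process_result_value(self, value, dialect):
        if isinstance(value, NamedRowTuple):
            # Plain tuples (or dicts, where names are available) validate fine
            return tuple(
                self.process_result_value(item, dialect) for item in value
            )
        return value
```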
* feat: databricks oauth and azure ad auth setup
* refactor: add auth type changes in databricks.md
* fix: test after oauth changes
* refactor: unity catalog connection to databricks connection code