* Add assets API and deprecate inline assets field for Domain and DataProduct
* fix mvn test
* fix py test and add new tests
* fix py test
* fix py test
* fix timeout for workflow test
* address pr feedback
* Update generated TypeScript types
* minor: remove unused function
---------
Co-authored-by: Bhanu Agrawal <bhanuagrawal2018@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial plan
* Fix Grafana connector format field validation issue
- Update GrafanaTarget.format field to accept both str and int types
- Add field_validator to convert integer format codes to string equivalents
- Add comprehensive tests for format field validation scenarios
- Add test fixture with integer format fields that reproduces the original issue
- Ensure backwards compatibility with existing string format values
This resolves the issue where Grafana dashboards with integer format fields
(e.g., format: 0 instead of format: "table") were causing validation errors
and being skipped during ingestion.
Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>
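A minimal sketch of the validator approach described above, assuming Pydantic v2; the integer-to-string mapping is hypothetical, as the actual codes Grafana emits may differ:

```python
from typing import Any, Optional

from pydantic import BaseModel, field_validator

# Hypothetical mapping of integer format codes to string equivalents
_FORMAT_CODES = {0: "table", 1: "time_series"}


class GrafanaTarget(BaseModel):
    """Sketch of a panel target whose `format` may arrive as str or int."""

    format: Optional[Any] = None

    @field_validator("format", mode="before")
    @classmethod
    def coerce_format(cls, value: Any) -> Any:
        # Map integer codes (e.g. format: 0) to string names so existing
        # string values stay backwards compatible
        if isinstance(value, int):
            return _FORMAT_CODES.get(value, str(value))
        return value


assert GrafanaTarget(format=0).format == "table"
assert GrafanaTarget(format="table").format == "table"
```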
* fix: GrafanaTarget model format type from str to Any
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Keshav Mohta <keshavmohta09@gmail.com>
* Kafka Connect Lineage Improvements
* Remove specific Kafka topic example from docstring
Removed the example referencing the earnin.bank.dev topic from the docstring.
* fix: update comment to reflect accurate example for database server name handling
* fix: improve expected FQN display in warning messages for missing Kafka topics
* fix: update table entity retrieval method in KafkaconnectSource
* fix: enhance lineage information checks and improve logging for missing configurations in KafkaconnectSource
* Kafka Connect Lineage Improvements
* address comments; work without the table.include.list
---------
Co-authored-by: Ayush Shah <ayush@getcollate.io>
* chore: implement logger level tests for deprecation
* fix: use METADATA_LOGGER instead of warnings
* use unit test syntax
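A minimal sketch of the pattern these commits describe, assuming a hypothetical `metadata` logger name and `deprecated` helper standing in for the real metadata logger:

```python
import logging
import unittest

logger = logging.getLogger("metadata")  # stand-in for the metadata logger


def deprecated(message: str) -> None:
    # Route deprecation notices through the logger instead of warnings.warn,
    # so they respect the configured logger level
    logger.warning("DeprecationWarning: %s", message)


class TestDeprecationLogging(unittest.TestCase):
    def test_deprecation_logged_at_warning_level(self):
        with self.assertLogs("metadata", level=logging.WARNING) as captured:
            deprecated("`old_option` is deprecated; use `new_option`")
        self.assertIn("DeprecationWarning", captured.output[0])


if __name__ == "__main__":
    unittest.main()
```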
* isort
* black
* fix test
---------
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
* Bump datamodel-code-generator to 0.34.0
* Pin down pydantic to <2.12
* Revert "Bump datamodel-code-generator to 0.34.0"
This reverts commit c69116d2935eea49e9c78b2607f2fea94bc44738.
* Update `TableDiffParamsSetter` to move data at table level
This means that `key_columns` and `extra_columns` will be defined per table instead of globally, just as `data_diff` expects.
* Update `TableDiffValidator` to use table's `key_columns`
Call `data_diff` and run validations using each table's `key_columns`
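A hedged sketch of what per-table `key_columns` looks like against the `data_diff` API; the connection strings and column names are made up, and exact keyword arguments may vary by version:

```python
from data_diff import connect_to_table, diff_tables

# Each table carries its own key_columns instead of one global definition
source = connect_to_table(
    "postgresql://user:pass@source-host:5432/db",
    "public.customers",
    key_columns=("customer_id",),
)
target = connect_to_table(
    "postgresql://user:pass@target-host:5432/db",
    "public.customers",
    key_columns=("customer_id",),
)

# extra_columns are compared on top of the key columns
for sign, row in diff_tables(source, target, extra_columns=("email",)):
    print(sign, row)  # '-' rows exist only in source, '+' only in target
```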
* Create migration to update `tableDiff` test definition
* Fix Playwright test
* Add the migration classes and data for recognizers
This is so that we can run a migration that sets the `json->recognizers` field of the `PII.Sensitive` and `PII.NonSensitive` tags from JSON values.
The issue with normal migrations was that the recognizers value was too long to be persisted in the server migrations log.
Created a common `migration.utils.v1110.MigrationProcessBase`
* Ensure building automatically with the right parameters
* Update typescript types
* Ensure we take columns ordered from the sampler
This is to avoid analyzing columns with data from other columns
* Remove expectation of address to have Sensitive tag
This is for a couple of reasons:
- First: per our internal definition it should actually be Non Sensitive.
- Second: Presidio actually picks some of them up as PERSON (Sensitive) entities, but since we have raised the tolerance, we no longer classify them as Sensitive.
* Refactor presidio utils
Extract the spacy model functionality from the analyzer building function
* Added a new `TagClassifier`
This classifier uses tags to dynamically build presidio `RecognizerRegistry`s
* Added a new `TagProcessor`
This processor uses `TagClassifier` to label a column based on the tags' recognizers
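A minimal sketch of building a presidio `RecognizerRegistry` dynamically, as `TagClassifier` does; here the pattern is hard-coded, whereas the real classifier reads it from each tag's recognizer configuration:

```python
from presidio_analyzer import (
    AnalyzerEngine,
    Pattern,
    PatternRecognizer,
    RecognizerRegistry,
)

# Hard-coded stand-in for a tag's recognizer configuration
ssn_recognizer = PatternRecognizer(
    supported_entity="US_SSN",
    patterns=[Pattern(name="ssn", regex=r"\b\d{3}-\d{2}-\d{4}\b", score=0.6)],
)

registry = RecognizerRegistry()
registry.add_recognizer(ssn_recognizer)  # one entry per tag recognizer

# The engine then labels column samples against the tag-driven registry
analyzer = AnalyzerEngine(registry=registry)
for result in analyzer.analyze(text="SSN: 123-45-6789", language="en"):
    print(result.entity_type, result.score)
```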
* Create `TagProcessor` based on workflow configuration
* Create decorator to apply threshold to recognizers
This is so that we can apply thresholds on recognizer results without subclassing or having to keep a map between the presidio recognizer and the recognizer configuration
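A hedged sketch of such a decorator over a presidio recognizer; `with_threshold` is a hypothetical name for illustration:

```python
from typing import Any, List

from presidio_analyzer import EntityRecognizer, RecognizerResult


def with_threshold(
    recognizer: EntityRecognizer, threshold: float
) -> EntityRecognizer:
    # Wrap the instance's analyze() so low-scoring results are dropped,
    # with no subclass and no recognizer-to-configuration map
    original_analyze = recognizer.analyze

    def analyze(*args: Any, **kwargs: Any) -> List[RecognizerResult]:
        results = original_analyze(*args, **kwargs)
        return [result for result in results if result.score >= threshold]

    recognizer.analyze = analyze
    return recognizer
```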
* Fix broken test
* Add `reason` property to `TagLabel`
This is to understand what score was used for selecting the entity
* Build `TagLabel`s with `reason`
* Increase `PIIProcessor._tolerance`
This is so we correctly filter out low scores from classifiers while still maintaining the normalization that filters out confusing outcomes.
E.g.: an output with scores 0.3, 0.7, and 0.75 would initially filter out the 0.3 and then discard the other two because they are both relatively high results.
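A hypothetical sketch of the two-step filtering described above; the actual constants and selection logic live in `PIIProcessor`:

```python
from typing import Dict, Optional

TOLERANCE = 0.5        # absolute floor: scores below this are noise
RELATIVE_CUTOFF = 0.9  # competing scores this close are ambiguous


def select_entity(scores: Dict[str, float]) -> Optional[str]:
    # Step 1: drop scores below the absolute tolerance (filters the 0.3)
    candidates = {e: s for e, s in scores.items() if s >= TOLERANCE}
    if not candidates:
        return None
    # Step 2: if a runner-up scores almost as high as the best candidate,
    # the outcome is confusing and nothing is classified (drops 0.7 vs 0.75)
    best_entity, best_score = max(candidates.items(), key=lambda kv: kv[1])
    for entity, score in candidates.items():
        if entity != best_entity and score / best_score >= RELATIVE_CUTOFF:
            return None
    return best_entity


print(select_entity({"PERSON": 0.3, "EMAIL": 0.7, "PHONE": 0.75}))  # None
print(select_entity({"PERSON": 0.2, "US_SSN": 0.85}))               # US_SSN
```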
* Make database and DAO changes needed to persist `TagLabel.reason`
* Update generated TypeScript types
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add support for translations in multiple languages
* Add Tag Feedback System
* Update generated TypeScript types
* Fix typing issues and add tests for the recognizer factory
* Updated `TagResourceTest.assertFieldChange` to fix broken test
This is because change description values had been serialized into strings and, for some reason, the keys ended up in a different order. So instead of performing a String comparison, we do JSON comparisons.
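In Python terms (the actual test is Java), the idea is to compare parsed JSON rather than raw strings, since object key order is not significant:

```python
import json

# The same change description serialized twice with keys in different order
old = '{"name": "tags", "newValue": "[1, 2]"}'
new = '{"newValue": "[1, 2]", "name": "tags"}'

assert old != new                          # string comparison is order-sensitive
assert json.loads(old) == json.loads(new)  # parsed objects compare equal
```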
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eugenio Doñaque <eugenio.donaque@getcollate.io>