* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed redundant method in favor of the existing one
* Fix Pandas Dimensionality checks
* Remove useless comments
* Address PR comments, fix tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed redundant comments and made minor improvements
* Implement unit tests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix unit tests
* Small fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* feat: add owner assignment support at metadata ingestion level
* docs: Translate comments to English in test_owner
* refactor: move the test_owner-related files into correct positions
* feat: Add support for more source types
* Revert "feat: Add support for more source types"
This reverts commit a7649dcb3204cf98b7f4f9be02fbb982d2532193.
* feat: Add owner field support in sourceConfig for Database and Dashboard ingestion (fixes #22392)
* refactor code to match the required style
* add owner field in the related JSON file
* feat: add topology-based owner config for database/schema/table (see the sketch below)
* Format the code by the pre-commit tools
* fix some errors
* add a doc to explain this feature
* translate all Chinese comments to English and consolidate documentation
* remove redundant code
* refactor code
* refactor code
* refactor code
* refactor code
* Add some tests for owner-config and enhance this feature
* Add some tests for owner-config and enhance this feature
* fix some errors
* fix some errors
* refactor code
* Remove the YAML and Bash test files and test the owner config in pytest style
* format the python code
* refactor ingestion code
* refactor code
* fix some errors in test_owner_utils
---------
Co-authored-by: Ma,Yutao <yutao.ma@sap.com>
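The owner-assignment commits above add an `owner` field to `sourceConfig`. A minimal sketch of what a database ingestion config fragment might look like with that field, written as a Python dict; the field names and values are illustrative assumptions, not the merged JSON schema:

```python
# Hypothetical sourceConfig fragment for a database metadata ingestion
# workflow. "owner" is the new field described in the commits above;
# its exact name, placement, and accepted values are assumptions.
source_config = {
    "type": "DatabaseMetadata",
    "includeTables": True,
    # Owner to assign to ingested entities (e.g. a team or user known
    # to OpenMetadata) when no owner is already set.
    "owner": "data-platform-team",
}
```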
* Extend `metadata.sdk.configure` function
* Create convenience classes for existing `TestDefinition`s
* Create `WorkflowConfigBuilder` for data quality
* Create `ResultCapturingProcessor` for data quality
This is so we can intercept results from `TestCaseRunner` and return them to the calling application (see the sketch below)
* Implement `TestRunner` interface to run test cases as code
* Add an example of the simplified API
Also, fix some static-check errors in `builder_end_to_end.py`
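A minimal sketch of the result-capturing pattern described above, assuming a processor that wraps test execution and accumulates results for the calling application; the class and method names are illustrative, not the SDK's actual API:

```python
from typing import Any, Callable, List

# Illustrative result-capturing processor: it delegates to the real test
# runner and keeps a copy of every result so the calling application can
# read them back after the workflow finishes.
class ResultCapturingProcessor:
    def __init__(self, run_test_case: Callable[[Any], Any]):
        self._run_test_case = run_test_case
        self.results: List[Any] = []

    def process(self, test_case: Any) -> Any:
        result = self._run_test_case(test_case)
        self.results.append(result)  # capture for the caller
        return result  # pass through unchanged to the rest of the workflow
```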
* Add assets API and deprecate inline assets field for Domain and Dataproduct
* fix mvn test
* fix py test and add new tests
* fix py test
* fix py test
* fix timeout for workflow test
* address pr feedback
* Update generated TypeScript types
* minor: remove unused function
---------
Co-authored-by: Bhanu Agrawal <bhanuagrawal2018@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial plan
* Fix Grafana connector format field validation issue
- Update GrafanaTarget.format field to accept both str and int types
- Add field_validator to convert integer format codes to string equivalents
- Add comprehensive tests for format field validation scenarios
- Add test fixture with integer format fields that reproduces the original issue
- Ensure backwards compatibility with existing string format values
This resolves the issue where Grafana dashboards with integer format fields
(e.g., format: 0 instead of format: "table") were causing validation errors
and being skipped during ingestion (see the sketch below).
Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>
* fix: change GrafanaTarget model format type from str to Any
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Keshav Mohta <keshavmohta09@gmail.com>
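A sketch of the coercion approach those commits describe, using a Pydantic v2 `field_validator`; note the follow-up commit ultimately typed the field as `Any`, so this reflects the str/int variant from the commit messages rather than the final model:

```python
from typing import Optional, Union

from pydantic import BaseModel, field_validator

class GrafanaTarget(BaseModel):
    # Accept both strings and integer format codes.
    format: Optional[Union[str, int]] = None

    @field_validator("format", mode="before")
    @classmethod
    def coerce_format(cls, value):
        # Some Grafana dashboards emit integer format codes (e.g. 0)
        # instead of strings (e.g. "table"); coerce them so validation
        # does not fail and the dashboard is not skipped.
        if isinstance(value, int):
            return str(value)
        return value
```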
* Kafka Connect Lineage Improvements
* Remove specific Kafka topic example from docstring
Removed example from the documentation regarding the earnin.bank.dev topic.
* fix: update comment to reflect accurate example for database server name handling
* fix: improve expected FQN display in warning messages for missing Kafka topics
* fix: update table entity retrieval method in KafkaconnectSource
* fix: enhance lineage information checks and improve logging for missing configurations in KafkaconnectSource
* Kafka Connect Lineage Improvements
* address comments; work without the table.include.list
---------
Co-authored-by: Ayush Shah <ayush@getcollate.io>
* chore: implement logger-level tests for deprecation (see the sketch below)
* fix: use METADATA_LOGGER instead of warnings
* use unit test syntax
* isort
* black
* fix test
---------
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
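A minimal sketch of what such a test might look like, assuming deprecation notices go through a module-level metadata logger rather than the `warnings` module; the logger name and deprecated function are placeholders:

```python
import logging
import unittest

# Placeholder stand-ins for the real metadata logger and a deprecated API.
METADATA_LOGGER = logging.getLogger("metadata")

def deprecated_call():
    METADATA_LOGGER.warning("deprecated_call is deprecated; use new_call")

class TestDeprecationLogging(unittest.TestCase):
    def test_deprecation_goes_through_logger(self):
        # assertLogs fails the test if nothing is logged at WARNING or above.
        with self.assertLogs(METADATA_LOGGER, level="WARNING") as captured:
            deprecated_call()
        self.assertIn("deprecated", captured.output[0])
```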
* Bump datamodel-code-generator to 0.34.0
* Pin down pydantic to <2.12
* Revert "Bump datamodel-code-generator to 0.34.0"
This reverts commit c69116d2935eea49e9c78b2607f2fea94bc44738.
* Update `TableDiffParamsSetter` to move data at table level
This means that `key_columns` and `extra_columns` will be defined per table instead of "globally", just like `data_diff` expects
* Update `TableDiffValidator` to use table's `key_columns`
Call `data_diff` and run validations using each table's `key_columns` (see the sketch below)
* Create migration to update `tableDiff` test definition
* Fix Playwright test
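A sketch of the per-table parameter shape the `TableDiffParamsSetter` change describes, with `key_columns` and `extra_columns` attached to each table instead of shared globally; the paths and column names are illustrative:

```python
# Each side of the diff now carries its own key_columns/extra_columns,
# matching what data_diff expects per table. Values are hypothetical.
table_diff_params = {
    "table1": {
        "path": "analytics.orders",
        "key_columns": ["order_id"],
        "extra_columns": ["updated_at"],
    },
    "table2": {
        "path": "analytics.orders_copy",
        "key_columns": ["order_id"],
        "extra_columns": ["updated_at"],
    },
}
```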
* Add the migration classes and data for recognizers
This is so we can run a migration that sets `json->recognizers` of the `PII.Sensitive` and `PII.NonSensitive` tags from JSON values.
The issue with normal migrations was that the value of recognizers was too long to be persisted in the server migrations log.
Created a common `migration.utils.v1110.MigrationProcessBase`
* Ensure building automatically with the right parameters
* Update typescript types
* Ensure we take columns ordered from the sampler
This is to avoid analyzing columns with data from other columns
* Remove expectation that address values have the Sensitive tag
This is for a couple of reasons:
- First: per our internal definition, it should actually be Non Sensitive.
- Second: Presidio picks some of them up as PERSON (Sensitive) entities, but since we've raised the tolerance, we no longer classify them as Sensitive.