* Update `TableDiffParamsSetter` to move data to table level
This means that `key_columns` and `extra_columns` will be defined per table instead of "globally", just like `data_diff` expects
* Update `TableDiffValidator` to use table's `key_columns`
Call `data_diff` and run validations using each table's `key_columns`
* Create migration to update `tableDiff` test definition
* Fix Playwright test
* Fix Oracle DataDiff and Change Oracle Connection to BaseConnection
* Add small unit test
* Fix Test
* Fix logic to avoid denormalizing table/schema names for other engines
* fix(data-diff): added md5 handling for bigquery
- added MD5 handling for bigquery
- use URL instead of Engine because it requires fewer steps and is less prone to failure
* added e2e test for data diff with sampling in bigquery
* feat(data-quality): support multiple runtime parameter types
- changed the runtime parameters setter factory to return sets
- add the runtime parameters based on the name of the runtime parameter
**NOTE** requires changes on collate side
* empty set for default case
* fix(data-diff): sampling configuration
handle the sampling condition separately for the two tables, which allows sampling on columns whose names differ in case
* format
* feat(data-quality): use sampling config in data diff
- get the table profiling config
- use hashing to deterministically sample the same ids from each table
- use dirty-equals to assert results of stochastic processes
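
A minimal sketch of hash-based deterministic sampling, to illustrate the idea rather than the code in this change. The helper names `in_sample` and `sample_ids`, the `nonce` argument, and the choice of the first 8 hex digits of an MD5 digest are assumptions made for the example:

```python
import hashlib
from typing import Iterable, Set


def in_sample(key: object, percent: float, nonce: str = "") -> bool:
    """Decide from the key alone whether a row belongs to the sample."""
    # Hypothetical scheme: hash nonce + key and read the first 32 bits.
    digest = hashlib.md5(f"{nonce}{key}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16)  # value in 0 .. 2**32 - 1
    return bucket < (percent / 100) * 2**32


def sample_ids(ids: Iterable[object], percent: float, nonce: str = "") -> Set[object]:
    """Keep only the ids whose hash bucket falls under the threshold."""
    return {i for i in ids if in_sample(i, percent, nonce)}
```

Because the decision depends only on the key and the nonce, both tables select the same ids regardless of row order, and the sample size is only approximately `percent` of the rows, which is why range-style assertions fit the tests.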
* - reverted missing md5
- added missing database service type
* - use a custom substr sql function
* fixed nonce
* added failure for mssql with sampling because it requires a larger change in the data-diff library
* fixed unit tests
* updated range for sampling
* fix(data-quality): table diff
- added handling for case-insensitive columns
- added handling for different numeric types (int/float/Decimal)
- added handling of boolean test case parameters
* add migrations for table diff
* add migrations for table diff
* removed cross-type diff for now; it appears to be flaky
* fixed migrations
* use casefold() instead of lower()
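
For context on the `casefold()` change, a small sketch of case-insensitive column-name matching. The helper `same_column` and its `case_sensitive` flag are hypothetical names for this example and are not taken from the change itself:

```python
def same_column(left: str, right: str, case_sensitive: bool = False) -> bool:
    """Compare two column names, optionally ignoring case."""
    if case_sensitive:
        return left == right
    # casefold() is more aggressive than lower(): it also maps
    # characters such as the German sharp s to their folded form.
    return left.casefold() == right.casefold()


# lower() keeps the sharp s, so these two names would not match with it.
lowered_equal = "Straße".lower() == "STRASSE".lower()
folded_equal = same_column("Straße", "STRASSE")
```

Here `lowered_equal` is `False` while `folded_equal` is `True`, which is the kind of mismatch `casefold()` avoids.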
* - implemented utils.get_test_case_param_value
- fixed params for case sensitive column
* handle bool test case parameters
* format
* testing
* format
* list -> List
* list -> List
* - change caseSensitiveColumns default to false
- added migration to stay backward compatible
* - removed migration files
- updated logging message for table diff migration
* changed bool test case parameters default to always be false
* format
* docs: data diff
- added the caseSensitiveColumns parameter
requires: https://github.com/open-metadata/OpenMetadata/pull/18115
* fixed test_get_bool_test_case_param
* fix: table diff
implemented a safe iterator to handle the sneaky `KeyError`
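
A rough sketch of what a "safe" wrapper around a diff iterator could look like. This is an assumption about the general pattern, not the implementation in this change; the name `safe_diff_iterator` and the choice to log and stop on `KeyError` are hypothetical:

```python
import logging
from typing import Iterable, Iterator, Tuple

logger = logging.getLogger(__name__)


def safe_diff_iterator(diff: Iterable[Tuple[str, tuple]]) -> Iterator[Tuple[str, tuple]]:
    """Yield diff rows, ending cleanly if the underlying iterator raises KeyError."""
    iterator = iter(diff)
    while True:
        try:
            yield next(iterator)
        except StopIteration:
            return
        except KeyError as exc:
            # Assumed behavior: record the error and stop instead of crashing.
            logger.warning("Table diff iteration stopped on KeyError: %s", exc)
            return
```

Rows produced before the error are still delivered to the caller, so a partial result can be reported.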
* changed method to safe_table_diff_iterator
* format
---------
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
* feat(table-diff): added column validation
added column validation for table diff, carried out before the row-level diff runs. If the columns of the two tables differ, the test short-circuits and reports the column diff.
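
The short-circuit can be sketched as follows. All names here (`ColumnDiff`, `column_diff`, `run_table_diff`, the result dictionary keys) are hypothetical and only illustrate the control flow, assuming columns are given as name-to-type mappings:

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class ColumnDiff(NamedTuple):
    removed: List[str]  # present in the left table only
    added: List[str]    # present in the right table only
    changed: List[str]  # present in both, with different types


def column_diff(left: Dict[str, str], right: Dict[str, str]) -> ColumnDiff:
    """Compare two column-name -> type mappings."""
    removed = sorted(set(left) - set(right))
    added = sorted(set(right) - set(left))
    changed = sorted(c for c in set(left) & set(right) if left[c] != right[c])
    return ColumnDiff(removed, added, changed)


def run_table_diff(
    left: Dict[str, str],
    right: Dict[str, str],
    row_diff: Callable[[], Iterable[tuple]],
) -> dict:
    """Validate columns first; run the row-level diff only if they match."""
    cols = column_diff(left, right)
    if any(cols):
        # Short circuit: the row-level diff is never started.
        return {"passed": False, "metric": "columns", "diff": cols}
    rows = list(row_diff())
    return {"passed": not rows, "metric": "rows", "diff": rows}
```

Skipping the row-level diff when the schemas disagree avoids an expensive comparison whose result would be hard to interpret anyway.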
* fixed unit tests
* format
* - resolve column types more robustly
- changed test result metric to include "rows" or "columns"
* feat: add tableDiff test case
This change introduces a "table diff" test case which
compares two tables and fails if they are not identical.
The comparison is based on a specific "key" (because the test only makes sense when performed on ordered collections).
1. Added the `tableDiff` test definition.
2. Implemented a "runtime" parameters feature which injects additional parameters for the test at runtime.
3. Integration tests (because of course).
This feature was not tested end-to-end yet because "array" data
* pydantic v2
* format
* format
* format and added data diff to setup.py
* format
* fixed param issue which has type ARRAY
* fixed runtime_parameter_setter
* moved models to parent directory
* handle errors in table diff
* fixed issue with edit test case
* format
* added more details to pytest skip
* format
* refactor: Improve createTestCaseParameters function in DataQualityUtils
* fixed unit test
* removed unused fixture
* removed validator.py
* fixed tests
* added validate kwarg to tests_mixin
* removed "postgres" data diff extra as they interfere with psycopg2-binary
* fixed tests
* pinned tenacity for tests
* reverted tenacity pinning
* added ui support for test diff
* fixed dq cypress and added edit flow
* organized the test case
* added dialect support
* fixed tests
* option style fix
* fixed calculation for passing/failing rows
* restrict the tableDiff test to limited services
* set where to None if blank string
* fixed where clause
* fixed tests for where clause
* use displayName in place of name in edit form
* added docs for RuntimeParameterSetter
* fixed cypress
---------
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>