* feat(data-quality): use sampling config in data diff
- get the table profiling config
- use hashing to deterministically sample the same ids from each table
- use dirty-equals to assert the results of stochastic processes (sketched below)
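A minimal sketch of the deterministic-hash sampling idea plus a dirty-equals assertion; the helper name, bucket math, and thresholds are illustrative, not the actual profiler code.

```python
import hashlib

from dirty_equals import IsInt


def deterministic_sample(ids, sample_percent):
    """Keep an id iff its md5 hash falls in the sampled bucket.
    The same id always lands in the same bucket, so both sides of the
    diff end up sampling the same keys."""
    def bucket(value):
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    return [i for i in ids if bucket(i) < sample_percent]


# The exact count depends on the hash distribution, so assert a range
# instead of a fixed number.
sample = deterministic_sample(range(1_000), sample_percent=10)
assert len(sample) == IsInt(ge=50, le=150)
assert sample == deterministic_sample(range(1_000), sample_percent=10)  # stable across runs
```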
* - reverted missing md5
- added missing database service type
* - use a custom substr sql function
* fixed nonce
* added failure for mssql with sampling because it requires a larger change in the data-diff library
* fixed unit tests
* updated range for sampling
* feat: added "column value to be in expected location" test
* fix: renamed value -> values
* doc: added 1.6 documentation entry
* style: ran python linting
* fix: move data packaging to pyproject.toml
* fix: add init file back for data package
* fix: failing test case
* Add flake.nix
* Add lockfile for flake
* Update nix environment and document usage
* Add schema for exasol connector
* Add Exasol definitions to databaseService
* Fix error in exasol connector schema
* Add additional connection options/settings to exasol connector
* Add exasol-connector to ui
* Add dependencies for exasol-connector
* Update notes
* Update ingestion code
* Add Basic Documentation for Exasol Connector
* Update flake file
* Add developer notes
* Add python script which can be used as an entry point for debugging in an IDE
* Add config file which can be used for debugging (manual execution)
* Update debug script
* Update developer notes
* Remove old developer notes
* Add .venv to gitignore
* Update dev notes
* Update development notes
* Update ExasolSource
* Establish basic connection to Exasol DB from connector
* Update exasol connector connection settings
* Add service_spec for exasol plugin
* Remove development files
* Remove unused module
* Applied code formatter
* Update exasol dependency constraint(s)
* Add unit test for exasol connection url(s)
* Fixed test expectations for exasol connection url test(s)
* Adjust the test query for the Exasol connection test
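A hedged sketch of what an Exasol connection-url unit test could look like; the `exa+websocket` driver name, default port 8563, and the helper are assumptions for illustration, not the connector's exact API.

```python
from sqlalchemy.engine import URL


def build_exasol_url(user: str, password: str, host: str, port: int = 8563) -> URL:
    # assemble a SQLAlchemy URL for the (assumed) websocket dialect
    return URL.create(
        drivername="exa+websocket",
        username=user,
        password=password,
        host=host,
        port=port,
    )


def test_exasol_connection_url():
    url = build_exasol_url("admin", "secret", "exasol.example.com")
    assert url.render_as_string(hide_password=False) == (
        "exa+websocket://admin:secret@exasol.example.com:8563"
    )
```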
* fix snowflake system metrics
* format
* add link to logs and commit
* fixed the dq cli test
* reverted bad formatting
* fixed models.py
* removed version pinning for data diff in tests
* tests(datalake): use minio
1. use minio instead of moto for mimicking s3 behavior (see the sketch below).
2. removed moto dependency as it is not compatible with aiobotocore (https://github.com/getmoto/moto/issues/7070#issuecomment-1828484982)
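Roughly what the moto-to-MinIO swap looks like with testcontainers and boto3; the image tag, credentials, and bucket name are placeholders.

```python
import boto3
from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

minio = (
    DockerContainer("minio/minio:latest")
    .with_env("MINIO_ROOT_USER", "minioadmin")
    .with_env("MINIO_ROOT_PASSWORD", "minioadmin")
    .with_command("server /data")
    .with_exposed_ports(9000)
)

with minio:
    wait_for_logs(minio, "API:")  # wait until the server announces its API endpoint
    endpoint = f"http://{minio.get_container_host_ip()}:{minio.get_exposed_port(9000)}"
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
    )
    s3.create_bucket(Bucket="test-bucket")  # tests then exercise real S3 API calls
```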
* - moved test_datalake_profiler_e2e.py to datalake/test_profiler
- use minio instead of moto
* fixed tests
* fixed tests
* removed default name for minio container
* tests: refactor
refactor tests and consolidate common functionality in integrations.conftest.
this enables writing tests more concisely (see the conftest sketch below).
demonstrated with postgres and mssql.
more will be migrated later.
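A sketch of the shared-fixture pattern in a conftest.py, assuming testcontainers; the fixture names are illustrative rather than the exact ones in integrations.conftest.

```python
# conftest.py
import pytest
from testcontainers.postgres import PostgresContainer


@pytest.fixture(scope="session")
def postgres_container():
    # one container per test session, shared by every integration test module
    with PostgresContainer("postgres:15") as container:
        yield container


@pytest.fixture(scope="session")
def postgres_url(postgres_container):
    return postgres_container.get_connection_url()
```

Individual tests then declare `postgres_url` as an argument instead of spinning up their own container.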
* format
* removed helpers
* changed scope of fixtures
* changed scope of fixtures
* added profiler test for mssql
* fixed import in data_quality test
* json safe serialization
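A hypothetical illustration of JSON-safe serialization (not the actual SerializableTableData implementation): values the stdlib encoder rejects are converted through a `default` hook.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def json_safe(value):
    # convert common non-serializable sample values into JSON-friendly forms
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


payload = json.dumps(
    {"rows": [[Decimal("1.5"), datetime(2024, 1, 1)]]},
    default=json_safe,
)
```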
* format
* set MARS_Connection
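MARS (Multiple Active Result Sets) is switched on through the ODBC connection string; a minimal sketch with pyodbc, assuming ODBC Driver 18 and placeholder credentials.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost,1433;DATABASE=tempdb;"
    "UID=sa;PWD=example_Password1;"
    "TrustServerCertificate=yes;"
    "MARS_Connection=yes;"  # allow several active result sets on one connection
)
```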
* use SerializableTableData instead of TableData
* deleted file test_postgres.py
* fixed tests
* added more test cases
* format
* changed name test_models.py
* removed the logic for serializing table data
* wip
* changed mapping in common type map
* changed mapping in common type map
* reverted TableData imports
* reverted TableData imports
* reverted TableData imports
* Split ExternalSecretsManagerTest to new ExternalSecretsManagerTest and AWSBasedSecretsManagerTest
* implement SecretsManagerFactory to create GCPSecretsManager
* implement GCPSecretsManager
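A minimal sketch of reading a secret with the google-cloud-secret-manager client (added as a dependency below); the function name and arguments are illustrative, not OpenMetadata's interface.

```python
from google.cloud import secretmanager


def read_secret(project_id: str, secret_id: str, version: str = "latest") -> str:
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")
```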
* implement GCP secret manager.
* Fix it for GCP's rules.
* create a template of GCP
* fix compile error
* implement use of project_id
* add library for the google cloud secret manager
* add test code for using google credentials in the docker container
* modify docker-compose.yml for GCP
* add google_crc32c module
* modify ways to get project id
* create a new docker-compose.yml for Google Cloud
* create a new document
* create compose file for gcp secret manager
* fix invalid styles and formats
* downgrade google library to avoid conflicting protoc versions
* feat: add tableDiff test case
This change introduces a "table diff" test case which
compares two tables and fails if they are not identical (a minimal data-diff sketch follows below).
The comparison is based on a specific "key" (because the test only makes sense when performed on ordered collections).
1. Added the `tableDiff` test definition.
2. Implemented a "runtime parameters" feature which injects additional parameters into the test at runtime.
3. Integration tests (because of course).
This feature was not tested end-to-end yet because "array" data
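A minimal sketch of the underlying comparison, assuming the data-diff package (added to setup.py in a later commit); connection strings and table names are placeholders.

```python
from data_diff import connect_to_table, diff_tables

table1 = connect_to_table("postgresql://user:pass@host1/db", "public.customers", key_columns=("id",))
table2 = connect_to_table("postgresql://user:pass@host2/db", "public.customers", key_columns=("id",))

# each yielded item is ('+', row) or ('-', row); identical tables yield nothing
mismatches = list(diff_tables(table1, table2))
assert mismatches == []
```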
* pydantic v2
* format
* format
* format and added data diff to setup.py
* format
* fixed issue with the param that has type ARRAY
* fixed runtime_parameter_setter
* moved models to parent directory
* handle errors in table diff
* fixed issue with edit test case
* format
* added more details to pytest skip
* format
* refactor: Improve createTestCaseParameters function in DataQualityUtils
* fixed unit test
* removed unused fixture
* removed validator.py
* fixed tests
* added validate kwarg to tests_mixin
* removed "postgres" data diff extra as they interfere with psycopg2-binary
* fixed tests
* pinned tenacity for tests
* reverted tenacity pinning
* added ui support for test diff
* fixed dq cypress and added edit flow
* organized the test case
* added dialect support
* fixed tests
* option style fix
* fixed calculation for passing/failing rows
* restrict the tableDiff test to limited services
* set where to None if blank string
* fixed where clause
* fixed tests for where clause
* use displayName in place of name in edit form
* added docs for RuntimeParameterSetter
* fixed cypress
---------
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
* added trino integration test
* - removed warnings for classes which are not real tests
- removed "helpers" as its being used
* use a docker network instead of host
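A sketch of wiring the containers onto a shared docker network with testcontainers (assuming version 4+); the images and aliases are placeholders.

```python
from testcontainers.core.container import DockerContainer
from testcontainers.core.network import Network

with Network() as network:
    hive = (
        DockerContainer("apache/hive:4.0.0")
        .with_network(network)
        .with_network_aliases("metastore")
    )
    trino = (
        DockerContainer("trinodb/trino:latest")
        .with_network(network)
        .with_network_aliases("trino")
        .with_exposed_ports(8080)
    )
    with hive, trino:
        # containers resolve each other by alias ("metastore", "trino")
        # instead of relying on the host network
        ...
```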
* print logs for hive failure
* removed superset unit tests
* try pinning requests for test
* try pinning requests for test
* wait for hive to be ready
* fix trino fixture
* - reduced testcontainers_config.max_tries to 5
- remove intermediate containers
* print with logs
* disable capture logging
* updated db host
* removed debug stuff
* removed debug stuff
* removed version pin for requests
* reverted superset
* ignore trino integration on python 3.8
* MINOR - Add Integration Test for S3 Storage
* MINOR - Add Integration Test for S3 Storage
* MINOR - Add Integration Test for S3 Storage
* format
* format