* backend
* format & tests
* rename backend
* migrations and ingestion
* format & tests
* format & tests
* tests
* format & tests
* tests
* updated ui side of changes
* addressing comment
* fixed failing unit test
* fix test list
* added e2e test, and fixed existing test
---------
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
* MINOR: User search should only look in name & displayname
* py_format
* pyformat
---------
Co-authored-by: Suman Maharana <sumanmaharana786@gmail.com>
* fix(data-diff): added MD5 handling for bigquery
- added MD5 handling for bigquery
- use URL instead of Engine because it requires fewer steps and is less prone to failure
* added e2e test for data diff with sampling in bigquery
* fix: sqa table reference
* style: ran python linting
* fix: added raw dataset to query runner
* fix: get table and schema name from orm object
* fix: get table level config for table tests
* add dbt freshness check
* docs
* run linting
* add test case param definition
* fix test case param definition
* add config for dbt http, fix linting
* refactor (only create freshness test definition when user executed one)
* fix dbt files class
* fix dbt files class 2
* fix dbt objects class
* fix linting
* fix pylint
* fix linting once and for all
---------
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
* ref(data-quality): modularized test case validator import
- removed test_suite_factory
- implemented TestCaseImporter
- removed SQAValidatorBuilder and PandasValidatorBuilder in favor of a SourceType enum
- removed the orm table creation from test suite source
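
A minimal sketch of the idea behind replacing per-source builder classes with a `SourceType` enum. Only the name `SourceType` comes from the notes above; the enum members, the `VALIDATORS` registry, the `get_validator` helper and the `not_null` test name are hypothetical and do not reflect the actual `TestCaseImporter` code.

```python
from enum import Enum


class SourceType(Enum):
    """Kind of backend a test case runs against (members are assumed)."""

    SQL = "sql"
    PANDAS = "pandas"


def sql_not_null(values):
    # Stand-in for a validator that would run a SQL query.
    return all(v is not None for v in values)


def pandas_not_null(values):
    # Stand-in for a validator that would inspect a dataframe column.
    return None not in list(values)


# Hypothetical registry keyed by (source type, test name). One lookup
# replaces a separate builder class per source type.
VALIDATORS = {
    (SourceType.SQL, "not_null"): sql_not_null,
    (SourceType.PANDAS, "not_null"): pandas_not_null,
}


def get_validator(source_type, test_name):
    """Return the validator registered for this source type and test."""
    try:
        return VALIDATORS[(source_type, test_name)]
    except KeyError:
        raise ValueError(
            f"no validator for {test_name!r} on {source_type.name}"
        ) from None
```

With this shape, supporting a new source means adding an enum member and its registry entries rather than a new builder subclass.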
* format
* IValidatorBuilder -> ValidatorBuilder
* use the table from the sampler in the test suite interface
* linting
* fixed the profiler with similar solution
* removed unused inheritance
* removed unneeded super().__init__()
* removed all instances of orm_table
* fixed tests
* add reportExplicitAny=false
* fixed tests
* fix(data-diff): sampling configuration
handle the sampling condition separately for each of the two tables, which allows sampling on columns whose names differ only in case
* format
* fix(redshift-system): redshift return type
* fixed bigquery profiler
* fixed snowflake profiler
* job id action does not support matrix. using plain action summary.
* reverted gha change
* feat(data-quality): use sampling config in data diff
- get the table profiling config
- use hashing to deterministically sample the same ids from each table
- use dirty-equals to assert results of stochastic processes
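
A minimal sketch of hash-based deterministic sampling, as described in the bullets above. It assumes an MD5 digest over a nonce plus the row id, with a prefix of the digest mapped to a bucket in [0, 1). The function names `in_sample` and `sample_ids` are hypothetical, and the real change presumably expresses this condition in SQL rather than in Python.

```python
import hashlib


def in_sample(row_id, percent, nonce=""):
    """Return True when row_id falls inside the sampled fraction."""
    digest = hashlib.md5(f"{nonce}{row_id}".encode("utf-8")).hexdigest()
    # Use the first 8 hex characters as an integer in [0, 16**8) and
    # scale it to a bucket in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return bucket < percent / 100.0


def sample_ids(ids, percent, nonce=""):
    """Keep the ids whose hash bucket is below the sampling threshold."""
    return [i for i in ids if in_sample(i, percent, nonce)]
```

Because the decision depends only on the id and the nonce, an id that exists in both tables is either sampled in both or in neither, so the diff compares the same rows. The sample size still varies around the requested percentage, which is why the assertions on results need a tolerance.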
* reverted missing MD5
* added missing database service type
* use a custom substr SQL function
* fixed nonce
* added failure for mssql with sampling because it requires a larger change in the data-diff library
* fixed unit tests
* updated range for sampling
* feat(statistics-profiler): use statistics tables to profile trino tables
- implemented the collaborative root class
- added the "useStatistics" profiler parameter
- added the "supportsStatistics" database connection property
- implemented the ProfilerWithStatistics and StoredStatisticsSource to add this functionality to specific profilers
- implemented TrinoStoredStatisticsSource for specific trino statistics logic
* added ABC to terminal classes in collaborative root
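
A rough sketch of the cooperative-inheritance pattern that the "collaborative root" entries above appear to describe. The names `ProfilerWithStatistics` and `StoredStatisticsSource` come from the notes; `Root`, `BaseProfiler`, `FakeTrinoStatistics`, `FakeTrinoProfiler`, the `use_statistics` keyword and every method body are assumptions made for illustration, not the project's code.

```python
from abc import ABC, abstractmethod


class Root(ABC):
    """Terminates the cooperative super().__init__() chain."""

    def __init__(self, **kwargs):
        # Every keyword should have been consumed by a class further
        # down the MRO before the chain reaches the root.
        if kwargs:
            raise TypeError(f"unexpected arguments: {sorted(kwargs)}")
        super().__init__()


class StoredStatisticsSource(Root):
    """Contract for reading statistics the engine has already stored."""

    @abstractmethod
    def stored_row_count(self):
        """Return the stored row count, or None when unavailable."""


class BaseProfiler(Root):
    def __init__(self, *, table, **kwargs):
        super().__init__(**kwargs)
        self.table = table

    def row_count(self):
        # Stand-in for a full table scan.
        return len(self.table)


class ProfilerWithStatistics(BaseProfiler, StoredStatisticsSource):
    def __init__(self, *, use_statistics=False, **kwargs):
        super().__init__(**kwargs)
        self.use_statistics = use_statistics

    def row_count(self):
        if self.use_statistics:
            stored = self.stored_row_count()
            if stored is not None:
                return stored
        # Fall back to the regular computation.
        return super().row_count()


class FakeTrinoStatistics(StoredStatisticsSource):
    def stored_row_count(self):
        # Placeholder value; a real source would query the engine.
        return 42


class FakeTrinoProfiler(FakeTrinoStatistics, ProfilerWithStatistics):
    """Concrete profiler combining the source with the profiler."""
```

Each `__init__` takes only its own keywords and forwards the rest, so the classes can be combined in different orders. Because the root derives from `ABC`, a profiler that lacks a statistics source cannot be instantiated by mistake.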
* fixed docstring for TestSuiteInterface
* reverted unintended changes
* typo