36 Commits

Author SHA1 Message Date
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Imri Paran
cd74d8f55a
MINOR: ref(data-quality): modularized test case validator import (#18716)
* ref(data-quality): modularized test case validator import

- removed test_suite_factory
- implemented TestCaseImporter
- removed SQAValidatorBuilder and PandasValidatorBuilder in favor of a SourceType enum
- removed the orm table creation from test suite source

* format

* IValidatorBuilder -> ValidatorBuilder

* use the table from the sampler in the test suite interface

* linting

* fixed the profiler with similar solution

* removed unused inheritance

* removed unneeded super().__init__()

* removed all instances of orm_table

* fixed tests

* add reportExplicitAny=false

* fixed tests
2024-11-27 16:25:12 +01:00
Teddy
58699063db
MINOR -- Fix DQ Partition Issue (#18641)
* fix: renamed `random_sample` to `get_dataset` and change dunder method access for SQA Table object

* fix: removed handle_partition decorator

* fix: fixed DQ partition issue + moved to `tablesample` method

* style: ran python linting

* style: fix python format check issues

* feat: added postgres tablesample

* style: ran python linting

* fix: sampling delta

* fix: merge conflicts

* fix: resolved conflicts

* style: ran python linting

* fix: patch orm call in test case

* fix: mock build_table_orm call in tests

* fix: test case failures and errors

* fix: removed unused import

* fix: patch typo

* fix: trino table schema retrieval

* fix: remove tuple context manager for 3.8 test support
2024-11-27 08:50:54 +01:00
Pere Miquel Brull
c68a45e7d8
Create new Auto Classification Workflow (#18610) 2024-11-19 08:10:45 +01:00
Teddy
d579008c99
GEN 1683 - Add Column Value to be At Expected Location Test (#18524)
* feat: added column value to be in expected location test

* fix: renamed value -> values

* doc: added 1.6 documentation entry

* style: ran python linting

* fix: move data packaging to pyproject.toml

* fix: add init file back for data package

* fix: failing test case
2024-11-06 11:17:13 +01:00
Imri Paran
b960b60965
Fix #16421: add tableDiff test case (#16554)
* feat: add tableDiff test case

This change introduces a "table diff" test case which
compares two tables and fails if they are not identical.
The comparison is based on a specific "key" (because the test only makes sense when performed on ordered collections).

1. Added the `tableDiff` test definition.
2. Implemented a "runtime" parameters feature which injects additional parameters for the test at runtime.
3. Integration tests (because of course).
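
The key-based comparison described above can be sketched roughly as follows. This is a minimal illustration only, not the code from this commit: `diff_by_key` and `tables_identical` are hypothetical names, rows are assumed to be plain dictionaries held in memory, and the key column is assumed to be unique and sortable.

```python
from typing import Dict, List, Mapping

Row = Mapping[str, object]


def diff_by_key(left: List[Row], right: List[Row], key: str) -> Dict[str, list]:
    """Compare two in-memory tables row by row, matching rows on `key`.

    Hypothetical helper for illustration; assumes `key` is unique per table.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Keys present on only one side are removed or added rows.
    removed = sorted(left_by_key.keys() - right_by_key.keys())
    added = sorted(right_by_key.keys() - left_by_key.keys())
    # Keys present on both sides but with different values are changed rows.
    changed = sorted(
        k
        for k in left_by_key.keys() & right_by_key.keys()
        if left_by_key[k] != right_by_key[k]
    )
    return {"removed": removed, "added": added, "changed": changed}


def tables_identical(diff: Dict[str, list]) -> bool:
    """The test passes only when no row was removed, added, or changed."""
    return not any(diff.values())


left = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
right = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
result = diff_by_key(left, right, key="id")
```

Matching on a key rather than on row position is what lets the comparison ignore the order in which rows are returned.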

This feature was not tested end-to-end yet because "array" data

* pydantic v2

* format

* format

* format and added data diff to setup.py

* format

* fixed param issue which has type ARRAY

* fixed runtime_parameter_setter

* moved models to parent directory

* handle errors in table diff

* fixed issue with edit test case

* format

* added more details to pytest skip

* format

* refactor: Improve createTestCaseParameters function in DataQualityUtils

* fixed unit test

* removed unused fixture

* removed validator.py

* fixed tests

* added validate kwarg to tests_mixin

* removed "postgres" data diff extra as they interfere with psycopg2-binary

* fixed tests

* pinned tenacity for tests

* reverted tenacity pinning

* added ui support for test diff

* fixed dq cypress and added edit flow

* organized the test case

* added dialect support

* fixed tests

* option style fix

* fixed calculation for passing/failing rows

* restrict the tableDiff test to limited services

* set where to None if blank string

* fixed where clause

* fixed tests for where clause

* use displayName in place of name in edit form

* added docs for RuntimeParameterSetter

* fixed cypress

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2024-06-20 16:54:12 +02:00
Ayush Shah
a15da7ec98
Issue #14812: Add support for empty string as missing count (#16017) 2024-04-25 09:45:26 +05:30
Teddy
3dc642989c
Fixes #7729 - Add logic to compute passed/failed rows (#14472)
* feat: add test case resolution task workflow

* chore: add migration for test case resolution feature

* fix: removed required field for object compatibility in older migrations

* fix: minor testCaseResolution status logic

* chore: revert migration for test case incident

* chore: update migration file

* style: renamed variables

* feat: added logic to compute failed/passed rows

* feat: add support for row level computation in schema

* chore: add test definition migration

* feat: add logic to explicitly compute row level failure

* chore: clean up code

* style: fix java

* style: fix python format

* fix: unhide API for incident manager

* style: fix java styling
2023-12-27 13:38:51 +01:00
Pere Miquel Brull
b786064bc2
#11857 - Store workflow status in the Ingestion Pipeline Status (#14462)
* Register StackTraceError in spec

* Register StackTraceError in spec

* Register StackTraceError in spec

* Add todos

* Update status

* docs

* format

* Fix tests

* Fix tests

* Fix tests

* Ignore generated

* Fix tests

* Fix tests

* Tests

* Try constants

* Try constants

* Print

* Print

* Print

* order

* Fix service name

* fix ui error

---------

Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2023-12-22 15:43:50 +01:00
Ayush Shah
ebc0a551e5
Fixes 12947: Add Support For DQ and Profiler in Databricks Unity Catalog (#14424) 2023-12-20 21:18:05 +05:30
Teddy
3bbf55fcda
FIXES #14049 - Split test case resolution status from test case result (#14204)
* refactor: entityFQN as ListFilter condition

* feat: implement resolution entity timeseries

* fix: rename to testCaseResolutionStatus

* ref: extracted ES query builder into private method

* ref: extract OS query builder in its own method

* ref: remove ingestion logic for test case resolution

* fix: reorganize json schemas to fix circular import in Python

* ref: object names in typescript code

* feat: added indexing of test case resolution

* feat: added test case resolution sample data

* fix: test case resolution api logic

* fix: audit logger for entityTimeSeriesInterface

* fix: DDL generation

* style: python linting

* fix: skip UI test case resolution tests

* fix: remove extension field

* fix: renamed testCaseFailureStatus to testCaseResolutionStatus

* fix: remove reviewer

* fix: rename sequenceId to stateId

* fix: re adjust search weights

* fix: removed InReview status

* style: ran python linting
2023-12-04 23:18:01 -08:00
Ayush Shah
ab1ec50c2c
Fixes Mssql Ntext, text and Image (#12490) 2023-07-20 13:34:35 +05:30
Teddy
4b9f213dbf
Fixes Issue #11863 - Add Status to DQ (#11893)
* feat: added entityReference field in testSuite to link testSuite to an entity when the testSuite is executable.

* feat: added `executableEntityReference` as an entity reference for executable test suite to their entity

* feat: add status object to test case results

* feat: ran python linting

* feat: fixed  update to
2023-06-06 10:09:16 +00:00
Teddy
721869428e
Revert "Fixe Issue #11863 - Add Status logic for test case results (#11881)" (#11892)
This reverts commit 06735fe8dbaac5b267c9a2cf744ca154f88a9247.
2023-06-06 09:56:12 +02:00
Teddy
06735fe8db
Fixe Issue #11863 - Add Status logic for test case results (#11881)
* feat: added entityReference field in testSuite to link testSuite to an entity when the testSuite is executable.

* feat: added `executableEntityReference` as an entity reference for executable test suite to their entity

* feat: add status object to test case results

* feat: ran python linting
2023-06-06 09:45:49 +02:00
Teddy
d0cffdcd66
Fixes Issue #11438 - Implement threshold and strategy for custom SQL (#11847)
* feat: Add threshold and strategy logic on the custom SQL object test

* feat: ran python linting

* feat: added safety checks for custom sql query

* feat: ran python linting
2023-06-02 09:41:31 +02:00
Teddy
c98a15ca19
Fixes #11705 - Update ingestion and backend to match new DQ flow (#11836)
* feat: refactor ingestion flow logic

* feat: ran python linting

* feat: update tests to match new workflow

* feat: ran python linting

* feat: update sample data test suite name

* feat: Added backend logic to support logical and executable test suites

* feat: clean up java and json code

* feat: added sample data for logical and executable test suites

* feat: remove executable from CreateTestSuite

* feat: ran python and java linting

* feat: added README info for data quality structure

* skipping cypress to keep main green

* fixed typescript type issue

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2023-06-01 23:19:13 -07:00
Pere Miquel Brull
b988f39152
Fix test usage resources (#11014) 2023-04-12 05:46:29 +00:00
Ayush Shah
9d11029ec8
Fixes 10351: Fixes Metrics Computation, Sampling, test suites and partitioning (#10603)
Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>
2023-04-11 20:58:31 +05:30
Teddy
9b4e9132ae
fixed #9656 - Add support for date type to column values to be between (#10890)
* fix: renamed  to  submodule

* fix: linting

* fix: columnValuesToBeBetween test for date column type
2023-04-04 17:16:44 +02:00
Teddy
5208b6f684
Fixes #4368 - Add Histogram Metric (#10422) 2023-03-03 21:56:32 +01:00
Teddy
83be5d933b
Fixes #9301 - Refactor TestSuite and Remove Pandas from Base Requirements (#10244)
* feat(testSuite): extracted out column test for SQA type

* refactor(testSuite): extracted SQA column and table tests into their own classes

* refactor(testSuite): Added pkgutil namespace package style for test suite classes

* refactor(testSuite): added dynamic importer function for test cases

* refactor(testSuite): black formatting

* refactor(testSuite): fixed linting issues

* refactor(testSuite): refactor metrics for dataframe

* refactor(testSuite): Added Mixins and base methods

* refactor(testSuite): extracted out get bound for floats

* refactor(testSuite): Added pandas column test cases

* refactor(testSuite): Deleted old column tests

* refactor(testSuite): Added table tests for datalake

* refactor(testSuite): Removed old tests definition

* refactor(testSuite): changed registry to dynamic class import

* refactor(testSuite): renamed dl_fn to df_fn

* refactor(testSuite): updated registry unit test

* refactor(testSuite): updated import path to sqa like column

* refactor(testSuite): cleaned up imports in old files

* refactor(testSuite): harmonized SQALikeColumn object to replicate SQA Column object

* refactor(testSuite): linting

* refactor(testSuite): linting

* refactor(testSuite): raise exception on DQ exception

* refactor(testSuite): linting

* refactor(testSuite): removed pandas from base requirements

* refactor(testSuite): Added `__future__` for py3.7 type hint

* refactor(testSuite): added `df` to good-names

* refactor(testSuite): renamed Handler to Validator

* refactor(testSuite): Added test inheritance for column tests

* refactor(testSuite): cleaned up column type check

* refactor(testSuite): cleaned up typo

* refactor(testSuite): extracted main table test logic into parent class

* refactor(testSuite): linting

* refactor(testSuite): linting fixes

* refactor(testSuite): address doc string and linting issues
2023-02-22 09:42:34 +01:00
Teddy
ba08302ea1
Issue #7291 - Implements Table Rows Inserted to be Between test (#9813)
* staging commit

* staging commit

* refactor: partitioning logic

* refactor (tests): move to parametrized tests for test validations

* refactor: local variables into global

* (feat): Added logic for table row inserted test

* (feat): fix python checkstyle

* feature: extracted get_query_filter logic into its own function
2023-01-31 15:57:51 +01:00
Onkar Ravgan
b539b299ee
Integrated schema parsers (#9305)
* Integrated schema parsers

* Addressed review comments

* fixed pytests
2022-12-15 16:54:55 +05:30
Ayush Shah
a6ae9fd11a
Add Test Suite Implementation for Datalake (#9235) 2022-12-14 21:14:51 +05:30
Teddy
3cad959e44
Fixes #6760 -- Implements REGEX for regex test (#9033)
feat(testCase): implemented regex logic for test suite
2022-11-29 13:00:28 +01:00
Teddy
989f2911c2
Fixes #7810 - Allow to only pass min or max (#8474)
* ISSUE-7810 Added default values for min and max
For all data validations on columns:-
min_bound is set to float("-inf"), if there is no next value
max_bound is set to float("inf"), if there is no next value
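
The defaulting rule above can be sketched as follows. This is a minimal illustration under assumptions, not the code from this commit: `resolve_bounds` and `value_in_bounds` are hypothetical names, and a missing parameter is assumed to be represented as `None`.

```python
from typing import Optional, Tuple


def resolve_bounds(
    min_value: Optional[float], max_value: Optional[float]
) -> Tuple[float, float]:
    """Fall back to an unbounded side when only one limit is supplied.

    Hypothetical helper for illustration; `None` stands for "not provided".
    """
    min_bound = float("-inf") if min_value is None else float(min_value)
    max_bound = float("inf") if max_value is None else float(max_value)
    return min_bound, max_bound


def value_in_bounds(
    value: float,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> bool:
    """Check a value against whichever limits were provided."""
    min_bound, max_bound = resolve_bounds(min_value, max_value)
    return min_bound <= value <= max_bound


only_max = resolve_bounds(None, 10)
only_min = resolve_bounds(5, None)
```

Using infinities keeps the comparison itself unchanged, so a test configured with only a minimum or only a maximum needs no separate code path.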

* Fixed PR errors by removing tuple + added tests

Co-authored-by: demi <deepak1212365@gmail.com>
2022-11-01 13:26:51 +01:00
Teddy
f883863b8a
Fixes #7490 - Split Profiler and TestSuite Interface (#8032)
* Clean up test suite workflow and interface

* Fixed tests

* Split profiler and testSuite interfaces

* Cleaned up workflows and runners

* Fixed code formatting

* - remove old code
- remove `table` attribute used for testing and used mock instead

* Fixed execution bugs from refactor

* Fixed static type checking for profiler/api/workflow.py

* Fixed linting

* Added __init__ files
2022-10-11 15:57:25 +02:00
Teddy
15f7c4aa41
Fix param name for median test (#7942)
* Fixed param name for median test

* Fixed unit test for median DQ
2022-10-05 06:32:28 +02:00
Teddy
f2bf5194bb
Fixes #7623 -- Added logic to encode and decode entityLink (#7670)
* Encode entityLink string when processing request

* Added logic to decode column type from entityLink

* mvn code formatting

* Extracted unquote step into its own function
2022-09-23 09:42:33 +02:00
Teddy
1ba6e284fe
Fixes #7118 by cleaning up test names (#7494)
* Cleaned up tests names and add registry name tests

* Updated documentation for test types supported by OM
2022-09-16 07:04:56 +02:00
Sriharsha Chintalapani
656b50dd3a
Fix #7469: Refactor OpenMetadata code modules (#7474) 2022-09-14 23:14:02 -07:00
Teddy
9dbcb3911b
Fix minor column data quality test bugs (#7111)
* Fixed test name issue + filtered out partition details for non BQ tables

* Exclude non BQ table from partition processing

* Fixed test + formatting
2022-09-01 13:47:00 +02:00
Teddy
ef41382cb1
Fixes #7094 by fixing minor bugs in table tests (#7095) 2022-08-31 21:35:33 +02:00
Teddy
a39c4db8e7
Add partial support for BQ partitioned table (#7066)
* Added support for BQ time based partition (not ingestion)

* Fixed minor errors in test suite workflow
2022-08-30 11:39:15 -07:00
Teddy
ce578e73d4
Fixes #5831 by implementing testSuite workflow logic (#6911)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs

* Updated profiler workflow to:
- support only running profiler (removed test run)
- support column inclusion and exclusion
- added back support for partitioned table and sample

* moved status to workflow

* Fixed tests

* removed test logic from profiler sink

* Added logic to return sample from workflow sample value

* Added profiler examples

* Updated documentation for profiler

* Fixed code smells

* committed changes to profiler

* initial commit of the revamp workflow

* Fixed python formatting

* cleaned up profiler submodule by removing test related files and functions

* Added airflow DAG logic for testSuite workflow

* Fixed code smells + added airflow ingestion tests + fixed comments
2022-08-25 10:01:28 +02:00