10 Commits

Author SHA1 Message Date
Imri Paran
b960b60965
Fix #16421: add tableDiff test case (#16554)
* feat: add tableDiff test case

This changed introduces a "table diff" test case which
compares two tables and fails if they are not identical.
The similarity is made based on a specific "key" (because the test only makes sense when performed on ordered collections).

1. Added the `tableDiff` test definition.
2. Implemented a "runtime" parameters feature which injects additional parameters for the test at runtime.
3. Integration tests (because of course).

This feature was not tested end-to-end yet because "array" data

* pydantic v2

* format

* format

* format and added data diff to setup.py

* format

* fixed param issue which has type ARRAY

* fixed runtime_parameter_setter

* moved models to parent directory

* handle errors in table diff

* fixed issue with edit test case

* format

* added more details to pytest skip

* format

* refactor: Improve createTestCaseParameters function in DataQualityUtils

* fixed unit test

* removed unused fixture

* removed validator.py

* fixed tests

* added validate kwarg to tests_mixin

* removed "postgres" data diff extra as they interfere with psycopg2-binary

* fixed tests

* pinned tenacity for tests

* reverted tenacity pinning

* added ui support for test diff

* fixed dq cypress and added edit flow

* organized the test case

* added dialect support

* fixed tests

* option style fix

* fixed calculation for passing/failing rows

* restrict the tableDiff test to limited services

* set where to None if blank string

* fixed where clause

* fixed tests for where clause

* use displayName in place of name in edit form

* added docs for RuntimeParameterSetter

* fixed cypress

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2024-06-20 16:54:12 +02:00
Teddy
056e6368d0
Issue #14765 - Preparatory Work (#15312)
* refactor!: change partition metadata structure for table entities

* refactor!: updated json schema for TypeScript code gen

* chore: migration of partition for table entities

* style: python & java linting

* updated ui side change for table partitioned key

* miner fix

* addressing comments

* fixed ci error

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2024-02-28 07:11:00 +01:00
Teddy
61ef55290e
MINOR - generic profiler optimization for sampling and BQ (#14507)
* fix: limit sampling to specific column

* fix: handle bigquery struct columns

* fix: default partition to 1 DAY for BQ

* fix: default to __TABLES__ for BQ table metrics

* style: ran python linting

* style: fix linting

* fix: python style

* fix: set partition to DAY if not HOUR
2023-12-27 19:13:44 +01:00
Teddy
c7ac28f2c2
Fixes #11357 - Implement profiler custom metric processing (#14021)
* feat: add backend support for custom metrics

* feat: fix python test

* feat: support custom metrics computation

* feat: updated tests for custom metrics

* feat: added dl support for min max of datetime

* feat: added is safe query check for query sampler

* feat: added support for custom metric computation in dl

* feat: added explicit addProper for pydantic model import fo Extra

* feat: added custom metric to returned obj

* feat: wrapped trino import in __init__

* feat: fix python linting

* feat: fix typing in 3.8
2023-11-17 17:51:39 +01:00
Teddy
452a33b1a0
Fixes Druid Profiler failures (#13700)
* fix: updated playwrigth test structure

* fix: druid profiler queries

* fix: python linting

* fix: python linting

* fix: do not compute random sample if profile sample is 100

* fix: updated workflow to test on push

* fix: move connector config to category folder

* fix: updated imports

* fix: added pytest-dependency package

* fix: updated readme.md

* fix: python linting

* fix: updated profile doc for Druid sampling

* fix: empty commit for CI

* fix: added workflow constrain back

* fix: sonar code smell

* fix: added secrets to container

* Update openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/ingestion/workflows/profiler/index.md

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>

* Update openmetadata-docs/content/v1.2.x-SNAPSHOT/connectors/ingestion/workflows/profiler/index.md

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>

* Update ingestion/tests/e2e/entity/database/test_redshift.py

* fix: ran pylint

* fix: updated redshift env var.

* fix: import linting

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-10-25 20:47:51 +02:00
Teddy
31d2595e4f
fix: pass rnd table bound columns to sample query (#13561) 2023-10-13 14:57:28 +05:30
Teddy
1cbdfb3ae7
Fixes #12601 - column filter for profiler workflow (#13535)
* fix: sample data ingestion to match entity profiler column setting

* fix: python linting

* fix: updated fn call

* fix: added logic to handle json filed in datalake connector

* fix: handle NA values in parsing

* fix: reverted sampler changes from #13338

* fix: reverted metric changes from #13338

* fix: added datalake profiler ingestion test

* fix: python linting

* fix: removed normalization of json blob in NoSQL db
2023-10-12 14:51:38 +02:00
Ayush Shah
97f4f8fbf3
Fixes 12922: Trino NaN issue + TrinoUserError (#13244)
* Fix Trino NaN issue + TrinoUserError
2023-10-04 18:39:39 +05:30
Teddy
54fbe250a1
fix: import error + BQ E2E CLI (#12420) 2023-07-13 13:35:37 +02:00
Teddy
b89cf64f14
Clean up profiler (#12369)
* ref: implemented interface for profiler components + removed struct logic

* ref: ran python linting

* ref: added UML diagram to readme.md

* ref: empty commit for labeler check

* ref: remove multiple context manager for 3.7 3.8 compatibility

* ref: remove
2023-07-12 17:02:32 +02:00