IceS2 dddec06143
Feature/dimensionality column values stddev to be between (#24235)
* Initial implementation for Dimensionality on Data Quality Tests

* Fix ColumnValuesToBeUnique and create TestCaseResult API

* Refactor dimension result

* Initial E2E Implementation without Impact Score

* Dimensionality Thin Slice

* Update generated TypeScript types

* Update generated TypeScript types

* Removed useless method to use the one we already had

* Fix Pandas Dimensionality checks

* Remove useless comments

* Implement PR comments, fix Tests

* Improve the code a bit

* Fix imports

* Implement Dimensionality for ColumnMeanToBeBetween

* Removed useless comments and improved minor things

* Implement UnitTests

* Fixes

* Moved import pandas to type checking

* Fix Min/Max being optional

* Fix Unittests

* small fixes

* Fix Unittests

* Fix Issue with counting total rows on mean

* Improve code

* Fix Merge

* Removed unused type

* Refactor to reduce code repetition and complexity

* Fix conflict

* Rename method

* Refactor some metrics

* Implement Dimensionality to ColumnLengthToBeBetween

* Implement Dimensionality for ColumnMedianToBeBetween in Pandas

* Implement Median Dimensionality for SQL

* Add database tests

* Fix median metric

* Implement Dimensionality SumToBeBetween

* Implement dimensionality for Column Values not In Set

* Implement Dimensionality for ColumnValuestoMatchRegex and ColumnValuesToNotMatchRegex

* Implement NotNull and MissingCount dimensionality

* Implement columnValuesToBeBetween dimensionality

* Fix test

* Implement Pandas Dimensionality for ColumnValueStdDevToBeBetween

* Implement Dimensionality for ColumnValuesStdDevToBeBetween

* Fixed tests due to sqlite now supporting stddev

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-10 12:13:27 +01:00


import math

import sqlalchemy as sqa
from pytest import fixture


@fixture(scope="session", autouse=True)
def register_sqlite_math_functions():
    """
    Register custom math functions for SQLite used in unit tests.

    SQLite only provides a built-in SQRT function when it is compiled with
    SQLITE_ENABLE_MATH_FUNCTIONS (SQLite 3.35.0 and later), so we register a
    wrapper around Python's math.sqrt on every SQLite connection opened
    during the tests.

    This runs automatically for all unit tests (autouse=True) and only once
    per test session (scope="session").
    """

    def safe_sqrt(x):
        """
        Safe square root that handles floating-point precision issues.

        When computing variance as AVG(x*x) - AVG(x)*AVG(x), floating-point
        rounding can produce slightly negative values (e.g., -1e-15) when
        the true variance is zero. This function treats near-zero negative
        values as zero, matching the behavior in stddev.py:254-256.
        """
        if x is None:
            # SQL NULL in, SQL NULL out
            return None
        if x < 0:
            if abs(x) < 1e-10:
                return 0.0
            raise ValueError(f"Cannot compute square root of negative number: {x}")
        return math.sqrt(x)

    @sqa.event.listens_for(sqa.engine.Engine, "connect")
    def register_functions(dbapi_conn, connection_record):
        # Only SQLite DBAPI connections get SQRT; other drivers are left untouched
        if "sqlite" in str(type(dbapi_conn)):
            dbapi_conn.create_function("SQRT", 1, safe_sqrt)

    yield

    # Clean up event listener after tests
    sqa.event.remove(sqa.engine.Engine, "connect", register_functions)
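

def pytest_pycollect_makeitem(collector, name, obj):
    """
    Skip collection of objects that look like tests but are not.

    Returning an empty list tells pytest there is nothing to collect for
    this object; returning None falls back to the default collection logic.
    """
    try:
        # Helper classes whose names start with "Test"
        if obj.__name__ in ("TestSuiteSource", "TestSuiteInterfaceFactory"):
            return []
        # Direct subclasses of Pydantic BaseModel or Enum are never test classes
        if obj.__base__.__name__ in ("BaseModel", "Enum"):
            return []
    except AttributeError:
        # Objects without __name__ or __base__ (e.g. functions) use default collection
        pass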


def pytest_collection_modifyitems(session, config, items):
    """Reorder test items to ensure certain files run last."""
    # List of test files that should run last
    last_files = [
        "test_dependency_injector.py",
        # Add other files that should run last here
    ]

    # Get all test items that should run last
    last_items = []
    other_items = []
    for item in items:
        if any(file in item.nodeid for file in last_files):
            last_items.append(item)
        else:
            other_items.append(item)

    # Reorder the items in place (relative order within each group is preserved)
    items[:] = other_items + last_items
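The clamping in `safe_sqrt` can be checked outside pytest. A minimal sketch, assuming only the standard-library `sqlite3` module: it registers the same `safe_sqrt` logic on a plain in-memory connection and computes a population standard deviation with the `SQRT(AVG(x*x) - AVG(x)*AVG(x))` formula quoted in the docstring. The helper `population_stddev` and the table `t` are hypothetical names for illustration; this is not the query that `stddev.py` actually builds.

```python
import math
import sqlite3
import statistics


def safe_sqrt(x):
    """Square root that clamps tiny negative inputs (float round-off) to 0.0."""
    if x is None:
        return None  # SQL NULL in, SQL NULL out
    if x < 0:
        if abs(x) < 1e-10:
            return 0.0
        raise ValueError(f"Cannot compute square root of negative number: {x}")
    return math.sqrt(x)


def population_stddev(values):
    """Hypothetical helper: population stddev computed inside SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        # Same registration the fixture performs; overrides any built-in SQRT
        conn.create_function("SQRT", 1, safe_sqrt)
        conn.execute("CREATE TABLE t (x REAL)")  # hypothetical table
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
        # Variance as E[x^2] - E[x]^2; AVG over an empty table yields NULL
        (result,) = conn.execute(
            "SELECT SQRT(AVG(x * x) - AVG(x) * AVG(x)) FROM t"
        ).fetchone()
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(population_stddev(data))  # 2.0
    print(statistics.pstdev(data))  # 2.0
```

For a column of identical values such as `[0.1, 0.1, 0.1]`, the difference of the two averages may come out as a tiny negative number; `safe_sqrt` returns `0.0` in that case, whereas a plain `math.sqrt` would raise and SQLite would fail the query with a "user-defined function raised exception" error.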