* feat: add support for DBX system metrics
* feat: add support for DBX system metrics
* fix: added WRITE back
* fix: failing test cases
* fix: failing test
* Initial implementation for our Connection Class
* Implement the Initial Connection class
* Add Unit Tests
* Implement Dependency Injection for the Ingestion Framework
* Fix Test
* Fix Profile Test Connection
* Add Injection to Metrics in Profiler
* Add Injection to the Profiler
* Fix UnitTests
* Fix Pytests
* Fix Tests
* Fix types
* Make pytest use code from src rather than from the installed package
* Fix test_amundsen: missing None
* Update pytest configuration to use importlib mode
* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings
* Refactor referencedByQueries validation to use field_validator as per deprecation warning
* Update ColumnJson to use model_rebuild as the replacement for forward reference updates as per deprecation warning
* Move superset test to integration test as they are using testcontainers
* Update coverage source path
* Fix wrong import.
* Add install_dev_env target to Makefile for development dependencies
* Add test-unit as extra in setup.py
* Modify dependencies in dev environment.
* Ignore all airflow tests
* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.
* Add nox for running unit test
* Fix PowerBI integration test to use pathlib for resource paths instead of os.getcwd, to prevent failures when not executed from the right path
* Move test_helpers.py to unit test, as it is not an integration test.
* Remove utils empty folder in integration tests
* Refactor testcontainers configuration to avoid pitfalls with max_tries setting
* Add nox unit testing basic setup
* Add format check session
* Refactor nox-unit and add plugins tests
* Add GHA for py-nox-ci
* Add comment to GHA
* Restore conftest.py file
* Clarify comment
* Simplify function
* Fix matrix strategy and nox mismatch
* Improve python version strategy with nox and GHA
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* fix: wrong attribute name in SampleConfig model
* fix: test attribute
* fix: failing tests
* fix: trino filter error + adjust test to take into account null value
* fix: mssql and azuresql tablesample on views
* fix: sqa table reference
* style: ran python linting
* fix: added raw dataset to query runner
* fix: get table and schema name from orm object
* fix: get table level config for table tests
* ref(data-quality): modularized test case validator import
- removed test_suite_factory
- implemented TestCaseImporter
- removed SQAValidatorBuilder and PandasValidatorBuilder in favor of a SourceType enum
- removed the orm table creation from test suite source
* format
* IValidatorBuilder -> ValidatorBuilder
* use the table from the sampler in the test suite interface
* linting
* fixed the profiler with similar solution
* removed unused inheritance
* removed unneeded super().__init__()
* removed all instances of orm_table
* fixed tests
* add reportExplicitAny=false
* fixed tests
* ref(profiler): use di for system profile
- use source classes that can be overridden in system profiles
- use a manifest class instead of factory to specify which class to resolve for connectors
- example usage can be seen in redshift and snowflake
* - added manifests for all custom profilers
- used super() dependency injection for the system metrics source
- formatting
* - implement spec for all source types
- added docs for the new specification
- added some pylint ignores in the importer module
* remove TYPE_CHECKING in core.py
* - deleted valuedispatch function
- deleted get_system_metrics_by_dialect
- implemented BigQueryProfiler with a system metrics source
- moved import_source_class to BaseSpec
* - removed tests related to the profiler factory
* - reverted start_time
- removed DML_STAT_TO_DML_STATEMENT_MAPPING
- removed unused logger
* fixed tests
* format
* bigquery system profile e2e tests
* fixed module docstring
* - removed import_side_effects from redshift. we still use it in postgres for the orm conversion maps.
- removed leftover methods
* - tests for BaseSpec
- moved get_class_path to importer
* - moved constructors around to get rid of useless kwargs
* - changed test_system_metric
* - added lineage and usage to service_spec
- fixed postgres native lineage test
* add comments on collaborative constructors
* ref(profiler): redshift system metrics
- moved redshift system metrics to the redshift source module
- use Timestamp in data quality
- added plugin feature to test utils
* use timezone.utc
* format
* reverted unintended snowflake changes
* fixed import test_system_metrics.py
* revert
* fixed import in tests
* tests(datalake): use minio
1. use minio instead of moto for mimicking s3 behavior.
2. removed moto dependency as it is not compatible with aiobotocore (https://github.com/getmoto/moto/issues/7070#issuecomment-1828484982)
* - moved test_datalake_profiler_e2e.py to datalake/test_profiler
- use minio instead of moto
* fixed tests
* fixed tests
* removed default name for minio container
* fix(profiler): snowflake
resolve tables using the snowflake engine instead of OpenMetadata
* added env for cleaning up dbs in E2E
* moved system metric method to profiler; all the rest stays in snowflake
* format
* revert unnecessary changes
* removed test for previous resolution method
* use shutdown39
* fix: allow non-numeric numbers to be sent via JSON, replace NaN values with None in SQAProfilerInterface
NaN values are replaced with None in the SQAProfilerInterface class to maintain database parity, since NaN values are cast to null in OpenMetadata. Downstream data handling should expect null where a metric previously produced NaN.
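The NaN handling described above can be illustrated with a minimal sketch. This is not the code from SQAProfilerInterface: the helper name `replace_nan_with_none` and the sample metric dictionary are hypothetical, and the sketch only assumes that profiler results are a flat mapping of metric names to values.

```python
import math
from typing import Any, Dict


def replace_nan_with_none(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` where float NaN values are replaced by None.

    Hypothetical helper for illustration; not the actual profiler code.
    """
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in row.items()
    }


# Hypothetical metric results; a NaN mean would otherwise not serialize to valid JSON.
metrics = {"mean": float("nan"), "max": 10.0, "name": "col_a"}
cleaned = replace_nan_with_none(metrics)
print(cleaned)
```

The `isinstance` check runs before `math.isnan` so that non-numeric values such as strings pass through untouched instead of raising a `TypeError`.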
* fix: histogram overflow error
* test: Add Unit Test for Null and Null Ratio Metric
* chore: Address comments
* chore: Address comments
* fix: checkstyle and message
* fix: failing tests as null count works as expected
* feat: add global metric configuration for the profiler
* style: ran java linting
* fix: renamed disable to disabled
* style: ran java linting
* feat: ometa sdk for profiler setting
* test: ingestion profiler global config tests
* fix: update metric name to use MetricType Enum
* fix: allow bot to retrieve settings
* fix: exclude GX artifacts
* feat: implement global profiler setting logic for ingestion side
* fix: exclude metrics if Metric is empty
* style: ran python linting
* style: ran python linting
* fix: skip empty metrics
* style: ran python linting
* fix: moved GET profiler config to separate endpoint in system resource
* fix: moved compute metric filter to MetricFilter + renamed container
* fix: test failures
* fix: profiler test case
* linting: fix python linting
* fix: get column types from parquet schema for parquet files
* style: python linting
* fix: remove displayType check in test as it varies depending on OS
* fix: limit sampling to specific column
* fix: handle bigquery struct columns
* fix: default partition to 1 DAY for BQ
* fix: default to __TABLES__ for BQ table metrics
* style: ran python linting
* style: fix linting
* fix: python style
* fix: set partition to DAY if not HOUR
* feat: add backend support for custom metrics
* feat: fix python test
* feat: support custom metrics computation
* feat: updated tests for custom metrics
* feat: added dl support for min max of datetime
* feat: added is safe query check for query sampler
* feat: added support for custom metric computation in dl
* feat: added explicit addProper for pydantic model import for Extra
* feat: added custom metric to returned obj
* feat: wrapped trino import in __init__
* feat: fix python linting
* feat: fix typing in 3.8