* Initial implementation for our Connection Class
* Implement the Initial Connection class
* Add Unit Tests
* Implement Dependency Injection for the Ingestion Framework
* Fix Test
* Fix Profile Test Connection
* Add Injection to Metrics in Profiler
* Add Injection to the Profiler
* Fix UnitTests
* Fix Pytests
* Fix Tests
* Fix types
* Fix test, making the injection test run last
* Update connections.py
* Changed NewType to an AbstractClass to avoid linting issues
* remove comment
* Fix bug in service spec
* Update PyTest version to avoid importlib.reader wrong import
* Remove unit test
* Remove comment
* Fix tests and missing changes
* refactor: removed testSuite field from CreateTestCase
BREAKING CHANGE: when creating a test case, testSuite is now derived from entityLink (fetched or created)
* feat: allow setting tags when creating a test case
* style: ran linters
* fix: compiling error
* fix: failing test case
* fix: failing tests
* removed testSuite from required fields
* fixed ui side
* style: ran java linting
* deprecation: remove testSuite param from ingestion
* fix: remove test suite field
* fix: remove test_suite field
---------
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
* Add PIICategoryTags and some utilities on top of them.
* Fix static-check
* Add test for fqn representation
* Add NEREntityGeneralTags.json from Collate
* Add test to check PIICategoryTags agree with the ones used by OM server
* Add LabelExtractor
* Fix style
* Add ignore for superfluous-parens in pylint
* Add comment as per PR review
* Fix not-updated PII-IT
* Remove duplicated IT test for PII
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
* deprecation: remove testCaseResults endpoint from testCaseResource
* fix: path in e2e test
* fix: endpoint name to testCaseResults
* style: fix java linting
* Add logic to handle WorkflowContext on Ingestion
* Revert base.py changes
* Removed comment
* Fix basedpyright complaints
* Make ContextManager automatically add its context to the PipelineStatus
* Small changes
* Reproduce failing behaviour with non-date-time data
* Add a presidio patch for DateTimes
* Fix type-check error
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* Make pytest use code from src rather than from the installed package
* Fix test_amundsen: missing None
* Update pytest configuration to use importlib mode
* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings
* Refactor referencedByQueries validation to use field_validator as per deprecation warning
* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning
* Move superset tests to integration tests as they use testcontainers
* Update coverage source path
* Fix wrong import.
* Add install_dev_env target to Makefile for development dependencies
* Add test-unit as extra in setup.py
* Modify dependencies in dev environment.
* Ignore all airflow tests
* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.
* Add nox for running unit test
* Fix PowerBI integration test to use pathlib for resource paths instead of os.getcwd, preventing failures when not executed from the right path
* Move test_helpers.py to unit test, as it is not an integration test.
* Remove utils empty folder in integration tests
* Refactor testcontainers configuration to avoid pitfalls with max_tries setting
* Add nox unit testing basic setup
* Add format check session
* Refactor nox-unit and add plugins tests
* Add GHA for py-nox-ci
* Add comment to GHA
* Restore conftest.py file
* Clarify comment
* Simplify function
* Fix matrix strategy and nox mismatch
* Improve python version strategy with nox and GHA
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* Skip failing IT test. Requires further investigation.
* feat: add query logger as an event listener in debug mode
* fix: added ingestion.src plugin to pylint
* minor: add partition sampled table
* test: added test for partitioned BQ table
* Remove log_query function from logger.py
* style: ran python linting
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.
* Update presidio version to fix wrong regex for Indian passport
* Increase sample size of Indian passport numbers
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* Add PII Tag and Sensitivity Level enums.
* Add feature-extraction for PII classification tasks
* Add faker as test dependency
* Add unit tests for presidio tag extractor
* Add PIISensitivityTags enum and update sensitivity mapping logic
* Add Presidio utility functions for PII analysis
* Extend column name regexes for PII
* Add tests for PAN, NIF, SSN entities
* Fix version of faker to prevent flaky tests. Fix failing tests.
* Add Generated to State enum
* Integrate PIISensitive classifier to PIIProcessor
* Add column name split
* Move pii algorithms to dedicated package
* Fix linting
* Add comment on why we need to set a specific language for Presidio recognizers, as per PR suggestion.
* Fix wrong import
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* feat: implemented load test logic
* style: ran python linting
* fix: added locust dependency in test
* fix: skip locust in 3.8 as not supported
* fix: update gcsfs version
* fix: revert gcsfs versioning
* fix: fix gcsfs version to 2023.10
* fix: dagster graphql and gx versions
* fix: dagster version to 1.8 for Python 3.8 compatibility
* fix: fix clickhouse to 0.2 as 0.3 requires SQA 2+
* fix: revert changes from main
* fix: revert changes compared to main
* fix: add support for GX 0.18.22 and GX 1.4.x
* style: ran python linting
* fix: skip test if GX version is not installed