* Add logic to handle WorkflowContext on Ingestion
* Revert base.py changes
* Removed comment
* Fix basedpyright complaints
* Make ContextManager automatically add its context to the PipelineStatus
* Small changes
(cherry picked from commit 5b20b845462cfcb568b92dbf22e160226433fae5)
* Reproduce failing behaviour with non-date-time data
* Add a presidio patch for DateTimes
* Fix type-check error
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
(cherry picked from commit 6683c632f42b55a16db9df8acf5fd96b586ce301)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.
* Update Presidio version to fix wrong regex for Indian passport
* Increase sample size of Indian passport numbers
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
(cherry picked from commit 3c6c762d9c0d7036124aae3a4dc90f51d6a674c0)
* Add PII Tag and Sensitivity Level enums.
* Add feature-extraction for PII classification tasks
* Add faker as test dependency
* Add unit tests for presidio tag extractor
* Add PIISensitivityTags enum and update sensitivity mapping logic
* Add Presidio utility functions for PII analysis
* Extend column name regexes for PII
* Add tests for PAN, NIF, SSN entities
* Fix version of faker to prevent flaky tests. Fix failing tests.
* Add Generated to State enum
* Integrate PIISensitive classifier to PIIProcessor
* Add column name split
* Move pii algorithms to dedicated package
* Fix linting
* Add comment on why we need to set a specific language for Presidio recognizers, as per PR suggestion.
* Fix wrong import
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* fix: properly close connection on sampler ingestion
* fix: dangling connection test
* style: ran python linting
* fix: revert to 9
(cherry picked from commit cd6434dd73cd7c60ef22a11972740f12686b3558)
* fix: data sample ingestion bigquery
* style: ran python linting
* fix: flaky test in topology
(cherry picked from commit a853561d30fcab34796a06c0e75ec0bb4f20e1f4)
* fix: made databricks httpPath required and added a migration file for the same
* fix: added sql migration in postDataMigration file and fix databricks tests
* fix: added httpPath in test_source_connection.py and test_source_parsing.py files
* fix: added httpPath in test_databricks_lineage.py
* fix: table name in postgres migration
* Fix #19667: OpenSearch Connector
* do not ingest any system level indexes
* fix pyformat
* Add AWS auth
* Use common schema and fix ssl config in client
* Add OpenSearch connector docs and update schema
* Remove api key auth type and complete docs checklist
* Remove unnecessary httpx dependency and pyformat
* Add compatible version of httpx for elasticsearch
* Fix pylint fails and py-tests validation error
---------
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>