* metadata dbt
* fix:
- default path to current directory
- additional warning and exception handling for missing metadata config vars
* test: add unit tests for dbt ingestion CLI
* refactor
* PR review:
- using Pydantic to parse and validate the openmetadata config in dbt's .yml
- extended test-cases
- giving user more configuration options for ingestion
* py refactoring
* add: dbt-auto ingest docs
* Improvements:
- using environment variables for loading sensitive variables
- added docs for auto dbt-ingestion for dbt-core
- more test cases
* fix:
- test case for reading JWT token inside the method
* refactor: py code formatting
* refactor: py formatting
* ingest-dbt docs updated
* refined test cases
* Chore:
- sonar vulnerability issue review
- using existing URL class for host validation
---------
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
* Initial implementation for our Connection Class
* Implement the Initial Connection class
* Add Unit Tests
* Implement Dependency Injection for the Ingestion Framework
* Fix Test
* Fix Profile Test Connection
* Fix test, making the injection test run last
* Update connections.py
* Changed NewType to an AbstractClass to avoid linting issues
* remove comment
* Fix bug in service spec
* Update PyTest version to avoid importlib.reader wrong import
* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3
* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3 with ~
* Bump up `mlflow` and `databricks-sdk` for protobuf 5.x.x, pin down google-cloud-secret-manager to 2.22.1 for airflow deps sync
* Pin down databricks-sdk to 0.20.0
---------
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
* Make pytest use code from src rather than from the installed package
* Fix test_amundsen: missing None
* Update pytest configuration to use importlib mode
* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings
* Refactor referencedByQueries validation to use field_validator as per deprecation warning
* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning
* Move superset test to integration test as they are using testcontainers
* Update coverage source path
* Fix wrong import.
* Add install_dev_env target to Makefile for development dependencies
* Add test-unit as extra in setup.py
* Modify dependencies in dev environment.
* Ignore all airflow tests
* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.
* Add nox for running unit test
* Fix PowerBI integration test to use pathlib for resource paths instead of os.getcwd, to prevent failures when not executed from the right path
* Move test_helpers.py to unit test, as it is not an integration test.
* Remove utils empty folder in integration tests
* Refactor testcontainers configuration to avoid pitfalls with max_tries setting
* Add nox unit testing basic setup
* Add format check session
* Refactor nox-unit and add plugins tests
* Add GHA for py-nox-ci
* Add comment to GHA
* Restore conftest.py file
* Clarify comment
* Simplify function
* Fix matrix strategy and nox mismatch
* Improve python version strategy with nox and GHA
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* Fix test_amundsen: missing None
* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings
* Refactor referencedByQueries validation to use field_validator as per deprecation warning
* Update ColumnJson to use model_rebuild as the replacement for forward reference updates, as per deprecation warning
* Move superset test to integration test as they are using testcontainers
* Add install_dev_env target to Makefile for development dependencies
* Add test-unit as extra in setup.py
* Skip failing IT test. Requires further investigation.
* feat: add query logger as an event listener in debug mode
* fix: added ingestion.src plugin to pylint
* minor: add partition sampled table
* test: added test for partitioned BQ table
* Remove log_query function from logger.py
* style: ran python linting
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.
* Update presidio version to fix wrong regex for Indian passport
* Increase sample size of Indian passport numbers
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* Add PII Tag and Sensitivity Level enums.
* Add feature-extraction for PII classification tasks
* Add faker as test dependency
* Add unit tests for presidio tag extractor
* Add PIISensitivityTags enum and update sensitivity mapping logic
* Add Presidio utility functions for PII analysis
* Extend column name regexes for PII
* Add column name split
* Move pii algorithms to dedicated package
* Add tests for PAN, NIF, SSN entities
* Fix linting
* Add comment on why we need to set a specific language on Presidio recognizers, as per PR suggestion.
* Fix version of faker to prevent flaky tests. Fix failing tests.
* Fix wrong import
---------
Co-authored-by: Pere Menal <pere.menal@getcollate.io>
* feat: implemented load test logic
* style: ran python linting
* fix: added locust dependency in test
* fix: skip locust on Python 3.8 as it is not supported
* fix: update gcsfs version
* fix: revert gcsfs versioning
* fix: pin gcsfs version to 2023.10
* fix: dagster graphql and gx versions
* fix: pin dagster version to 1.8 for Python 3.8 compatibility
* fix: pin clickhouse to 0.2 as 0.3 requires SQLAlchemy 2+
* fix: revert changes from main
* fix: revert changes compared to main
* fix: add support for GX 0.18.22 and GX 1.4.x
* fix: add support for GX 0.18.22 and GX 1.4.x
* style: ran python linting
* fix: skip test if GX version is not installed
* Unpinned google-cloud-secret-manager version in ingestion dependencies
* Restrict google-cloud-secret-manager version to <2.20.1 because of mlflow-skinny dependency issue
---------
Co-authored-by: Katarzyna Kałek <kkalek@olx.pl>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
* Fix #19667: OpenSearch Connector
* Fix #19667: OpenSearch Connector
* do not ingest any system level indexes
* fix pyformat
* Add AWS auth
* Use common schema and fix ssl config in client
* Add OpenSearch connector docs and update schema
* Remove api key auth type and complete docs checklist
* Remove unnecessary httpx dependency and pyformat
* Add compatible version of httpx for elasticsearch
* Fix pylint fails and py-tests validation error
---------
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
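The "do not ingest any system level indexes" change can be illustrated with a small filter. This is a minimal sketch, not the connector's actual code: it assumes system/hidden indexes are recognised by a leading `.` in their name (the usual OpenSearch convention, e.g. `.kibana_1`), and the helper `user_indexes` is hypothetical.

```python
from typing import Iterable, List


def user_indexes(index_names: Iterable[str]) -> List[str]:
    """Keep only non-system indexes, sorted by name.

    Assumption (hypothetical helper): system/hidden indexes are the ones
    whose names start with '.', e.g. '.kibana_1' or '.plugins-ml-config'.
    """
    return sorted(name for name in index_names if not name.startswith("."))


print(user_indexes(["orders", ".kibana_1", "customers", ".plugins-ml-config"]))
# ['customers', 'orders']
```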
* feat(data-quality): use sampling config in data diff
- get the table profiling config
- use hashing to sample deterministically the same ids from each table
- use dirty-equals to assert results of stochastic processes
* - reverted missing md5
- added missing database service type
* - use a custom substr sql function
* fixed nonce
* added failure for mssql with sampling because it requires a larger change in the data-diff library
* fixed unit tests
* updated range for sampling
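The idea behind "use hashing to sample deterministically the same ids from each table" can be sketched as follows: because the keep/drop decision depends only on a hash of the key, both tables keep exactly the same subset of shared ids, so the sampled data diff compares like with like. This is an illustrative sketch under stated assumptions, not the data-diff integration itself: MD5 is chosen only because the commits mention md5, and `in_sample` / `sample_keys` are hypothetical names.

```python
import hashlib
from typing import Hashable, Iterable, Set


def in_sample(key: Hashable, percent: float, salt: str = "") -> bool:
    """Return True if `key` falls in the deterministic `percent` sample (hypothetical helper)."""
    # Hash the key (plus an optional salt) so the decision depends only on the key.
    digest = hashlib.md5(f"{salt}{key}".encode("utf-8")).hexdigest()
    # Map the first 8 hex characters (32 bits) to a bucket in [0, 10_000).
    bucket = int(digest[:8], 16) % 10_000
    # percent is in [0, 100]; each percent corresponds to 100 buckets.
    return bucket < percent * 100


def sample_keys(keys: Iterable[Hashable], percent: float, salt: str = "") -> Set[Hashable]:
    """Keep the keys selected by `in_sample` (hypothetical helper)."""
    return {k for k in keys if in_sample(k, percent, salt)}


table_a_ids = range(0, 1000)
table_b_ids = range(500, 1500)
sample_a = sample_keys(table_a_ids, 20)
sample_b = sample_keys(table_b_ids, 20)
shared = set(table_a_ids) & set(table_b_ids)
# Both tables keep the same subset of the ids they have in common.
print(sample_a & shared == sample_b & shared)  # True
```

In SQL the same decision could presumably be expressed as an MD5 of the key cast to text followed by a substring of the digest, which may be why the commits mention md5 and a custom substr function; that mapping is an assumption here.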
* feat: added column value to be in expected location test
* fix: renamed value -> values
* doc: added 1.6 documentation entry
* style: ran python linting
* fix: move data packaging to pyproject.toml
* fix: add init file back for data package
* fix: failing test case
* Add flake.nix
* Add lockfile for flake
* Update nix environment and document usage
* Add schema for exasol connector
* Add Exasol definitions to databaseService
* Fix error in exasol connector schema
* Add additional connection options/settings to exasol connector
* Add exasol-connector to ui
* Add dependencies for exasol-connector
* Update notes
* Update ingestion code
* Add Basic Documentation for Exasol Connector
* Update flake file
* Add developer notes
* Add python script which can be used as entry point for debugging in ide
* Add config file which can be used for debugging (manual execution)
* Update debug script
* Update developer notes
* Remove old developer notes
* Add .venv to gitignore
* Update dev notes
* Update development notes
* Update ExasolSource
* Establish basic connection to Exasol DB from connector
* Update exasol connector connection settings
* Add service_spec for exasol plugin
* Remove development files
* Remove unused module
* Applied code formatter
* Update exasol dependency constraint(s)
* Add unit test for exasol connection url(s)
* Fixed test expectations for exasol connection url test(s)
* Adjust the test query for the Exasol connection test
* fix snowflake system metrics
* format
* add link to logs and commit
* fixed the dq cli test
* reverted bad formatting
* fixed models.py
* removed version pinning for data diff in tests