386 Commits

Author SHA1 Message Date
Himanshu Khairajani
79c3d55128
Fix #21679: Added metadata ingest-dbt CLI Command for Direct DBT Artifacts Ingestion (#21680)
* metadata dbt

* fix:
 - default path to current directory
 - additional warning and exception handling for missing metadata config vars

* test: add unit tests for DBT Ingestion CLI

* refactor

* PR review:
 - using Pydantic to parse and validate the openmetadata config in dbt's .yml
 - extended test-cases
 - giving user more configuration options for ingestion

* py refactoring

* add: dbt-auto ingest docs

* Improvements:
 - using environment variables for loading sensitive variables
 - added docs for auto dbt-ingestion for dbt-core
 - more test cases

* fix:
 - test case for reading JWT token inside the method

* refactor: py code formatting

* refactor: py formatting

* ingest-dbt docs updated

* refined test cases

* Chore:
 - sonar vulnerability issue review
 - using existing URL class for host validation

---------

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2025-06-19 17:57:10 +05:30
IceS2
49df5fc9de
MINOR: Implement dependency injection on ingestion (#21719)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Fix test, making the injection test run last

* Update connections.py

* Changed NewType to an AbstractClass to avoid linting issues

* remove comment

* Fix bug in service spec

* Update PyTest version to avoid importlib.reader wrong import
2025-06-16 08:03:38 +02:00
harshsoni2024
4a3b6f4934
issue-21370: db2 custom driver installation (#21638)
* db2 custom driver installation

* pylint changes

* typo fix
2025-06-09 19:52:35 +05:30
Elay Gelbart
dec346a84b
Fixes ISSUE 20899: upgrade google-cloud-secret-manager python requirement version (#20900)
* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3

* upgrade openmetadata-ingestion dependency google-cloud-secret-manager version to 2.23.3 with ~

* Bump up `mlflow` and `databricks-sdk` for protobuf 5.x.x, pin down google-cloud-secret-manager to 2.22.1 for airflow deps sync

* Pin down databricks-sdk to 0.20.0

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2025-06-06 03:14:25 +05:30
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest use code from src rather than from the installed package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as a replacement for forward reference updates as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* Fix PowerBI integration test to use pathlib for resource paths and not os.getcwd to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix strategy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
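The PowerBI test fix in the commit above (resolving resource paths with pathlib instead of os.getcwd) can be illustrated with a minimal sketch. This is not the code from the PR: the helper name `resource_path`, the `resources` folder name, and the example file names are hypothetical.

```python
from pathlib import Path


def resource_path(module_file: str, *parts: str) -> Path:
    """Build a path under a 'resources' folder next to the given module file.

    Anchoring on the module's own location, not on os.getcwd(), keeps the
    result the same no matter which directory the tests are launched from.
    """
    return Path(module_file).parent.joinpath("resources", *parts)


# In a real test module this would be called as resource_path(__file__, ...).
# The path below is a made-up example, not a file from the repository.
example = resource_path("tests/integration/powerbi/test_example.py", "sample.json")
```

Because the path is derived from the module file rather than the working directory, running the tests from the repository root or from a subfolder would resolve to the same resource.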
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild as a replacement for forward reference updates as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Teddy
7ab6755beb
ISSUE #21101 - Implement BQ Partitioned Tests (#21348)
* feat: add query logger as an event listener in debug mode

* fix: added ingestion.src plugin to pylint

* minor: add partition sampled table

* test: added test for partitioned BQ table

* Remove log_query function from logger.py

* style: ran python linting
2025-05-22 17:22:05 +02:00
Pere Menal-Ferrer
3c6c762d9c
fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Update presidio version to fix wrong regex for Indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 15:32:21 +02:00
harshsoni2024
176b731337
MINOR: presidio sample data lib fix (#21295) 2025-05-20 17:40:44 +05:30
Pere Menal-Ferrer
a7e2f33adc
feature/pii-column-classifier (#21200)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexes for PII

* Add column name split

* Move pii algorithms to dedicated package

* Add tests for PAN, NIF, SSN entities

* Fix linting

* Add comment on why we need to set a specific language for Presidio recognizers, as per PR suggestion.

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Fix wrong import

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-16 14:03:49 +02:00
Suman Maharana
f81ee52ec4
Chore Ingestion Tableau library change (#21076) 2025-05-15 17:48:39 +05:30
Teddy
63a55437ae
GEN-1412: Implement load test logic (#19155)
* feat: implemented load test logic

* style: ran python linting

* fix: added locust dependency in test

* fix: skip locust in 3.8 as not supported

* fix: update gcsfs version

* fix: revert gcsfs versioning

* fix: fix gcsfs version to 2023.10

* fix: dagster graphql and gx versions

* fix: dagster version to 1.8 for py8 compatibility

* fix: fix clickhouse to 0.2 as 0.3 requires SQA 2+

* fix: revert changes from main

* fix: revert changes compared to main
2025-04-24 16:08:38 +02:00
Teddy
209793f315
MINOR - Add support for GX 1.4 (#20934)
* fix: add support for GX 0.18.22 and GX 1.4.x

* fix: add support for GX 0.18.22 and GX 1.4.x

* style: ran python linting

* fix: skip test if GX version is not installed
2025-04-24 11:55:04 +02:00
harshsoni2024
fb5af8ad7c
bigquery lib fix (#20849) 2025-04-16 08:04:26 +02:00
Imri Paran
a0d631b7cb
chore: more lenient pinning for python neo4j (#20722) 2025-04-09 09:05:44 +00:00
Ayush Shah
76371e4a64
Enhance ingestion setup: Add dbt plugin to Playwright dependencies (#20605) 2025-04-03 19:11:33 +05:30
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Katarzyna Kałek
4ec2077bbc
Unpinned google-cloud-secret-manager version in ingestion dependencies (#19469)
* Unpinned google-cloud-secret-manager version in ingestion dependencies

* Restrict google-cloud-secret-manager version to <2.20.1 because of mlflow-skinny dependency issue

---------

Co-authored-by: Katarzyna Kałek <kkalek@olx.pl>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
2025-04-02 07:53:43 +02:00
Ayush Shah
eca3770a93
MINOR: Update Playwright integration test workflows to use 'playwright' deps (#20558) 2025-04-01 23:42:31 +05:30
Pere Miquel Brull
c08273b4ad
MINOR: Allow loading ometa from env (#20511) 2025-03-31 12:06:33 +02:00
Sriharsha Chintalapani
706cebd97a
Opensearch connector (#19698)
* Fix #19667: OpenSearch Connector

* Fix #19667: OpenSearch Connector

* do not ingest any system level indexes

* fix pyformat

* Add AWS auth

* Use common schema and fix ssl config in client

* Add opensearch connector docs and update schema

* Remove api key auth type and complete docs checklist

* Remove unnecessary httpx dependency and pyformat

* Add compatible version of httpx for elasticsearch

* Fix pylint fails and py-tests validation error

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
2025-03-18 18:45:25 +05:30
Katarzyna Kałek
397dd0512f
Fixes #19619 (#19620)
* fixed s3 access bug for parquet files

* fixed formatting

* parsed endpoint_override to str in s3 parquet ingestion

---------

Co-authored-by: Katarzyna Kałek <kkalek@olx.pl>
2025-03-14 11:26:23 +05:30
Pere Miquel Brull
2b32808011
MINOR - Upgrade Airflow to 2.10.5 (#19840)
* MINOR - Bump Ingestion versions

* MINOR - Airflow bump

* MINOR - Set Airflow 2.10.5
2025-02-20 17:11:38 +01:00
Pere Miquel Brull
69c9102da1
MINOR - Bump Ingestion versions (#19836)
* MINOR - Bump Ingestion versions

* MINOR - Bump Ingestion versions

* fix

* fix db_scheme for airflow +2.9.1

* fix
2025-02-18 07:56:46 +01:00
Teddy
03de0ed549
MINOR: Added missing test dependencies (#19756)
* fix: added missing test dependencies

* style: ran python linting
2025-02-12 07:01:41 -08:00
Mayur Singal
76935f5c2e
MINOR: Fix teradata sqa version (#19677) 2025-02-05 16:00:17 +01:00
Suman Maharana
3e3c702942
Fix - switch to collate-dbt-artifacts-parser (#19647)
* Switch to collate-dbt-artifacts-parser
2025-02-04 11:57:39 +05:30
mgorsk1
fcf7072b12
🎉 Init (#19417) 2025-01-17 11:04:34 +05:30
Mayur Singal
8fdaea805f
MINOR: Kafka dependency conflict resolution (#19278) 2025-01-08 14:42:24 +05:30
Maciej Bryński
637e33fcae
Fixes 19217: Add ability to use confluent-kafka version greater than 2.1.1 (#19218)
* Add ability to use confluent-kafka version greater than 2.1.1

* fix: spotless
2025-01-06 09:01:22 -08:00
Mayur Singal
088cb64b7c
Fix #15952: Update SQLParse to Version 0.5 (#19224) 2025-01-06 11:39:34 +05:30
Akash Verma
39dcb5baef
Feature : Cockroach db connector (#18961) 2025-01-02 13:07:55 +05:30
Keshav Mohta
cde3a7dd1e
Feature: Cassandra Connector (#18943) 2024-12-12 15:12:55 +05:30
Mayur Singal
6d21dd12a4
MINOR: Snowflake UDF Lineage Support - main (#18886) 2024-12-05 00:19:40 +05:30
Imri Paran
c5171139c3
chore: added data diff to base requirements (#18789) 2024-11-26 17:28:22 +00:00
Suman Maharana
b220bdb891
Fix: mstr removed dependency issues (#18732)
* Fix: mstr removed dependency issues

* fix session still active error

* py_format

* fix tests

* Addressed Comments

* Addressed Comments

* addressed comments

* Addressed comments

* Add constants

* Fix pytests
2024-11-22 21:19:21 +05:30
Imri Paran
089fa785a8
build(setup-py): update pydantic version (#18541)
Update pydantic version to ">=2.7.0" in order to include `IncEx` that was introduced in 3d1355f168
2024-11-13 10:14:06 +01:00
Imri Paran
cdaa5c10af
[GEN-1996] feat(data-quality): use sampling config in data diff (#18532)
* feat(data-quality): use sampling config in data diff

- get the table profiling config
- use hashing to sample deterministically the same ids from each table
- use dirty-equals to assert results of stochastic processes

* - reverted missing md5
- added missing database service type

* - use a custom substr sql function

* fixed nonce

* added failure for mssql with sampling because it requires a larger change in the data-diff library

* fixed unit tests

* updated range for sampling
2024-11-11 10:07:23 +01:00
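The commit above mentions using hashing to deterministically sample the same ids from each table. A minimal sketch of that general idea follows. It is not the code from the PR: the function name `in_sample`, the `salt` parameter, and the choice of 8 hex digits are assumptions made for illustration.

```python
import hashlib


def in_sample(row_id, fraction: float, salt: str = "") -> bool:
    """Return True if row_id falls inside the sampled fraction.

    The decision depends only on the id (and the optional salt), so two
    tables holding the same ids keep exactly the same rows.
    """
    digest = hashlib.md5(f"{salt}{row_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < fraction


# Two hypothetical tables with the same ids in a different order.
table_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
table_b = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

sample_a = {i for i in table_a if in_sample(i, 0.5)}
sample_b = {i for i in table_b if in_sample(i, 0.5)}
```

Unlike random sampling applied independently on each side, the two samples here contain the same ids, which is what a row-by-row comparison between tables needs.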
Mayur Singal
66cf003cc3
MINOR: Fix pytest 3.11 taking 2hr (#18533) 2024-11-06 19:28:48 +05:30
Mayur Singal
f813ab730e
MINOR: Airflow dependency Fix (#18530) 2024-11-06 15:51:43 +05:30
Teddy
d579008c99
GEN 1683 - Add Column Value to be At Expected Location Test (#18524)
* feat: added column value to be in expected location test

* fix: renamed value -> values

* doc: added 1.6 documentation entry

* style: ran python linting

* fix: move data packaging to pyproject.yaml

* fix: add init file back for data package

* fix: failing test case
2024-11-06 11:17:13 +01:00
Nicola Coretti
7ebc62dca7
feat: Add support for exasol datasource (#17166)
* Add flake.nix

* Add lockfile for flake

* Update nix environment and document usage

* Add schema for exasol connector

* Add Exasol definitions to databaseService

* Fix error in exasol connector schema

* Add additional connection options/settings to exasol connector

* Add exasol-connector to ui

* Add dependencies for exasol-connector

* Update notes

* Update ingestion code

* Add Basic Documentation for Exasol Connector

* Update flake file

* Add developer notes

* Add python script which can be used as entry point for debugging in ide

* Add config file which can be used for debugging (manual execution)

* Update debug script

* Update developer notes

* Remove old developer notes

* Add .venv to gitignore

* Update dev notes

* Update development notes

* Update ExasolSource

* Establish basic connection to Exasol DB from connector

* Update exasol connector connection settings

* Add service_spec for exasol plugin

* Remove development files

* Remove unused module

* Applied code formatter

* Update exasol dependency constraint(s)

* Add unit test for exasol connection url(s)

* Fixed test expectations for exasol connection url test(s)

* Adjust the test query for the Exasol connection test
2024-10-31 08:11:30 +01:00
Imri Paran
3c7f995677
MINOR: fix(looker): exclude version
https://github.com/looker-open-source/sdk-codegen/issues/1518
2024-10-11 18:47:43 +00:00
IceS2
02d9494e7f
MINOR: Update DB2 dependencies to fix issue about not having ibm-db installed (#18192)
* Update DB2 dependencies to fix issue about not having ibm-db installed

* Fix checkstyle
2024-10-09 18:39:29 +02:00
Prajwal214
bbd159b947
Minor: Updated pyiceberg version to 0.5.1 (#18155)
Co-authored-by: Prajwal Pandit <prajwalpandit@Prajwals-MacBook-Air.local>
2024-10-08 14:55:57 +05:30
sam-mccarty-mavenclinic
0dd3e97170
Fix 17911: Looker parsing improvements for liquid templating and view/model aliasing (#17912)
* Looker parsing improvements for liquid templating and view/model aliasing

* add python-liquid dependency to looker plugin requirements

* move to static method with 'openmetadata' context and add rendering tests

* remove backtick stripping

---------

Co-authored-by: Imri Paran <imri.paran@gmail.com>
2024-09-27 13:55:15 +02:00
Mayur Singal
e373fddbde
MINOR: MSTR connector import fix (#18025) 2024-09-27 16:57:09 +05:30
Imri Paran
25284e0232
MINOR: fix snowflake system metrics (#17989)
* fix snowflake system metrics

* format

* add link to logs and commit
fixed the dq cli test

* reverted bad formatting

* fixed models.py

* removed version pinning for data diff in tests
2024-09-26 11:55:17 +00:00
Imri Paran
64f8571a77
pin testcontainers by minor version (#17938) 2024-09-24 09:18:36 +02:00
Teddy
b222de66fe
Minor fix pylint version (#17944)
* fix import issue

* fix: fix pylint version

* style: ran python linting

---------

Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2024-09-20 18:50:28 +02:00