755 Commits

Author SHA1 Message Date
ulixius9
b960f87814 MINOR: Postgres Implement schema fallback 2025-06-19 15:38:06 +05:30
Suman Maharana
7be62f3ed9
Add: Tableau Hierarchy project filter (#21811) 2025-06-19 11:18:52 +05:30
IceS2
040a33117c
MINOR: Fix Profiler Infinite Loop (#21843) 2025-06-19 10:33:45 +05:30
harshsoni2024
d38ee0ed52
feat-21712: PowerBI internal entities & cross workspace lineage (#21837) 2025-06-18 20:46:17 +05:30
Keshav Mohta
7c0eeef049
Fixes #19692: Implemented Nifi Pipeline Lineage (#21802)
* feat: implemented nifi pipeline lineage

* test: implemented tests for nifi pipeline lineage

* fix: yield_pipeline_bulk_lineage_details output type hinting

* fix: component check in connections

---------

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2025-06-18 07:31:04 +00:00
harshsoni2024
a09a696358
MINOR: Tableau proxy url for sourceurl (#21799) 2025-06-18 10:52:08 +05:30
IceS2
e79c54e6a5
MINOR: Add injection to profiler (#21738)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Add Injection to Metrics in Profiler

* Add Injection to the Profiler

* Fix UnitTests

* Fix Pytests

* Fix Tests

* Fix types
2025-06-17 19:01:00 +02:00
IceS2
49df5fc9de
MINOR: Implement dependency injection on ingestion (#21719)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Fix test, making the injection test run last

* Update connections.py

* Changed NewType to an AbstractClass to avoid linting issues

* remove comment

* Fix bug in service spec

* Update PyTest version to avoid importlib.reader wrong import
2025-06-16 08:03:38 +02:00
Pere Menal-Ferrer
44e09e41a2
Revert "FIX #1464 (#21520)" (#21726)
This reverts commit 1e86f9870fd663122b9bbb64f3cf17cf32619c7f.
2025-06-13 17:27:32 +02:00
IceS2
891ff4184d
MINOR: Initial implementation for our Connection Class (#21581)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Fix Test

* Fix Profile Test Connection

* Remove unit test

* Remove comment

* Fix tests and missing changes
2025-06-13 14:52:29 +02:00
harshsoni2024
6a6180b2e3
powerbi change owner condition (#21724) 2025-06-12 16:11:43 +05:30
Suman Maharana
18f9f2cdb6
Fix: Tableau project id should always be a string (#21700) 2025-06-12 11:21:53 +05:30
Suman Maharana
0df058a53d
Fix: dbtcloud CL errors (#21685) 2025-06-10 21:45:07 +05:30
Pere Menal-Ferrer
1e86f9870f
FIX #1464 (#21520)
* Add PIICategoryTags and some utilities on top of them.

* Fix static-check

* Add test for fqn representation

* Add NEREntityGeneralTags.json from Collate

* Add test to check PIICategoryTags agree with the ones used by OM server

* Add LabelExtractor

* Fix style

* Add ignore superflous-parens for pylint

* Ass comment as per PR review

* Fix not-updated PII-IT

* Remove duplicated IT test for PII

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-06-09 16:05:35 -07:00
Keshav Mohta
b7a7023890
Fix #20665: BigQuery - Adding billing project (#21231) 2025-06-09 13:09:40 +05:30
IceS2
5b20b84546
MINOR: Add logic to handle WorkflowContext on Ingestion (#21425)
* Add logic to handle WorkflowContext on Ingestion

* Revert base.py changes

* Removed comment

* Fix basedpyright complaints

* Make ContextManager automatically add its context to the PipelineStatus

* Small changes
2025-06-03 17:35:08 +02:00
Suman Maharana
720c6d3f9f
Add: Looker explore to view Column Lineage (#21504)
* Add: explore to view Column Lineage

* Add tags ingestion and fix cll warnings

* lint

* Addressed comments

* fixed tests
2025-06-03 20:23:43 +05:30
Pere Menal-Ferrer
6683c632f4
FIX #21464 (#21463)
* Reproduce failing behaviour with non-date-time data

* Add a presidio patch for DateTimes

* Fix type-check error

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-30 08:18:50 +02:00
harshsoni2024
8bbc4d8c3d
MINOR: PBI dataset expressions empty value fix (#21409) 2025-05-27 16:50:55 +05:30
Pere Menal-Ferrer
ca812852d6
ci/nox-setup-testing (#21377)
* Make pytest to user code from src rather than from install package

* Fix test_amundsen: missing None

* Update pytest configuration to use importlib mode

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild rather as replacement for forward reference updates as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Update coverage source path

* Fix wrong import.

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Modify dependencies in dev environment.

* Ignore all airflow tests

* Remove coverage in unit_ingestion_dev_env. Revert coverage source to prevent broken CI.

* Add nox for running unit test

* FIx PowerBI integration test to use pathlib for resource paths and not os.getcwd to prevent failures when not executed from the right path

* Move test_helpers.py to unit test, as it is not an integration test.

* Remove utils empty folder in integration tests

* Refactor testcontainers configuration to avoid pitfalls with max_tries setting

* Add nox unit testing basic setup

* Add format check session

* Refactor nox-unit and add plugins tests

* Add GHA for py-nox-ci

* Add comment to GHA

* Restore conftest.py file

* Clarify comment

* Simplify function

* Fix matrix startegy and nox mismatch

* Improve python version strategy with nox and GHA

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-27 10:56:52 +02:00
Pere Menal-Ferrer
6ea630d7ef
DevEx: Ingestion development improvement (focus on unit testing) (#21362)
* Fix test_amundsen: missing None

* Fix custom_basemodel_validation to check model_fields on type(values) to prevent noisy warnings

* Refactor referencedByQueries validation to use field_validator as per deprecation warning

* Update ColumnJson to use model_rebuild rather as replacement for forward reference updates as per deprecation warning

* Move superset test to integration test as they are using testcontainers

* Add install_dev_env target to Makefile for development dependencies

* Add test-unit as extra in setup.py

* Skip failing IT test. Requires further investigation.
2025-05-26 10:38:17 +02:00
Teddy
7ab6755beb
ISSUE #21101 - Implement BQ Partitioned Tests (#21348)
* feat: add query logger as an event listent in debug mode

* fix: added ingestion.src plugin to pylint

* minor: add partition sampled table

* test: added test for partitioned BQ table

* Remove log_query function from logger.py

* style: ran python linting
2025-05-22 17:22:05 +02:00
Pere Menal-Ferrer
3c6c762d9c
fix/indian-passport-detection (#21311)
* Remove 'ORGANIZATION' PII Tag as it is no longer supported by our PII detectors.

* Updata presidio version to fix wrong regex for indian passport

* Increase sample size of Indian passport numbers

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-20 15:32:21 +02:00
Pere Menal-Ferrer
5d2dfa712a
feature/pii-processor-improvement (#21248)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexs for PII

* Add tests for PAN, NIF, SSN entities

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Add Generated to State enum

* Integrate PIISensitive classifier to PIIProcessor
2025-05-19 17:52:17 +00:00
harshsoni2024
a414e93163
bugfix: powerbi lineage source parameter usecase fix, last active user ingestion fix (#21272) 2025-05-19 19:21:02 +05:30
Mohit Tilala
4c0ce77756
Fix airbyte pipeline lineage extraction (#21151) 2025-05-19 10:14:33 +05:30
Mayur Singal
703118f2b5
MINOR: Disable Flaky superset tests (#21242) 2025-05-18 23:12:42 +05:30
Pere Menal-Ferrer
a7e2f33adc
feature/pii-column-classifier (#21200)
* Add PII Tag and Sensitivity Level enums.

* Add feature-extraction for PII classification tasks

* Add faker as test dependency

* Add unit tests for presidio tag extractor

* Add PIISensitivityTags enum and update sensitivity mapping logic

* Add Presidio utility functions for PII analysis

* Extend column name regexs for PII

* Add colum name split

* Move pii algorithms to dedicated package

* Add tests for PAN, NIF, SSN entities

* Fix linting

* Add comment on why we need to set specific lanaguage to Presidio recognizers, as per PR suggestion.

* Fix version of faker to prevent flaky tests. Fix failing tests.

* Fix wrong import

---------

Co-authored-by: Pere Menal <pere.menal@getcollate.io>
2025-05-16 14:03:49 +02:00
Suman Maharana
f81ee52ec4
Chore Ingestion Tableau library change (#21076) 2025-05-15 17:48:39 +05:30
harshsoni2024
3b382c1bd9
issue-20737: datalake parquet different extensions (#21048) 2025-05-13 11:23:46 +05:30
Teddy
a853561d30
MINOR: data sample ingestion bigquery (#21074)
* fix: data sample ingestion bigquery

* style: ran python linting

* fix: flaky test in topology
2025-05-06 15:58:37 +02:00
Mayur Singal
9755662240
Fix #20902: Fix duplicate constraints error (#21037) 2025-04-30 11:34:35 +05:30
Teddy
209793f315
MINOR - Add support for GX 1.4 (#20934)
* fix: add support for GX 0.18.22 and GX 1.4.x

* fix: add  support for GX 0.18.22 and GX 1.4.x

* style: ran python linting

* fix: skip test if GX version is not installed
2025-04-24 11:55:04 +02:00
chrisrayrayne
b14f83940a
Fixes Issue 20189: REST connector checks updated (#20736) 2025-04-15 10:24:57 +05:30
Mayur Singal
ee5d8eee8b
Revert "MINOR: Implement Column Validation in Lineage (#20544)" (#20658) 2025-04-07 17:13:35 +05:30
Keshav Mohta
0796c6274b
Fixes: Databricks httpPath Required (#20611)
* fix: made databricks httpPath required and added a migration file for the same

* fix: added sql migration in postDataMigration file and fix databricks tests

* fix: added httpPath in test_source_connection.py and test_source_parsing.py files

* fix: added httpPath in test_databricks_lineage.py

* fix: table name in postgres migration
2025-04-07 13:33:55 +05:30
harshsoni2024
7953f98097
issue-20546: REST connector enhancements (#20634) 2025-04-07 10:22:45 +05:30
Suman Maharana
5275975d31
Fix: dbt cloud latest run execution (#20573)
* Fix: dbt cloud latest run execution

* update latest run id

* set default to 100
2025-04-03 11:13:17 +05:30
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Mayur Singal
7991715135
MINOR: Implement Column Validation in Lineage (#20544) 2025-04-02 17:40:40 +05:30
harshsoni2024
f267d4ef01
issue-20519: Support PowerBI Owners ingestion (#20525) 2025-04-02 16:11:27 +05:30
Mohit Tilala
06ab82170b
Fixes #19534: Snowflake stream ingestion support (#20278) 2025-04-01 13:02:37 +05:30
Mohit Tilala
7ad97afa62
Fixes #19690: Add QlikCloud dashboard filter by space name type (#20315) 2025-04-01 13:00:50 +05:30
Ayush Shah
653c878497
MINOR: Transform Reserved keywords like quotes to OM compatible (#20459) 2025-03-27 13:02:07 +05:30
Sriharsha Chintalapani
706cebd97a
Opensearch connector (#19698)
* Fix #19667: OpenSearch Connector

* Fix #19667: OpenSearch Connector

* do not ingest any system level indexes

* fix pyformat

* Add AWS auth

* Use common schema and fix ssl config in client

* Add openseach connector docs and update schema

* Remove api key auth type and complete docs checklist

* Remove unnecessary httpx dependency and pyformat

* Add compatible version of httpx for elasticsearch

* Fix pylint fails and py-tests validation error

---------

Co-authored-by: Mohit Tilala <tilalamohit123@gmail.com>
Co-authored-by: Mohit Tilala <63147650+mohittilala@users.noreply.github.com>
2025-03-18 18:45:25 +05:30
Akash Verma
cf7a442e32
Fixes #19891 : Added measures in powerbi (#19990) 2025-03-17 14:43:22 +05:30
Mayur Singal
d30fd90096
Minor: Query Cost Table Aggregation Endpoint (#20270) 2025-03-17 11:33:50 +05:30
Mayur Singal
581ab6ce71
MINOR: Fix pytests - usage count (#20247) 2025-03-14 09:07:40 +01:00
harshsoni2024
826279608f
issue-19892: parse powerbi table source (#20141) 2025-03-12 12:59:29 +05:30
Pere Miquel Brull
2e7c9a0875
FIX #19765 - Improve Column Name Scanner (#20136) 2025-03-07 14:32:59 +01:00