3605 Commits

Author SHA1 Message Date
Keshav Mohta
abe7ddbc13
fix: added database filter during test connection in snowflake (#24000) 2025-10-29 17:13:00 +05:30
Ayush Shah
608d43c16b
fix: handle empty buckets in GCS connection tests (#24048) 2025-10-29 16:45:54 +05:30
harshsoni2024
9f5f8d5c13
MINOR: pbi source url fix (#24058) 2025-10-29 16:33:05 +05:30
Keshav Mohta
34adc96fea
fix: added unstructuredFormats in MetadataEntry objects creation (#24023) 2025-10-29 11:09:18 +01:00
IceS2
5a7d7158a5
Feature/dimensionality column mean to be between (#23984)
* Initial implementation for Dimensionality on Data Quality Tests

* Fix ColumnValuesToBeUnique and create TestCaseResult API

* Refactor dimension result

* Initial E2E Implementation without Impact Score

* Dimensionality Thin Slice

* Update generated TypeScript types

* Update generated TypeScript types

* Removed useless method to use the one we already had

* Fix Pandas Dimensionality checks

* Remove useless comments

* Implement PR comments, fix Tests

* Improve the code a bit

* Fix imports

* Implement Dimensionality for ColumnMeanToBeBetween

* Removed useless comments and improved minor things

* Implement UnitTests

* Fixes

* Moved import pandas to type checking

* Fix Min/Max being optional

* Fix Unittests

* small fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-27 18:05:51 +01:00
harshsoni2024
4ec5059a32
Fix: looker local repo config (#24003) 2025-10-27 20:21:11 +05:30
Keshav Mohta
ba5f8e5bf6
fix: add field label as displayName for table and columns (#23993) 2025-10-27 17:08:02 +05:30
Suman Maharana
877104aa22
Fix: UC Ingestion failing due to non-selected tables (#23954) 2025-10-27 11:21:17 +05:30
veerasai06
9026ac3add
Fixed small typo in comments: changed 'metdata' to 'metadata' (#24014) 2025-10-26 20:47:00 -07:00
Yourton Ma
07012db685
Fixes #22392: Add hierarchical owner config for database ingestion (#23709)
* feat: add owner assignment support at metadata ingestion level

* docs: Translate comments to English in test_owner

* refactor: move the test_owner-related files into correct positions

* feat: Add support for more source types

* Revert "feat: Add support for more source types"

This reverts commit a7649dcb3204cf98b7f4f9be02fbb982d2532193.

* feat: Add owner field support in sourceConfig for Database and Dashboard ingestion (fixes #22392)

* refactor code with the required style

* add owner field in related json file

* feat: add topology-based owner config for database/schema/table

* Format the code by the pre-commit tools

* fix some errors

* add a doc to explain this feature

* translate all Chinese comments to English and consolidate documentation

* remove redundant code

* refactor code

* refactor code

* refactor code

* refactor code

* Add some tests for owner-config and enhance this feat

* Add some tests for owner-config and enhance this feat

* fix some error

* fix some error

* refactor code

* Remove the yaml and bash test files and test owner config with pytest style

* format the python code

* refactor ingestion code

* refactor code

* fix some error in test_owner_utils

---------

Co-authored-by: Ma,Yutao <yutao.ma@sap.com>
2025-10-23 07:24:45 +02:00
IceS2
633152124a
Fixes #23397: Thin Slice for Dimensionality on Data Quality (#23529)
* Initial implementation for Dimensionality on Data Quality Tests

* Fix ColumnValuesToBeUnique and create TestCaseResult API

* Refactor dimension result

* Initial E2E Implementation without Impact Score

* Dimensionality Thin Slice

* Update generated TypeScript types

* Update generated TypeScript types

* Removed useless method to use the one we already had

* Fix Pandas Dimensionality checks

* Remove useless comments

* Implement PR comments, fix Tests

* Improve the code a bit

* Fix imports

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-22 15:40:20 +02:00
Pere Miquel Brull
11e1c99180
MINOR - Looker support custom git host & local repo usage (#23973)
* MINOR - Looker support custom git host & local repo usage

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-22 06:05:55 +02:00
Mayur Singal
835a81337b
MINOR: Databricks pipeline support function parsing (#23959)
* MINOR: Databricks pipeline support function parsing

* pyformat
2025-10-21 15:27:31 +02:00
Teddy
e103a8c805
MINOR: Fix uppercase DBT to lowercase dbt (#23900)
* fix: uppercase DBT to lowercase dbt

* fix: change DBT to lowercase dbt in TestPlatform enum

* fix: fix dbt syntax in valueMax

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2025-10-21 07:59:09 +02:00
Eugenio
ae1b3ce953
[DQaC] Simplified API (#23850)
* Extend `metadata.sdk.configure` function

* Create convenience classes for existing `TestDefinition`s

* Create `WorkflowConfigBuilder` for data quality

* Create `ResultCapturingProcessor` for data quality

This is so we can intercept results from `TestCaseRunner` and return results to the calling application

* Implement `TestRunner` interface to run test cases as code

* Add an example of the simplified API

Also, fix some static checks errors in `builder_end_to_end.py`
2025-10-20 12:12:57 +00:00
Keshav Mohta
7ea87e7ca2
fix: table column description (#23928) 2025-10-20 09:59:23 +05:30
Keshav Mohta
e49d3ee31a
Fixes: protobuf version (#23878)
* fix: upgraded opentelemetry-exporter-otlp & google-cloud-secret-manager for protobuf

* deps: upgrade pandas, numpy, opentelemetry-exporter-otlp, & asammdf

* fix: revert numpy and asammdf versions

* deps: downgrade pandas to 2.0.3
2025-10-20 09:55:15 +05:30
Keshav Mohta
1afe32f0c1
deps: upgraded sqlalchemy-bigquery to 1.15.0 (#23909) 2025-10-20 09:52:45 +05:30
mmigdiso
64d468188e
Fixes #23881: Added native query lineage extraction for powerbi-databricks (#23882)
* Added native query lineage extraction for powerbi-databricks

* improved error handling and logging

* checkstyle fix

---------

Co-authored-by: m.migdisoglu <m.migdisoglu@criteo.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-10-16 15:21:55 +05:30
Suman Maharana
63b663d884
Improve Tableau logging (#23892)
* Improve Tableau logging

* Addressed comments
2025-10-16 09:52:05 +05:30
sonika-shah
303ee47d6f
Add assets API and deprecate inline assets field for Domain and Dataproduct (#23856)
* Add assets API and deprecate inline assets field for Domain and Dataproduct

* fix mvn test

* fix py test and add new tests

* fix py test

* fix py test

* fix timeout for workflow test

* address pr feedback

* Update generated TypeScript types

* minor- remove unused function

---------

Co-authored-by: Bhanu Agrawal <bhanuagrawal2018@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-16 05:23:05 +05:30
Mayur Singal
3c527ca83b
MINOR: Fix Databricks DLT Pipeline Lineage to Track Table (#23888)
* MINOR: Fix Databricks DLT Pipeline Lineage to Track Table

* fix tests

* add support for s3 pipeline lineage as well
2025-10-15 10:54:01 +02:00
Akash Verma
9b16119ab5
feat: Add Hex dashboard connector support (#23246)
* feat: Add Hex dashboard connector support

* files

* Added tests and UI image

* fix tests

---------

Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-10-15 11:05:42 +05:30
Mohit Tilala
09c851265e
[Redshift] Add better handling of incomplete redshift view definition (#23866)
* Add better handling of incomplete redshift view definition

* Match exact definitions in tests

* Correct isort on tests
2025-10-14 12:51:07 +05:30
Keshav Mohta
50dbe6fe44
fix: view_names issue when incremental enabled (#23858) 2025-10-13 19:21:07 +05:30
Mayur Singal
a638bdcfe0
MINOR: Fix databricks pipeline repeating tasks issue (#23851) 2025-10-13 00:41:05 +05:30
Copilot
c8722faf47
Fix Grafana connector validation error for integer format fields (#23202)
* Initial plan

* Fix Grafana connector format field validation issue

- Update GrafanaTarget.format field to accept both str and int types
- Add field_validator to convert integer format codes to string equivalents
- Add comprehensive tests for format field validation scenarios
- Add test fixture with integer format fields that reproduces the original issue
- Ensure backwards compatibility with existing string format values

This resolves the issue where Grafana dashboards with integer format fields
(e.g., format: 0 instead of format: "table") were causing validation errors
and being skipped during ingestion.

Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>

* fix: GrafanaTarget model format type from str to Any

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Keshav Mohta <keshavmohta09@gmail.com>
2025-10-12 23:14:16 +05:30
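Note on c8722faf47: the coercion this commit describes (accepting both string and integer `format` values) might look roughly like the sketch below. It is plain Python rather than the connector's actual pydantic model. `normalize_format` is a hypothetical helper, and converting with `str()` is an assumption, since the commit message does not list the integer-to-string mapping.

```python
from typing import Any, Optional


def normalize_format(value: Any) -> Optional[str]:
    """Coerce a Grafana target `format` value to a string.

    Hypothetical helper: the real connector uses a pydantic validator,
    and its integer-to-string mapping is not shown in the commit message.
    """
    if value is None:
        return None
    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool):
        raise TypeError("format must be a string or an integer")
    if isinstance(value, int):
        return str(value)  # assumption: plain string conversion
    if isinstance(value, str):
        return value  # existing string values pass through unchanged
    raise TypeError("format must be a string or an integer")
```

With this sketch, a dashboard carrying `format: 0` would be normalized instead of failing validation, while existing string values such as `"table"` are left as they are.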
harshsoni2024
c32a9b957f
Add AWS kinesis firehose connector [OSS] (#23807)
* AWS Firehose

* Add AWS Firehose

* add kinesis firehose support

* remove unnecessary doc

* Update generated TypeScript types

* add connection doc, optional msg service name

* Update generated TypeScript types

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2025-10-12 08:27:13 -07:00
Ayush Shah
d71a47db1d
fix(kafkaconnect): update table search method to use search_in_any_service (#23852) 2025-10-12 20:02:12 +05:30
Sriharsha Chintalapani
ce3a9bd654
Kafka connect improvements (#23845)
* Kafka Connect Lineage Improvements

* Remove specific Kafka topic example from docstring

Removed example from the documentation regarding the earnin.bank.dev topic.

* fix: update comment to reflect accurate example for database server name handling

* fix: improve expected FQN display in warning messages for missing Kafka topics

* fix: update table entity retrieval method in KafkaconnectSource

* fix: enhance lineage information checks and improve logging for missing configurations in KafkaconnectSource

* Kafka Connect Lineage Improvements

* address comments; work without the table.include.list

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2025-10-11 22:26:14 +02:00
Sriharsha Chintalapani
5c638f5c8e
Databricks DLT pipelines parsing (#23848) 2025-10-11 22:25:43 +02:00
Ayush Shah
a90cacc93b
MINOR: fix Kafka connect CDC lineage (#23836) 2025-10-11 15:40:03 +05:30
Teddy
1f8cf64dd4
chore: added python 3.12 to CI (#23835)
* chore: added python 3.12 to CI

* chore: changed py-test-skip to 3.12
2025-10-10 17:26:45 +02:00
Teddy
93e5ee8cb1
fix: url encode fqn when retrieving test case results in python sdk (#23834) 2025-10-10 17:25:33 +02:00
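Note on 93e5ee8cb1: a fully qualified name can contain characters such as spaces or `/` that are not safe inside a URL path segment. The sketch below shows one way to percent-encode such a name with the standard library. `encode_fqn` is a hypothetical helper name and not necessarily what the SDK uses.

```python
from urllib.parse import quote


def encode_fqn(fqn: str) -> str:
    """Percent-encode an FQN so it can be used as a single URL path segment."""
    # safe="" also encodes "/", which quote() leaves untouched by default
    return quote(fqn, safe="")
```

Passing `safe=""` matters here: with the default, a `/` inside the name would be read as a path separator by the server.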
Sriharsha Chintalapani
76020bd0e7
Fix Kafka Connect for lineage parsing (#23819)
* Fix Kafka Connect for lineage parsing

* Fix Kafka Connect for lineage parsing
2025-10-09 14:01:36 -07:00
Mayur Singal
88115e1218
MINOR: Fix trailing / issue in UC S3 lineage (#23816) 2025-10-09 18:44:07 +02:00
Antoine Balliet
be3a91f7df
fix: logger level should work for deprecation warnings (#23784)
* chore: implement logger level tests for deprecation

* fix: use METADATA_LOGGER instead of warnings

* use unit test syntax

* isort

* black

* fix test

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-10-09 18:21:28 +02:00
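Note on be3a91f7df: routing deprecation messages through a logger, rather than the `warnings` module, lets the configured log level decide whether they appear. The following is a minimal sketch of that idea. The logger name `metadata_sketch` and the helper `warn_deprecated` are assumptions for illustration; the commit only refers to `METADATA_LOGGER`.

```python
import logging

# Assumed logger name, for illustration only
logger = logging.getLogger("metadata_sketch")


def warn_deprecated(feature: str) -> None:
    """Report a deprecation through the logger so level filtering applies."""
    logger.warning("%s is deprecated", feature)
```

If the logger level is set to `logging.ERROR`, the deprecation message is suppressed. At `logging.WARNING` or lower it is emitted.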
Mayur Singal
05f064787f
Feat: Add kafka lineage support in databricks pipelines (#23813)
* Add dlt pipeline support

* Fix code style

* Add variable parsing

* Fix kafka lineage

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
2025-10-09 16:42:08 +02:00
Sriharsha Chintalapani
454d7367b0
Kafka Connect: Support Confluent Cloud connectors (#23780) 2025-10-09 01:28:27 +05:30
Mohit Tilala
da8c50d2a0
Add pagination for snowflake usage and lineage queries sql (#23781)
* Add pagination for snowflake usage and lineage queries sql

* py_format
2025-10-08 20:45:14 +05:30
Mayur Singal
4708c2b64f
feat: Unity Catalog Lineage Enhancement: External Location Support (#23790) 2025-10-08 20:26:39 +05:30
harshsoni2024
f2819ce4e4
Fix: PowerBI snowflake query lineage parsing (#23746) 2025-10-08 18:32:25 +05:30
Mohit Tilala
61e4c1ffba
Pin pydantic to <2.12.0 (#23782)
* Bump datamodel-code-generator to 0.34.0

* Pin down pydantic to <2.12

* Revert "Bump datamodel-code-generator to 0.34.0"

This reverts commit c69116d2935eea49e9c78b2607f2fea94bc44738.
2025-10-08 13:24:27 +05:30
Eugenio
af0672e4cf
Fixes #22302: add table2.keyColumns parameter for table diff validation (#23667)
* Update `TableDiffParamsSetter` to move data at table level

This means that `key_columns` and `extra_columns` will be defined per table instead of "globally", just like `data_diff` expects

* Update `TableDiffValidator` to use table's `key_columns`

Call `data_diff` and run validations using each table's `key_columns`

* Create migration to update `tableDiff` test definition

* Fix Playwright test
2025-10-08 09:32:00 +02:00
Eugenio
a6ac42371d
Ensure recognizers are created (#23645)
* Add the migration classes and data for recognizers

This is so that we can run a migration that sets `json->recognizers` of `PII.Sensitive` and `PII.NonSensitive` tags from json values.

The issue with normal migrations was that the value of recognizers was too long to be persisted in the server migrations log.

Created a common `migration.utils.v1110.MigrationProcessBase`

* Ensure building automatically with the right parameters

* Update typescript types
2025-10-07 15:13:35 +00:00
Eugenio
47e953f9d3
PLAYWRIGHT FIXES: ensure sample data is passed to the right columns (#23761)
* Ensure we take columns ordered from the sampler

This is to avoid analyzing columns with data from other columns

* Remove expectation of address to have Sensitive tag

This is for a couple of reasons:
- First: per our internal definition it should actually be Non Sensitive.
- Second: presidio actually picks SOME of them up as PERSON (Sensitive) entities, but since we've raised the tolerance, now we're not classifying them as Sensitive.
2025-10-07 09:39:24 +02:00
harshsoni2024
9ba65ac0d2
Fix: Add support for datamodel source url (#23715) 2025-10-06 20:04:43 +00:00
Mohit Tilala
0cf0394d0b
Fixes #22406: Add workflow resource utilisation metrics for better troubleshooting (#23696)
* Add workflow resource utilization metrics for better troubleshooting

* Add types for correct static type checking

* Remove duplicate type annotations
2025-10-06 13:20:06 +05:30
harshsoni2024
da7a2778f6
MINOR: iceberg load table retry backoff (#23579) 2025-10-05 23:42:56 +05:30
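Note on da7a2778f6: the commit title mentions a retry backoff when loading Iceberg tables. A generic retry with exponential backoff could be sketched as follows. `retry_with_backoff`, its parameters, and the delay values are all hypothetical and are not taken from the connector code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # delay doubles on every failed attempt: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the waiting behaviour can be tested without real delays.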
Sriharsha Chintalapani
fc7412f6dd
Add Timescale Connector (#23665)
* Add Timescale Connector

* Update generated TypeScript types

* Add UI changes for the Timescale

* lineage, usage and java

* Add beta tag

* update logo

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-10-03 19:00:59 -07:00