3586 Commits

Author SHA1 Message Date
Suman Maharana
63b663d884
Improve Tableau logging (#23892)
* Improve Tableau logging

* Addressed comments
2025-10-16 09:52:05 +05:30
sonika-shah
303ee47d6f
Add assets API and deprecate inline assets field for Domain and Dataproduct (#23856)
* Add assets API and deprecate inline assets field for Domain and Dataproduct

* fix mvn test

* fix py test and add new tests

* fix py test

* fix py test

* fix timeout for workflow test

* address pr feedback

* Update generated TypeScript types

* minor- remove unused function

---------

Co-authored-by: Bhanu Agrawal <bhanuagrawal2018@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-16 05:23:05 +05:30
Mayur Singal
3c527ca83b
MINOR: Fix Databricks DLT Pipeline Lineage to Track Table (#23888)
* MINOR: Fix Databricks DLT Pipeline Lineage to Track Table

* fix tests

* add support for s3 pipeline lineage as well
2025-10-15 10:54:01 +02:00
Akash Verma
9b16119ab5
feat: Add Hex dashboard connector support (#23246)
* feat: Add Hex dashboard connector support

* files

* Added tests and UI image

* fix tests

---------

Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-10-15 11:05:42 +05:30
Mohit Tilala
09c851265e
[Redshift] Add better handling of incomplete redshift view definition (#23866)
* Add better handling of incomplete redshift view definition

* Match exact definitions in tests

* Correct isort on tests
2025-10-14 12:51:07 +05:30
Keshav Mohta
50dbe6fe44
fix: view_names issue when incremental enabled (#23858)
2025-10-13 19:21:07 +05:30
Mayur Singal
a638bdcfe0
MINOR: Fix databricks pipeline repeating tasks issue (#23851)
2025-10-13 00:41:05 +05:30
Copilot
c8722faf47
Fix Grafana connector validation error for integer format fields (#23202)
* Initial plan

* Fix Grafana connector format field validation issue

- Update GrafanaTarget.format field to accept both str and int types
- Add field_validator to convert integer format codes to string equivalents
- Add comprehensive tests for format field validation scenarios
- Add test fixture with integer format fields that reproduces the original issue
- Ensure backwards compatibility with existing string format values

This resolves the issue where Grafana dashboards with integer format fields
(e.g., format: 0 instead of format: "table") were causing validation errors
and being skipped during ingestion.
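
The coercion described above could look roughly like the sketch below. It assumes the conversion is a plain `str()` call; the log does not say how integer codes map to names, and `normalize_format` is a hypothetical helper, not the connector's actual validator.

```python
from typing import Any, Optional


def normalize_format(value: Any) -> Optional[str]:
    """Coerce a Grafana target `format` value to a string (hypothetical helper).

    Assumption: integers are simply stringified; the real connector may map
    them to named formats instead.
    """
    if value is None:
        return None
    if isinstance(value, (int, str)):
        # Existing string values pass through unchanged (backwards compatible).
        return str(value)
    raise TypeError(f"Unsupported format value: {value!r}")
```

With this, `format: 0` and `format: "table"` both validate instead of the dashboard being skipped.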

Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>

* fix: GrafanaTarget model format type from str to Any

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ulixius9 <39544459+ulixius9@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Keshav Mohta <keshavmohta09@gmail.com>
2025-10-12 23:14:16 +05:30
harshsoni2024
c32a9b957f
Add AWS kinesis firehose connector [OSS] (#23807)
* AWS Firehose

* Add AWS Firehose

* add kinesis firehose support

* remove unnecessary doc

* Update generated TypeScript types

* add connection doc, optional msg service name

* Update generated TypeScript types

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2025-10-12 08:27:13 -07:00
Ayush Shah
d71a47db1d
fix(kafkaconnect): update table search method to use search_in_any_service (#23852)
2025-10-12 20:02:12 +05:30
Sriharsha Chintalapani
ce3a9bd654
Kafka connect improvements (#23845)
* Kafka Connect Lineage Improvements

* Remove specific Kafka topic example from docstring

Removed example from the documentation regarding the earnin.bank.dev topic.

* fix: update comment to reflect accurate example for database server name handling

* fix: improve expected FQN display in warning messages for missing Kafka topics

* fix: update table entity retrieval method in KafkaconnectSource

* fix: enhance lineage information checks and improve logging for missing configurations in KafkaconnectSource

* Kafka Connect Lineage Improvements

* address comments; work without the table.include.list

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2025-10-11 22:26:14 +02:00
Sriharsha Chintalapani
5c638f5c8e
Databricks DLT pipelines parsing (#23848)
2025-10-11 22:25:43 +02:00
Ayush Shah
a90cacc93b
MINOR: fix Kafka connect CDC lineage (#23836)
2025-10-11 15:40:03 +05:30
Teddy
1f8cf64dd4
chore: added python 3.12 to CI (#23835)
* chore: added python 3.12 to CI

* chore: changed py-test-skip to 3.12
2025-10-10 17:26:45 +02:00
Teddy
93e5ee8cb1
fix: url encode fqn when retrieving test case results in python sdk (#23834)
2025-10-10 17:25:33 +02:00
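
For context on the URL-encoding fix in #23834 above, here is a minimal sketch of encoding a fully qualified name for use as a single URL path segment. `encode_fqn` is a hypothetical helper and the example FQN is made up; the SDK's own implementation may differ.

```python
from urllib.parse import quote


def encode_fqn(fqn: str) -> str:
    """Percent-encode an FQN so it is safe as one URL path segment.

    safe="" makes quote() encode "/" too, which would otherwise be read as a
    path separator.
    """
    return quote(fqn, safe="")
```

For example, `encode_fqn("svc.db.schema.table/col")` yields `svc.db.schema.table%2Fcol`.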
Sriharsha Chintalapani
76020bd0e7
Fix Kafka Connect for lineage parsing (#23819)
* Fix Kafka Connect for lineage parsing

* Fix Kafka Connect for lineage parsing
2025-10-09 14:01:36 -07:00
Mayur Singal
88115e1218
MINOR: Fix trailing / issue in UC S3 lineage (#23816)
2025-10-09 18:44:07 +02:00
Antoine Balliet
be3a91f7df
fix: logger level should work for deprecation warnings (#23784)
* chore: implement logger levels tests for deprecation

* fix: use METADATA_LOGGER instead of warnings

* use unit test syntax

* isort

* black

* fix test

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-10-09 18:21:28 +02:00
Mayur Singal
05f064787f
Feat: Add kafka lineage support in databricks pipelines (#23813)
* Add dlt pipeline support

* Fix code style

* Add variable parsing

* Fix kafka lineage

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
2025-10-09 16:42:08 +02:00
Sriharsha Chintalapani
454d7367b0
Kafka Connect: Support Confluent Cloud connectors (#23780)
2025-10-09 01:28:27 +05:30
Mohit Tilala
da8c50d2a0
Add pagination for snowflake usage and lineage queries sql (#23781)
* Add pagination for snowflake usage and lineage queries sql

* py_format
2025-10-08 20:45:14 +05:30
Mayur Singal
4708c2b64f
feat: Unity Catalog Lineage Enhancement: External Location Support (#23790)
2025-10-08 20:26:39 +05:30
harshsoni2024
f2819ce4e4
Fix: PowerBI snowflake query lineage parsing (#23746)
2025-10-08 18:32:25 +05:30
Mohit Tilala
61e4c1ffba
Pin pydantic to <2.12.0 (#23782)
* Bump datamodel-code-generator to 0.34.0

* Pin down pydantic to <2.12

* Revert "Bump datamodel-code-generator to 0.34.0"

This reverts commit c69116d2935eea49e9c78b2607f2fea94bc44738.
2025-10-08 13:24:27 +05:30
Eugenio
af0672e4cf
Fixes #22302: add table2.keyColumns parameter for table diff validation (#23667)
* Update `TableDiffParamsSetter` to move data at table level

This means that `key_columns` and `extra_columns` will be defined per table instead of "globally", just like `data_diff` expects

* Update `TableDiffValidator` to use table's `key_columns`

Call `data_diff` and run validations using each table's `key_columns`

* Create migration to update `tableDiff` test definition

* Fix Playwright test
2025-10-08 09:32:00 +02:00
Eugenio
a6ac42371d
Ensure recognizers are created (#23645)
* Add the migration classes and data for recognizers

This is so that we can run a migration that sets `json->recognizers` of `PII.Sensitive` and `PII.NonSensitive` tags from json values.

The issue with normal migrations was that the value of recognizers was too long to be persisted in the server migrations log.

Created a common `migration.utils.v1110.MigrationProcessBase`

* Ensure building automatically with the right parameters

* Update typescript types
2025-10-07 15:13:35 +00:00
Eugenio
47e953f9d3
PLAYWRIGHT FIXES: ensure sample data is passed to the right columns (#23761)
* Ensure we take columns ordered from the sampler

This is to avoid analyzing columns with data from other columns

* Remove expectation of address to have Sensitive tag

This is for a couple of reasons:
- First: per our internal definition it should actually be Non Sensitive.
- Second: presidio actually picks SOME of them up as PERSON (Sensitive) entities, but since we've raised the tolerance, now we're not classifying them as Sensitive.
2025-10-07 09:39:24 +02:00
harshsoni2024
9ba65ac0d2
Fix: Add support for datamodel source url (#23715)
2025-10-06 20:04:43 +00:00
Mohit Tilala
0cf0394d0b
Fixes #22406: Add workflow resource utilisation metrics for better troubleshooting (#23696)
* Add workflow resource utilization metrics for better troubleshooting

* Add types for correct static type checking

* Remove duplicate type annotations
2025-10-06 13:20:06 +05:30
harshsoni2024
da7a2778f6
MINOR: iceberg load table retry backoff (#23579)
2025-10-05 23:42:56 +05:30
Sriharsha Chintalapani
fc7412f6dd
Add Timescale Connector (#23665)
* Add Timescale Connector

* Update generated TypeScript types

* Add UI changes for the Timescale

* lineage, usage and java

* Add beta tag

* update logo

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-10-03 19:00:59 -07:00
Mohit Tilala
b15dc8fe42
Add better handling of no columns found/permission issue exceptions (#23695)
2025-10-03 21:07:16 +05:30
Keshav Mohta
3d49b6689d
Fixes #23356: Databricks & UnityCatalog OAuth and Azure AD Auth (#23561)
* feat: databricks oauth and azure ad auth setup

* refactor: add auth type changes in databricks.md

* fix: test after oauth changes

* refactor: unity catalog connection to databricks connection code

* feat: added oauth and azure ad for unity catalog

* fix: unitycatalog tests, doc & required type in connection.json

* fix: generated tx files

* fix: exporter databricksConnection file

* refactor: unitycatalog example file

* fix: usage example files

* fix: unity catalog sqlalchemy connection

* fix: unity catalog client headers

* refactor: make common auth.py for dbx and unitycatalog

* fix: auth functions import

* fix: test unity catalog tags as None

* fix: type hinting and sql migration

* fix: migration for postgres
2025-10-03 19:53:19 +05:30
harshsoni2024
ea54b6b883
MINOR: datalake column subfields fix (#23576)
2025-10-03 16:13:10 +05:30
Akash Verma
06453a925d
Fix #21093 : Update test connection improvements (#23516)
* Update test connection improvements

* Update queries

* checkstyle

* fix test failure

---------

Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
2025-10-03 13:50:46 +05:30
Akash Verma
5bb2924a6a
Fix #16081 : Add support for SQL Server hierarchyid, geography, and geometry types (#23527)
2025-10-03 11:46:01 +05:30
Akash Verma
4d68fe7a10
feat: Add ML model lineage support (#23494)
2025-10-03 11:38:41 +05:30
Suman Maharana
c8055576ba
Fixes #21686 : Add missing includeOwners check in dashboard services (#22514)
2025-10-03 10:53:25 +05:30
Keshav Mohta
48ff77c917
Fixes: MF4 Import Error (#23659)
* fix: asammdf and avro import error

* fix: mf4 import only

* test: fix mf4 test
2025-10-01 20:08:45 +05:30
Eugenio
5da2d32b34
Use recognizer in classification (#23628)
* Refactor presidio utils

Extract the spacy model functionality from the analyzer building function

* Added a new `TagClassifier`

This classifier uses tags to dynamically build presidio `RecognizerRegistry`s

* Added a new `TagProcessor`

This processor uses `TagClassifier` to label a column based on the tags' recognizers

* Create `TagProcessor` based on workflow configuration

* Create decorator to apply threshold to recognizers

This is so that we can apply thresholds on recognizer results without subclassing or having to keep a map between the presidio recognizer and the recognizer configuration
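
A rough sketch of the decorator idea described above. It assumes a recognizer is a callable that returns scored results; `Result`, `with_threshold`, and `analyze` are hypothetical names, not the presidio or project API.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List


@dataclass
class Result:
    """Hypothetical stand-in for a recognizer result."""
    entity: str
    score: float


def with_threshold(threshold: float):
    """Wrap an analyze function so results scoring below `threshold` are dropped."""
    def decorator(analyze: Callable[..., List[Result]]):
        @wraps(analyze)
        def wrapper(*args, **kwargs) -> List[Result]:
            return [r for r in analyze(*args, **kwargs) if r.score >= threshold]
        return wrapper
    return decorator


@with_threshold(0.6)
def analyze(text: str) -> List[Result]:
    # Fixed output for illustration only; a real recognizer would inspect `text`.
    return [Result("PERSON", 0.75), Result("LOCATION", 0.3)]
```

Because the threshold is attached by wrapping, no subclass or side map from recognizer to configuration is needed.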

* Fix broken test
2025-10-01 14:43:28 +02:00
Eugenio
dff2b394d5
Fix classification scoring (#23523)
* Add `reason` property to `TagLabel`

This is to understand what score was used for selecting the entity

* Build `TagLabel`s with `reason`

* Increase `PIIProcessor._tolerance`

This is so we correctly filter out low scores from classifiers while still maintaining the normalization that filters out confusing outcomes.

e.g., an output with scores 0.3, 0.7 and 0.75 would initially filter out the 0.3 and then discard the other two because they're both relatively high results.

* Make database and DAO changes needed to persist `TagLabel.reason`

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-01 12:11:14 +00:00
Keshav Mohta
6b7262a8ea
Feature: MF4 File Reader (#23308)
* feat: mf4 file reader

* refactor: removed schema_from_data implementation

* test: added tests for mf4 files
2025-10-01 11:19:00 +02:00
Pere Miquel Brull
375e001dd9
MINOR - Fix S3 logging from ingestion pipelines (#23590)
* MINOR - Fix S3 logging from ingestion pipelines

* Update generated TypeScript types

* config

* update s3 configurations for streamable logs

* Update generated TypeScript types

* update s3 configurations for streamable logs

* update s3 configurations for streamable logs

* update s3 configurations for streamable logs

* SSE off by default

* Update log retrieval to use s3 if ingestion runner has streamable logs enabled

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pablo Takara <pjt1991@gmail.com>
2025-10-01 09:44:17 +02:00
Ayush Shah
dd99ab5678
feat: Add Unity Catalog data diff module to use DBX connection instead of workspaceclient (#23404)
2025-09-30 20:56:54 +05:30
Sriharsha Chintalapani
18677afd39
Add support for Tags customizable rules, capturing feedback (#23289)
* Add support for translations in multi lang

* Add Tag Feedback System

* Update generated TypeScript types

* Fix typing issues and add tests to recognizer factory

* Updated `TagResourceTest.assertFieldChange` to fix broken test

This is because change description values had been serialized into strings and for some reason the keys ended up in a different order. So instead of performing String comparison, we do Json comparisons
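
The difference between the two comparisons can be illustrated with a short sketch. It is in Python for brevity, whereas the test itself is Java, and `json_equal` is a hypothetical helper.

```python
import json


def json_equal(left: str, right: str) -> bool:
    """Compare two JSON documents by parsed value, ignoring object key order."""
    return json.loads(left) == json.loads(right)


# Same object, keys serialized in a different order.
a = '{"name": "x", "id": 1}'
b = '{"id": 1, "name": "x"}'
```

Here `a == b` is false as a string comparison, while `json_equal(a, b)` is true.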

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eugenio Doñaque <eugenio.donaque@getcollate.io>
2025-09-30 07:17:18 +02:00
Sriharsha Chintalapani
bb1395fc72
Implement Modern Fluent API Pattern for OpenMetadata Java Client (#23239)
* Implement Modern Fluent API Pattern for OpenMetadata Java Client

* Add Lineage, Bulk, Search static methods

* Add all API support for Java & Python SDKs

* Add Python SDKs and mock tests

* Add Fluent APIs for sdks

* Add Fluent APIs for sdks

* Add Fluent APIs for sdks, support async import/export

* Remove unnecessary scripts

* fix py checkstyle

* fix tests with new plural form sdks

* Fix tests

* remove examples from python sdk

* remove examples from python sdk

* Fix type check

* Fix pyformat check

* Fix pyformat check

* fix python integration tests

* fix pycheck and pytests

* fix search api pycheck

* fix pycheck

* fix pycheck

* fix pycheck

* Fix test_sdk_integration

* Improvements to SDK

* Remove SDK coverage for Python 3.9

* Remove SDK coverage for Python 3.9

* Remove SDK coverage for Python 3.9
2025-09-29 16:07:02 -07:00
Mohit Tilala
22a0925cd2
Fix correct snowflake object types in source url (#23612)
2025-09-29 15:31:10 +00:00
Keshav Mohta
4528c0c1c4
Fixes #23416: Option To Opt Out of BigQuery Policy Tags Ingestion (#23532)
* fix: added includePolicyTags flag

* feat: added includePolicyTags
2025-09-29 18:24:10 +05:30
Mayur Singal
b489112bdd
MINOR: Fix import error log (#23578)
2025-09-29 12:47:19 +05:30
Keshav Mohta
94104e0806
fix: lineage flow and improved logging for databricks pipeline (#23586)
2025-09-26 18:22:01 +05:30