1267 Commits

Author SHA1 Message Date
Sriharsha Chintalapani
ce3a9bd654
Kafka connect improvements (#23845)
* Kafka Connect Lineage Improvements

* Remove specific Kafka topic example from docstring

Removed example from the documentation regarding the earnin.bank.dev topic.

* fix: update comment to reflect accurate example for database server name handling

* fix: improve expected FQN display in warning messages for missing Kafka topics

* fix: update table entity retrieval method in KafkaconnectSource

* fix: enhance lineage information checks and improve logging for missing configurations in KafkaconnectSource

* Kafka Connect Lineage Improvements

* address comments; work without the table.include.list

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2025-10-11 22:26:14 +02:00
Sriharsha Chintalapani
5c638f5c8e
Databricks DLT pipelines parsing (#23848) 2025-10-11 22:25:43 +02:00
Ayush Shah
a90cacc93b
MINOR: fix Kafka connect CDC lineage (#23836) 2025-10-11 15:40:03 +05:30
Teddy
1f8cf64dd4
chore: added python 3.12 to CI (#23835)
* chore: added python 3.12 to CI

* chore: changed py-test-skip to 3.12
2025-10-10 17:26:45 +02:00
Teddy
93e5ee8cb1
fix: url encode fqn when retrieving test case results in python sdk (#23834) 2025-10-10 17:25:33 +02:00
Mayur Singal
88115e1218
MINOR: Fix trailing / issue in UC S3 lineage (#23816) 2025-10-09 18:44:07 +02:00
Antoine Balliet
be3a91f7df
fix: logger level should work for deprecation warnings (#23784)
* chore: implement logger levels tests for deprecation

* fix: use METADATA_LOGGER instead of warnings

* use unit test syntax

* isort

* black

* fix test

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-10-09 18:21:28 +02:00
Mayur Singal
05f064787f
Feat: Add kafka lineage support in databricks pipelines (#23813)
* Add dlt pipeline support

* Fix code style

* Add variable parsing

* Fix kafka lineage

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
2025-10-09 16:42:08 +02:00
Sriharsha Chintalapani
454d7367b0
Kafka Connect: Support Confluent Cloud connectors (#23780) 2025-10-09 01:28:27 +05:30
Mayur Singal
4708c2b64f
feat: Unity Catalog Lineage Enhancement: External Location Support (#23790) 2025-10-08 20:26:39 +05:30
harshsoni2024
f2819ce4e4
Fix: PowerBI snowflake query lineage parsing (#23746) 2025-10-08 18:32:25 +05:30
Eugenio
af0672e4cf
Fixes #22302: add table2.keyColumns parameter for table diff validation (#23667)
* Update `TableDiffParamsSetter` to move data at table level

This means that `key_columns` and `extra_columns` will be defined per table instead of "globally", just like `data_diff` expects

* Update `TableDiffValidator` to use table's `key_columns`

Call `data_diff` and run validations using each table's `key_columns`

* Create migration to update `tableDiff` test definition

* Fix Playwright test
2025-10-08 09:32:00 +02:00
harshsoni2024
da7a2778f6
MINOR: iceberg load table retry backoff (#23579) 2025-10-05 23:42:56 +05:30
Sriharsha Chintalapani
fc7412f6dd
Add Timescale Connector (#23665)
* Add Timescale Connector

* Update generated TypeScript types

* Add UI changes for the Timescale

* lineage, usage and java

* Add beta tag

* update logo

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-10-03 19:00:59 -07:00
Keshav Mohta
3d49b6689d
Fixes #23356: Databricks & UnityCatalog OAuth and Azure AD Auth (#23561)
* feat: databricks oauth and azure ad auth setup

* refactor: add auth type changes in databricks.md

* fix: test after oauth changes

* refactor: unity catalog connection to databricks connection code

* feat: added oauth and azure ad for unity catalog

* fix: unitycatalog tests, doc & required type in connection.json

* fix: generated tx files

* fix: exporter databricksConnection file

* refactor: unitycatalog example file

* fix: usage example files

* fix: unity catalog sqlalchemy connection

* fix: unity catalog client headers

* refactor: make common auth.py for dbx and unitycatalog

* fix: auth functions import

* fix: test unity catalog tags as None

* fix: type hinting and sql migration

* fix: migration for postgres
2025-10-03 19:53:19 +05:30
harshsoni2024
ea54b6b883
MINOR: datalake column subfields fix (#23576) 2025-10-03 16:13:10 +05:30
Akash Verma
06453a925d
Fix #21093 : Update test connection improvements (#23516)
* Update test connection improvements

* Update queries

* checkstyle

* fix test failure

---------

Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
2025-10-03 13:50:46 +05:30
Suman Maharana
c8055576ba
Fixes #21686 : Add missing includeOwners check in dashboard services (#22514) 2025-10-03 10:53:25 +05:30
Keshav Mohta
48ff77c917
Fixes: MF4 Import Error (#23659)
* fix: asammdf and avro import error

* fix: mf4 import only

* test: fix mf4 test
2025-10-01 20:08:45 +05:30
Eugenio
5da2d32b34
Use recognizer in classification (#23628)
* Refactor presidio utils

Extract the spacy model functionality from the analyzer building function

* Added a new `TagClassifier`

This classifier uses tags to dynamically build presidio `RecognizerRegistry`s

* Added a new `TagProcessor`

This processor uses `TagClassifier` to label a column based on the tags' recognizers

* Create `TagProcessor` based on workflow configuration

* Create decorator to apply threshold to recognizers

This is so that we can apply thresholds on recognizer results without subclassing or having to keep a map between the presidio recognizer and the recognizer configuration

* Fix broken test
2025-10-01 14:43:28 +02:00
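The "decorator to apply threshold to recognizers" step in #23628 could look roughly like the sketch below. It is a generic illustration only: `Result`, `with_threshold`, and the wrapped `analyze` function are hypothetical stand-ins, not presidio classes or the code from the commit.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List


@dataclass
class Result:
    """Hypothetical stand-in for a recognizer result."""

    entity_type: str
    score: float


def with_threshold(threshold: float) -> Callable:
    """Wrap an analyze-style callable and drop results scoring below the threshold."""

    def decorator(analyze: Callable[..., List[Result]]) -> Callable[..., List[Result]]:
        @wraps(analyze)
        def wrapper(*args, **kwargs) -> List[Result]:
            return [r for r in analyze(*args, **kwargs) if r.score >= threshold]

        return wrapper

    return decorator


@with_threshold(0.5)
def analyze(text: str) -> List[Result]:
    # Stand-in recognizer output; a real recognizer would inspect `text`.
    return [Result("EMAIL", 0.9), Result("PHONE", 0.3)]
```

Wrapping the callable keeps the threshold next to the recognizer without a subclass or a separate lookup map, which is the motivation the commit message gives.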
Eugenio
dff2b394d5
Fix classification scoring (#23523)
* Add `reason` property to `TagLabel`

This is to understand what score was used for selecting the entity

* Build `TagLabel`s with `reason`

* Increase `PIIProcessor._tolerance`

This is so we correctly filter out low scores from classifiers while still maintaining the normalization that filters out confusing outcomes.

e.g., an output with scores 0.3, 0.7 and 0.75 would initially filter out the 0.3 and then discard the other two because they're both relatively high results.

* Make database and DAO changes needed to persist `TagLabel.reason`

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-01 12:11:14 +00:00
Keshav Mohta
6b7262a8ea
Feature: MF4 File Reader (#23308)
* feat: mf4 file reader

* refactor: removed schema_from_data implementation

* test: added tests for mf4 files
2025-10-01 11:19:00 +02:00
Ayush Shah
dd99ab5678
feat: Add Unity Catalog data diff module to use DBX connection instead of workspaceclient (#23404) 2025-09-30 20:56:54 +05:30
Sriharsha Chintalapani
18677afd39
Add support for Tags customizable rules, capturing feedback (#23289)
* Add support for translations in multi lang

* Add Tag Feedback System

* Update generated TypeScript types

* Fix typing issues and add tests to recognizer factory

* Updated `TagResourceTest.assertFieldChange` to fix broken test

This is because change description values had been serialized into strings and for some reason the keys ended up in a different order. So instead of performing String comparison, we do Json comparisons

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eugenio Doñaque <eugenio.donaque@getcollate.io>
2025-09-30 07:17:18 +02:00
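The `TagResourceTest.assertFieldChange` note in #23289 describes comparing parsed JSON instead of raw strings because key order differed. The actual fix is in a Java test; the Python sketch below only illustrates the idea, and `json_equal` is a hypothetical helper name.

```python
import json


def json_equal(left: str, right: str) -> bool:
    """Compare two JSON documents by value, ignoring object key order."""
    return json.loads(left) == json.loads(right)


a = '{"name": "tag", "style": {"color": "red"}}'
b = '{"style": {"color": "red"}, "name": "tag"}'

print(a == b)            # string comparison is sensitive to key order
print(json_equal(a, b))  # parsed comparison is not
```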
Sriharsha Chintalapani
bb1395fc72
Implement Modern Fluent API Pattern for OpenMetadata Java Client (#23239)
* Implement Modern Fluent API Pattern for OpenMetadata Java Client

* Add Lineage, Bulk, Search static methods

* Add all API support for Java & Python SDKs

* Add Python SDKs and mock tests

* Add Fluent APIs for sdks

* Add Fluent APIs for sdks

* Add Fluent APIs for sdks, support async import/export

* Remove unnecessary scripts

* fix py checkstyle

* fix tests with new plural form sdks

* Fix tests

* remove examples from python sdk

* remove examples from python sdk

* Fix type check

* Fix pyformat check

* Fix pyformat check

* fix python integration tests

* fix pycheck and pytests

* fix search api pycheck

* fix pycheck

* fix pycheck

* fix pycheck

* Fix test_sdk_integration

* Improvements to SDK

* Remove SDK coverage for Python 3.9

* Remove SDK coverage for Python 3.9

* Remove SDK coverage for Python 3.9
2025-09-29 16:07:02 -07:00
Mohit Tilala
22a0925cd2
Fix correct snowflake object types in source url (#23612) 2025-09-29 15:31:10 +00:00
Eugenio
bb50514a00
Fixes #16983: can't sample data from trino tables with complex types (#23478)
* Update test data for `tests.integration.trino`

This is to create tables with complex data types.

Using raw SQL because creating tables with pandas didn't get the right types for the structs

* Update tests to reproduce the issue

Also included the new tables in the other tests to make sure complex data types do not break anything else

Reference: [issue 16983](https://github.com/open-metadata/OpenMetadata/issues/16983)

* Added `TypeDecorator`s to handle `trino.types.NamedRowTuple`

This is because pydantic couldn't figure out how to create python objects when receiving `NamedRowTuple`s, which broke the sampling process.

This makes sure the data we receive from the trino interface is compatible with Pydantic
2025-09-26 08:13:28 +02:00
Keshav Mohta
cb26c91442
Revert "Fixes #23356: Databricks OAuth & Azure AD Auth (#23482)" (#23530)
This reverts commit f1afe8f5f114ee58090168fd7ae5d66b38a01ab0.
2025-09-23 17:44:16 +02:00
Teddy
57c5a50d20
ISSUE #23435 - Fix pass / fail count for custom SQL (#23506)
* fix: added logic to compute pass/fail for sql queries with cte, nested queries, and joins

* added logic to correctly compute pass / fail rows

* style: ran python linting

* fix: failing tests

* style: fix linting error

* fix: flawed count logic

* fix: handle case where we don't compute row count
2025-09-23 16:53:51 +02:00
Ayush Shah
d94b39f6f5
fix(ssl): Update SSLManager to use dynamic schema registry paths (#23505) 2025-09-23 18:10:18 +05:30
Keshav Mohta
f1afe8f5f1
Fixes #23356: Databricks OAuth & Azure AD Auth (#23482)
* feat: databricks oauth and azure ad auth setup

* refactor: add auth type changes in databricks.md

* fix: test after oauth changes

* refactor: unity catalog connection to databricks connection code
2025-09-23 15:22:50 +05:30
Keshav Mohta
9262040381
fix: handle database native types for create table request during openlineage lineage (#23513) 2025-09-23 10:11:39 +02:00
Suman Maharana
e2b903532e
Fixes - Kafkaconnect lineage & descriptions (#23234)
* Fix Kafkaconnect lineage & descriptions

* fix typos

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comments

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comms

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-23 10:08:37 +02:00
Mohit Tilala
d1e60acd2a
[SAP HANA] Prevent exponential processing lineage parsing and use full name for filtering (#23484)
* Prevent exponential processing lineage parsing

* Use full name of views for filtering

* pylint fix - isort
2025-09-22 19:46:34 +05:30
Keshav Mohta
1a67e4fb7d
Feature: MariaDB Stored Procedures and Functions Support #23422 2025-09-18 17:59:39 +05:30
Akash Verma
da5dab7fef
Fixes #23388: Handle string and dict types for Metabase dataset_query field (#23417)
* Handle string and dict types for Metabase dataset_query field

* Added tests

---------

Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-09-16 16:57:08 -07:00
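A minimal sketch of what "handle string and dict types" for `dataset_query` (#23417) might involve, assuming the string form is a JSON-encoded object. The function name and the fall-back-to-`None` behaviour are assumptions, not the connector's code.

```python
import json
from typing import Any, Dict, Optional, Union


def normalize_dataset_query(
    value: Union[str, Dict[str, Any], None]
) -> Optional[Dict[str, Any]]:
    """Return dataset_query as a dict whether it arrives as a dict or a JSON string."""
    if value is None:
        return None
    if isinstance(value, dict):
        return value
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        # Not valid JSON: treat as missing rather than raising (assumed behaviour).
        return None
    return parsed if isinstance(parsed, dict) else None
```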
Sriharsha Chintalapani
cf7931ee3b
Add logging endpoint into S3 (#22533)
* Add logging endpoint into S3

* Update generated TypeScript types

* Stream Ingestion logs to S3

* Update generated TypeScript types

* Address comments

* Update generated TypeScript types

* create logs mixin, use clients to stream logs

* centralize logs sending into mixin

* use StreamableLogHandlerManager instead global handler

* improve condition

* remove example workflow file

* formatting changes

* fix tests and format

* tests, checkstyle fix

* minor changes

* reformat code

* tests fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: harshsoni2024 <harshsoni2024@gmail.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2025-09-15 07:22:25 -07:00
Suman Maharana
39cb165164
Feat: show dbt project name (#23044)
* Feat: show dbt project name

* Update generated TypeScript types

* added dbtSourceProject in data asset header properties

* Added tests

* Addressed comments

* Update generated TypeScript types

* move from dataAssetHeader to the dbt tab itself

* added unit test for added code

* test name change

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashish Gupta <ashish@getcollate.io>
2025-09-10 11:23:28 +02:00
Suman Maharana
000aaa63f1
Fix tableau e2e count error (#23287) 2025-09-10 01:52:34 +05:30
IceS2
8177e529bc
FIXES #23220: Add cardinality metric for string and enum (#23052)
* Implement Cardinality Metric for String and Enum

* Add Unit Tests

* Update generated TypeScript types

* Update ingestion/src/metadata/profiler/metrics/hybrid/cardinality_distribution.py

Co-authored-by: Teddy <teddy.crepineau@gmail.com>

* Fix CTE to simplify it to work with sqlite

* Fix CTE to simplify it to work with sqlite

* Update generated TypeScript types

* Update generated TypeScript types

* Add 'cardinalityDistribution' metric to profiler configuration

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2025-09-09 16:38:53 +02:00
Teddy
1ef191a2aa
ISSUE #1534 - Profiler Refactor for Metadata Extraction Application (#23200)
* feat: added exporter app config

* refactor: added entityprofile resource & added backward compatibility to existing API

* feat: added tests to get_profile_data_by_type

* feat: remove non supported event types

* chore: added migrations to 1.9.7

* chore: added application creation readme

* chore: move migrations to 1.9.8

* fix: failing java test

* style: ran java linting
2025-09-05 13:07:04 +02:00
Keshav Mohta
103857f90c
Fixes #23010 #: BigQuery Project Selection In Profiler & AutoClassification Workflow (#23233)
* fix: added code for separate engine and session for each project in profiler and classification and refactor billing project approach

* fix: added entity.database check, bigquery sampling tests

* fix: system metrics logic when bigquery billing project is provided
2025-09-05 14:09:14 +05:30
Mohit Tilala
9b2b4d2452
[Lineage] Fix cross services lineage changes of service_names to missed methods (#23240)
* Fix cross db changes of service_names to missed methods

* Handle string value passed to service_names
2025-09-04 20:38:05 +05:30
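For the "Handle string value passed to service_names" step in #23240, one plausible approach is to normalize the argument to a list up front. This is a sketch under that assumption; `normalize_service_names` is a hypothetical helper.

```python
from typing import List, Optional, Union


def normalize_service_names(
    service_names: Union[str, List[str], None]
) -> List[str]:
    """Accept a single service name or a list of names and always return a list."""
    if service_names is None:
        return []
    if isinstance(service_names, str):
        # Wrap a bare string; iterating it directly would yield single characters.
        return [service_names]
    return list(service_names)
```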
Mohit Tilala
f2fd8a9107
Fixes #22452: [Snowflake] Add custom host support for View in Snowflake source url (#23209) 2025-09-03 14:13:03 +05:30
Ram Narayan Balaji
5cb33ce78a
Implementation of Adding Entity Status and Reviewers to assets (#22904)
* Initial Implementation of Adding Status and Reviewers to assets for workflows

* Update generated TypeScript types

* Copilot Review Comments Addressed

* Removed DataProduct Reviewer Inheritance as it is irrelevant

* Commit: Classification has status and reviewers, DataContract uses the same status enums, changed the logic to be APPROVED instead of Active, DataContract can have null status as seen in tests, Changed Workflow to use workflowStatus instead of status as it is contradicting with the approval status, Fixed Tests

* Default for reviewers is null

* Default for reviewers is createSchema

* Addressed CoPilots comments

* Update generated TypeScript types

* Workflow status to workflowStatus in db and migrations

* Revert "Workflow status to workflowStatus in db and migrations"

This reverts commit 676e8789358654bc6f980f855c372f33c22fc40b.

* Changed status to entityStatus in the schema files

* Java Implementation of Default Status, Search Client improvements and Test fixes and new tests

* Adding entityStatus and reviewers in the searchIndex mappings and common attributes

* Data Migration scripts to change the glossaryTerm and dataContract structure

* Update generated TypeScript types

* Fixed zh/spreadsheet index json error

* Fix Postgres migration script

* Changed the entityStatus.json to status.json
Removed the duplicates of entityStatus in the indexMapping
Modified the sample data to take in EntityStatus.Approved instead of ContractStatus.Active

* Update generated TypeScript types

* dummy commit

* Fix UI Build Issues with the New EntityStatus
Fix py tests

* Migrations for all the entities that need entityStatus

* Update generated TypeScript types

* Removed Post Migration scripts

* Fix UI and py for entityStatus

* Update generated TypeScript types

* Fix: DataContractResourceTest

* Fix UI and py for importing entityStatus

* UI to show and fetch Reviewers

* cleanup

* Removed Overridden SetDefaultStatus in GlossaryTermRepository

* Removed unnecessary validation

* Added entityStatus in search_entity_index_mapping.json

* Fixed DataContractResourceTest

* mvn spotless apply and fix migration scripts

* fix tests

* fix type error

* fix advanced search tests

* Status comparison using enums and supportsStatus to supportsEntityStatus

* mvn spotless apply

* fix merge conflict

* update entity status

* fix tests

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Karan Hotchandani <33024356+karanh37@users.noreply.github.com>
Co-authored-by: karanh37 <karanh37@gmail.com>
2025-09-03 12:49:45 +05:30
Mohit Tilala
04a3639e47
Fixes #21895 #22363 #22369: Lineage improvements with multiprocessing, stored procedure level temp table processing and lineage filtering with db & schema (#22371)
* MINOR: Improve UDF Lineage Processing & Better Logging Time & MultiProcessing (#20848)

* Fix multiprocessing with better memory management and Airflow 2+ compatibility

* Add support for both multiprocessing and multithreading for relevant platforms

* Handle conflicting cross-db lineage changes of service_name parameter change

* Handle stored proc queries without caching all and increase the thread timeout times to cover 100% lineage

* Fix `get_table_query` inheritance and pylint

* Remove mocks from db_utils tests

* Better db_utils test and fix the service_names parameter in case of schema_fallback

---------

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2025-09-03 11:26:14 +05:30
Suman Maharana
20e18d4f9f
Add ssl support to hive (#22831)
* Add ssl support to hive

* Added missing ts files

* Added version to pure transport

* Added Tests

* fix tests add missing files
2025-09-02 20:13:30 +05:30
Mayur Singal
08ee62a198
MINOR: Add Unstructured Formats Support to GCS Storage Connector (#23158) 2025-09-02 18:22:39 +05:30
Suman Maharana
30bceee580
Fixes #22204 - Add support for sources key metadata fetch in dbt (#23003)
* Added support for sources key metadata fetch in dbt

* address comments

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fixes

* fixed tests

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-02 10:22:15 +05:30
Pere Miquel Brull
abcdc4e3d6
MINOR - Domain Independent DP Rule (#23067)
* MINOR - Domain Independent DP Rule

* handle DP

* Handle DP

* add migration

* improve rule mgmt

* improve rule mgmt

* add test for bulk op

* fix test

* handle in bulk

---------

Co-authored-by: sonika-shah <58761340+sonika-shah@users.noreply.github.com>
2025-08-29 17:28:29 +02:00