2979 Commits

Author SHA1 Message Date
Sriharsha Chintalapani
fc7412f6dd
Add Timescale Connector (#23665)
* Add Timescale Connector

* Update generated TypeScript types

* Add UI changes for the Timescale

* lineage, usage and java

* Add beta tag

* update logo

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-10-03 19:00:59 -07:00
Mohit Tilala
b15dc8fe42
Add better handling of no columns found/permission issue exceptions (#23695) 2025-10-03 21:07:16 +05:30
Keshav Mohta
3d49b6689d
Fixes #23356: Databricks & UnityCatalog OAuth and Azure AD Auth (#23561)
* feat: databricks oauth and azure ad auth setup

* refactor: add auth type changes in databricks.md

* fix: test after oauth changes

* refactor: unity catalog connection to databricks connection code

* feat: added oauth and azure ad for unity catalog

* fix: unitycatalog tests, doc & required type in connection.json

* fix: generated tx files

* fix: exporter databricksConnection file

* refactor: unitycatalog example file

* fix: usage example files

* fix: unity catalog sqlalchemy connection

* fix: unity catalog client headers

* refactor: make common auth.py for dbx and unitycatalog

* fix: auth functions import

* fix: test unity catalog tags as None

* fix: type hinting and sql migration

* fix: migration for postgres
2025-10-03 19:53:19 +05:30
harshsoni2024
ea54b6b883
MINOR: datalake column subfields fix (#23576) 2025-10-03 16:13:10 +05:30
Akash Verma
06453a925d
Fix #21093 : Update test connection improvements (#23516)
* Update test connection improvements

* Update queries

* checkstyle

* fix test failure

---------

Co-authored-by: Akash Verma <akashverma@Akashs-MacBook-Pro-2.local>
2025-10-03 13:50:46 +05:30
Akash Verma
5bb2924a6a
Fix #16081 : Add support for SQL Server hierarchyid, geography, and geometry types (#23527) 2025-10-03 11:46:01 +05:30
Akash Verma
4d68fe7a10
feat: Add ML model lineage support (#23494) 2025-10-03 11:38:41 +05:30
Suman Maharana
c8055576ba
Fixes #21686 : Add missing includeOwners check in dashboard services (#22514) 2025-10-03 10:53:25 +05:30
Keshav Mohta
48ff77c917
Fixes: MF4 Import Error (#23659)
* fix: asammdf and avro import error

* fix: mf4 import only

* test: fix mf4 test
2025-10-01 20:08:45 +05:30
Eugenio
5da2d32b34
Use recognizer in classification (#23628)
* Refactor presidio utils

Extract the spacy model functionality from the analyzer building function

* Added a new `TagClassifier`

This classifier uses tags to dynamically build presidio `RecognizerRegistry`s

* Added a new `TagProcessor`

This processor uses `TagClassifier` to label a column based on the tags' recognizers

* Create `TagProcessor` based on workflow configuration

* Create decorator to apply threshold to recognizers

This is so that we can apply thresholds on recognizer results without subclassing or having to keep a map between the presidio recognizer and the recognizer configuration

* Fix broken test
2025-10-01 14:43:28 +02:00
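
The threshold decorator described in #23628 above could look roughly like the sketch below. This is an illustration under assumptions, not the code from the PR: `Result`, `with_threshold`, and `email_recognizer` are hypothetical names, and no presidio API is used.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for a recognizer result (not a presidio class)."""

    entity: str
    score: float


def with_threshold(threshold: float) -> Callable:
    """Wrap a recognizer function so results scoring below `threshold` are dropped."""

    def decorator(recognize: Callable[[str], List[Result]]) -> Callable[[str], List[Result]]:
        @wraps(recognize)
        def wrapper(text: str) -> List[Result]:
            # Keep only results at or above the configured threshold.
            return [r for r in recognize(text) if r.score >= threshold]

        return wrapper

    return decorator


@with_threshold(0.6)
def email_recognizer(text: str) -> List[Result]:
    # Toy scoring, for illustration only.
    results = []
    if "@" in text:
        results.append(Result("EMAIL", 0.9))
    if "." in text:
        results.append(Result("URL", 0.4))
    return results
```

Because the threshold lives in the wrapper, the recognizer itself needs no subclass and no separate lookup table between recognizer and configuration, which is the motivation the commit message gives.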
Eugenio
dff2b394d5
Fix classification scoring (#23523)
* Add `reason` property to `TagLabel`

This is to understand what score was used for selecting the entity

* Build `TagLabel`s with `reason`

* Increase `PIIProcessor._tolerance`

This is so we correctly filter out low scores from classifiers while still maintaining the normalization that filters out confusing outcomes.

e.g., an output with scores 0.3, 0.7 and 0.75 would first have the 0.3 filtered out, and then the other two would be discarded because they are both relatively high results.

* Make database and DAO changes needed to persist `TagLabel.reason`

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-01 12:11:14 +00:00
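
The two-stage filtering described in #23523 above (drop low scores, then reject outcomes where the best candidates are too close to call) can be sketched as follows. The function name, the `min_score` and `tolerance` parameters, and their default values are assumptions made for illustration; the actual `PIIProcessor` logic may differ.

```python
from typing import Dict, Optional


def pick_label(
    scores: Dict[str, float],
    min_score: float = 0.5,   # hypothetical cut-off for "low" scores
    tolerance: float = 0.1,   # hypothetical minimum gap between the top two
) -> Optional[str]:
    """Return the winning label, or None when the outcome is ambiguous."""
    # Stage 1: drop low scores.
    kept = {label: score for label, score in scores.items() if score >= min_score}
    if not kept:
        return None

    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)

    # Stage 2: if the two best scores are too close, treat the result as confusing.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < tolerance:
        return None
    return ranked[0][0]
```

With the scores from the commit message (0.3, 0.7 and 0.75), the 0.3 is removed in stage 1 and the remaining pair is rejected in stage 2, so no label is returned.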
Keshav Mohta
6b7262a8ea
Feature: MF4 File Reader (#23308)
* feat: mf4 file reader

* refactor: removed schema_from_data implementation

* test: added tests for mf4 files
2025-10-01 11:19:00 +02:00
Pere Miquel Brull
375e001dd9
MINOR - Fix S3 logging from ingestion pipelines (#23590)
* MINOR - Fix S3 logging from ingestion pipelines

* Update generated TypeScript types

* config

* update s3 configurations for streamable logs

* Update generated TypeScript types

* update s3 configurations for streamable logs

* update s3 configurations for streamable logs

* update s3 configurations for streamable logs

* SSE off by default

* Update log retrieval to use s3 if ingestion runner has streamable logs enabled

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Pablo Takara <pjt1991@gmail.com>
2025-10-01 09:44:17 +02:00
Ayush Shah
dd99ab5678
feat: Add Unity Catalog data diff module to use DBX connection instead of workspaceclient (#23404) 2025-09-30 20:56:54 +05:30
Sriharsha Chintalapani
18677afd39
Add support for Tags customizable rules, capturing feedback (#23289)
* Add support for translations in multi lang

* Add Tag Feedback System

* Update generated TypeScript types

* Fix typing issues and add tests to recognizer factory

* Updated `TagResourceTest.assertFieldChange` to fix broken test

This is because change description values had been serialized into strings and, for some reason, the keys ended up in a different order. So instead of performing a String comparison, we compare the parsed JSON.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eugenio Doñaque <eugenio.donaque@getcollate.io>
2025-09-30 07:17:18 +02:00
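
The test fix in #23289 above relies on the fact that two serialized objects can differ as strings while being equal as JSON. The original test is Java; the sketch below shows the same idea in Python, with `json_equal` and the sample payloads being hypothetical.

```python
import json


def json_equal(left: str, right: str) -> bool:
    """Compare two JSON documents by parsed value, ignoring object key order."""
    return json.loads(left) == json.loads(right)


# Same content, different key order.
a = '{"name": "PII", "style": {"color": "red"}}'
b = '{"style": {"color": "red"}, "name": "PII"}'

strings_match = a == b          # plain string comparison
values_match = json_equal(a, b)  # comparison of parsed values
```

Note that array order still matters in this comparison; only object key order is ignored.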
Sriharsha Chintalapani
bb1395fc72
Implement Modern Fluent API Pattern for OpenMetadata Java Client (#23239)
* Implement Modern Fluent API Pattern for OpenMetadata Java Client

* Add Lineage, Bulk, Search static methods

* Add all API support for Java & Python SDKs

* Add Python SDKs and mock tests

* Add Fluent APIs for sdks

* Add Fluent APIs for sdks

* Add Fluent APIs for sdks, support async import/export

* Remove unnecessary scripts

* fix py checkstyle

* fix tests with new plural form sdks

* Fix tests

* remove examples from python sdk

* remove examples from python sdk

* Fix type check

* Fix pyformat check

* Fix pyformat check

* fix python integration tests

* fix pycheck and pytests

* fix search api pycheck

* fix pycheck

* fix pycheck

* fix pycheck

* Fix test_sdk_integration

* Improvements to SDK

* Remove SDK coverage for Python 3.9

* Remove SDK coverage for Python 3.9

* Remove SDK coverage for Python 3.9
2025-09-29 16:07:02 -07:00
Mohit Tilala
22a0925cd2
Fix correct snowflake object types in source url (#23612) 2025-09-29 15:31:10 +00:00
Keshav Mohta
4528c0c1c4
Fixes #23416: Option To Opt Out of BigQuery Policy Tags Ingestion (#23532)
* fix: added includePolicyTags flag

* feat: added includePolicyTags
2025-09-29 18:24:10 +05:30
Mayur Singal
b489112bdd
MINOR: Fix import error log (#23578) 2025-09-29 12:47:19 +05:30
Keshav Mohta
94104e0806
fix: lineage flow and improved logging for databricks pipeline (#23586) 2025-09-26 18:22:01 +05:30
Eugenio
bb50514a00
Fixes #16983: can't sample data from trino tables with complex types (#23478)
* Update test data for `tests.integration.trino`

This is to create tables with complex data types.

Using raw SQL because creating tables with pandas didn't get the right types for the structs

* Update tests to reproduce the issue

Also included the new tables in the other tests to make sure complex data types do not break anything else

Reference: [issue 16983](https://github.com/open-metadata/OpenMetadata/issues/16983)

* Added `TypeDecorator`s to handle `trino.types.NamedRowTuple`

This is because pydantic couldn't figure out how to create python objects when receiving `NamedRowTuple`s, which broke the sampling process.

This makes sure the data we receive from the trino interface is compatible with Pydantic
2025-09-26 08:13:28 +02:00
Suman Maharana
be51d53464
Fix - Hive Metastore None issue (#23520) 2025-09-24 10:11:29 +05:30
Mayur Singal
933802a354
MINOR: Support metabase API Key Auth (#23436) 2025-09-23 22:01:10 +05:30
Keshav Mohta
cb26c91442
Revert "Fixes #23356: Databricks OAuth & Azure AD Auth (#23482)" (#23530)
This reverts commit f1afe8f5f114ee58090168fd7ae5d66b38a01ab0.
2025-09-23 17:44:16 +02:00
Teddy
57c5a50d20
ISSUE #23435 - Fix pass / fail count for custom SQL (#23506)
* fix: added logic to compute pass/fail for sql queries with cte, nested queries, and joins

* added logic to correctly compute pass / fail rows

* style: ran python linting

* fix: failing tests

* style: fix linting error

* fix: flawed count logic

* fix: handle case where we don't compute row count
2025-09-23 16:53:51 +02:00
Suman Maharana
79fde4ab02
Minor: improved dbt debug logs (#23509) 2025-09-23 19:52:58 +05:30
Ayush Shah
d94b39f6f5
fix(ssl): Update SSLManager to use dynamic schema registry paths (#23505) 2025-09-23 18:10:18 +05:30
Keshav Mohta
f1afe8f5f1
Fixes #23356: Databricks OAuth & Azure AD Auth (#23482)
* feat: databricks oauth and azure ad auth setup

* refactor: add auth type changes in databricks.md

* fix: test after oauth changes

* refactor: unity catalog connection to databricks connection code
2025-09-23 15:22:50 +05:30
Suman Maharana
1c710ef5e3
Fix Stream logger url (#23491) 2025-09-23 14:35:14 +05:30
Keshav Mohta
9262040381
fix: handle database native types for create table request during openlineage lineage (#23513) 2025-09-23 10:11:39 +02:00
Suman Maharana
e2b903532e
Fixes - Kafkaconnect lineage & descriptions (#23234)
* Fix Kafkaconnect lineage & descriptions

* fix typos

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comments

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* address comms

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-23 10:08:37 +02:00
Pere Miquel Brull
49bdf1a112
MINOR - Report status for tests that blow up (#23326)
* MINOR - Report status for tests that blow up

* format
2025-09-22 16:34:36 +02:00
Mohit Tilala
d1e60acd2a
[SAP HANA] Prevent exponential processing lineage parsing and use full name for filtering (#23484)
* Prevent exponential processing lineage parsing

* Use full name of views for filtering

* pylint fix - isort
2025-09-22 19:46:34 +05:30
Keshav Mohta
1a67e4fb7d
Feature: MariaDB Stored Procedures and Functions Support #23422 2025-09-18 17:59:39 +05:30
Akash Verma
da5dab7fef
Fixes #23388: Handle string and dict types for Metabase dataset_query field (#23417)
* Handle string and dict types for Metabase dataset_query field

* Added tests

---------

Co-authored-by: Akash Verma <akashverma@Mac.lan>
2025-09-16 16:57:08 -07:00
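
Accepting both string and dict values for a field, as #23417 above does for Metabase's `dataset_query`, usually comes down to a small normalization step. The sketch below is an assumption about the general shape of such a step, not the code from the PR; `normalize_dataset_query` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Union


def normalize_dataset_query(
    value: Union[str, Dict[str, Any], None],
) -> Optional[Dict[str, Any]]:
    """Return `value` as a dict, parsing it first if it arrives as a JSON string."""
    if value is None:
        return None
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # Not valid JSON: treat as missing rather than failing the whole run.
            return None
        return parsed if isinstance(parsed, dict) else None
    return None
```

Returning `None` for unparseable input is a design choice of this sketch; a real connector might prefer to log a warning as well.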
Mohit Tilala
61ed53f7b2
Handled none procedure_name in stored proc lineage processing (#23408) 2025-09-16 12:39:15 +05:30
Sriharsha Chintalapani
cf7931ee3b
Add logging endpoint into S3 (#22533)
* Add logging endpoint into S3

* Update generated TypeScript types

* Stream Ingestion logs to S3

* Update generated TypeScript types

* Address comments

* Update generated TypeScript types

* create logs mixin, use clients to stream logs

* centralize logs sending into mixin

* use StreamableLogHandlerManager instead global handler

* improve condition

* remove example workflow file

* formatting changes

* fix tests and format

* tests, checkstyle fix

* minor changes

* reformat code

* tests fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
Co-authored-by: harshsoni2024 <harshsoni2024@gmail.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2025-09-15 07:22:25 -07:00
Keshav Mohta
11a719e611
Fixes: Oracle Stored Packages Test Connection Step #23370 2025-09-12 17:38:34 +00:00
Keshav Mohta
1f379a8697
fix: added depth in json and pass in metadata entry (#23332) 2025-09-12 12:31:20 +05:30
Mohit Tilala
f9e866cd50
Fix incomplete trino view definition extraction (#23349) 2025-09-12 11:43:50 +05:30
Mayur Singal
38c707b0bc
MINOR: Fix column comment getting overridden in glue (#23329) 2025-09-11 17:29:23 +05:30
Mayur Singal
d705fffc1d
Fix #1968: Query Runner Schema (#23077) 2025-09-11 10:41:11 +05:30
Teddy
f3cb001d2b
ISSUE #2033-C - Support For DBX Exporter + Minor Fix to Status (#23313)
* feat: added config support for databricks

* fix: allow incrementing record count directly without storing element

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-10 12:04:46 +02:00
Suman Maharana
39cb165164
Feat: show dbt project name (#23044)
* Feat: show dbt project name

* Update generated TypeScript types

* added dbtSourceProject in data asset header properties

* Added tests

* Addressed comments

* Update generated TypeScript types

* move from dataAssetHeader to the dbt tab itself

* added unit test for added code

* test name change

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashish Gupta <ashish@getcollate.io>
2025-09-10 11:23:28 +02:00
IceS2
8177e529bc
FIXES #23220: Add cardinality metric for string and enum (#23052)
* Implement Cardinality Metric for String and Enum

* Add Unit Tests

* Update generated TypeScript types

* Update ingestion/src/metadata/profiler/metrics/hybrid/cardinality_distribution.py

Co-authored-by: Teddy <teddy.crepineau@gmail.com>

* Fix CTE to simplify it to work with sqlite

* Fix CTE to simplify it to work with sqlite

* Update generated TypeScript types

* Update generated TypeScript types

* Add 'cardinalityDistribution' metric to profiler configuration

* Update generated TypeScript types

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2025-09-09 16:38:53 +02:00
NadezhdaNovotortseva
3c9b3cac48
Lineage dialect postgres added to greenplum (#23291)
Co-authored-by: Надежда Коцюба <nadezhda.kotsyuba@uni.rest>
2025-09-08 15:26:23 +02:00
Teddy
1ef191a2aa
ISSUE #1534 - Profiler Refactor for Metadata Extraction Application (#23200)
* feat: added exporter app config

* refactor: added entityprofile resource & added backward compatibility to existing API

* feat: added tests to get_profile_data_by_type

* feat: remove non supported event types

* chore: added migrations to 1.9.7

* chore: added application creation readme

* chore: move migrations to 1.9.8

* fix: failing java test

* style: ran java linting
2025-09-05 13:07:04 +02:00
Keshav Mohta
103857f90c
Fixes #23010: BigQuery Project Selection In Profiler & AutoClassification Workflow (#23233)
* fix: added code for separate engine and session for each project in profiler and classification and refactor billing project approach

* fix: added entity.database check, bigquery sampling tests

* fix: system metrics logic when bigquery billing project is provided
2025-09-05 14:09:14 +05:30
Mohit Tilala
d926ed9dad
[Snowflake] Handle cases when stream source is not retrievable (#23245) 2025-09-05 00:27:31 +05:30
Mohit Tilala
9b2b4d2452
[Lineage] Fix cross services lineage changes of service_names to missed methods (#23240)
* Fix cross db changes of service_names to missed methods

* Handle string value passed to service_names
2025-09-04 20:38:05 +05:30
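
Handling a string passed where a list of service names is expected, as #23240 above describes, can be done with a small coercion helper. This is a minimal sketch under assumptions: `normalize_service_names` is a hypothetical name and the PR may handle the case differently.

```python
from typing import List, Optional, Union


def normalize_service_names(
    service_names: Union[str, List[str], None],
) -> List[str]:
    """Always return a list, wrapping a single string instead of iterating over it."""
    if service_names is None:
        return []
    if isinstance(service_names, str):
        # Without this check, iterating a string would yield single characters.
        return [service_names]
    return list(service_names)
```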