3401 Commits

Author SHA1 Message Date
Ferjani Nasraoui
b0e1a136cf
Fixes #21106: Support owner extraction from serialized Airflow DAGs (#22071)
* fix(airflow): correctly extract owners from serialized Airflow DAGs

Airflow serialization format wraps tasks under `__var` and `__type`.
Previously, the OpenMetadata Airflow connector failed to extract task owners properly in this format.

This patch:
- Flattens `__var` when parsing task owners
- Falls back to `default_args["owner"]` if no task-level owner is explicitly present
- Ensures the correct DAG owner is picked as the most common task owner
- Handles compatibility with older Airflow versions

Fixes: #21106

* test(airflow): add tests for owner extraction from serialized Airflow DAGs

Adds new test cases to validate owner extraction logic:
- Owners from serialized task format (`__var`)
- Fallback to `default_args['owner']` if task owners are missing
- Resolution of most common owner
- Compatibility with unstructured or missing owners

* remove test version specific comment

* simplify comments and warnings

* fix return statement

* fixing formatting

* adding handling of default args

* fixing and adding more tests
2025-07-03 14:21:36 +05:30
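
The owner-resolution rules listed in #22071 (unwrap `__var`, fall back to `default_args["owner"]`, pick the most common task owner) can be sketched as below. This is a minimal illustration, not the connector's code: `extract_dag_owner` and `_unwrap` are hypothetical names, and the payload shape (a top-level `dag` key holding `default_args` and `tasks`, each possibly wrapped in `{"__var": ..., "__type": ...}`) is assumed from the commit message.

```python
from collections import Counter
from typing import Any, Optional


def _unwrap(value: Any) -> Any:
    """Strip one level of the assumed {"__var": ..., "__type": ...} envelope."""
    if isinstance(value, dict) and "__var" in value:
        return value["__var"]
    return value


def extract_dag_owner(serialized_dag: dict) -> Optional[str]:
    """Hypothetical helper: most common task owner, else default_args['owner']."""
    dag = _unwrap(serialized_dag.get("dag", serialized_dag))
    default_args = _unwrap(dag.get("default_args") or {})
    default_owner = default_args.get("owner") if isinstance(default_args, dict) else None

    owners = []
    for raw_task in dag.get("tasks", []):
        task = _unwrap(raw_task)
        if not isinstance(task, dict):
            continue  # unstructured entry: skip it instead of failing
        owner = task.get("owner") or default_owner  # task-level owner wins
        if owner:
            owners.append(owner)

    if not owners:
        return default_owner
    return Counter(owners).most_common(1)[0][0]
```

Ties resolve to the owner encountered first, because `Counter.most_common` orders equal counts by first appearance.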
Ayush Shah
4c1976409a
Update README and Ingestion Framework Documentation (#22080) 2025-07-02 16:21:06 +05:30
Teddy
29450d1104
feat: add support for DBX system metrics (#22044)
* feat: add support for DBX system metrics

* feat: add support for DBX system metrics

* fix: added WRITE back

* fix: failing test cases

* fix: failing test
2025-07-02 08:54:16 +02:00
Suman Maharana
e36e5da26e
Added Databricks pipeline Lineage (#22014) 2025-06-30 10:41:22 +05:30
Suman Maharana
b4cd7b7046
Add: Postgres SP and UDF descriptions (#22021) 2025-06-30 10:39:09 +05:30
harshsoni2024
10b377590c
qlikcloud get script tables (#22022) 2025-06-30 10:36:57 +05:30
Mayur Singal
c8f94783ed
Minor: Python E2E Fixes (#21959) 2025-06-28 18:05:58 +05:30
sonika-shah
5d733b490c
Minor Fix : query_cost_record_search_index Search exception for elasticsearch instance (#21985)
* Fix : query_cost_record_search_index Search exception for elasticsearch instance

* add sample query to cover test scenarios

* update mapping and fix test
2025-06-28 11:22:34 +05:30
Pere Miquel Brull
5f0f32c366
FIX #21955 - Handle sampler SQA sessions (#21994)
* FIX #21955

* FIX #21955
2025-06-27 08:58:25 +02:00
harshsoni2024
9bb0527192
display object column type (#22002) 2025-06-27 12:07:06 +05:30
IceS2
c899d45e8e
MINOR: Update Trino Connection to fix data diff (#21983) 2025-06-27 07:58:48 +02:00
harshsoni2024
616579a6c1
feat-21984: REST service process nested objects inside array dtype in schema (#21984) 2025-06-27 10:44:35 +05:30
IceS2
94cf3e0fd6
MINOR: Extend profile workflow config to allow engine configuration (#21840)
* Update Profile Workflow to allow engine configuration

* Add ui generated schemas

* Add Repository Override mechanism based on annotations

* Implement logic to use the ProcessingEngine configuration

* Update SparkEngine to use remote and not master
2025-06-26 19:11:26 +05:30
Mayur Singal
803abb9373
Minor: Fix Tableau Lineage in Multi Schema Model (#21965) 2025-06-25 23:43:06 +05:30
Mehul Shroff
35215762cb
Update metadata_service_helper.py (#21948) 2025-06-25 17:07:40 +05:30
IceS2
392f081255
Update PySpark and Delta-Spark Versions to use PySpark 3.5.6 (#21919) 2025-06-25 11:45:01 +02:00
Suman Maharana
2aa2282e03
Added project to datamodel (#21926) 2025-06-25 02:26:22 +05:30
Ayush Shah
11ac56356b
MINOR: Modify Sample data (#21599) 2025-06-24 17:16:13 +05:30
Mayur Singal
43863ae6f3
MINOR: Fix pytests jaraco (#21894) 2025-06-23 13:55:43 +05:30
harshsoni2024
f490406968
MINOR: pbi improve logging (#21868) 2025-06-20 16:32:56 +05:30
Keshav Mohta
73ea60b898
Refactor: Unity Catalog (#21801) 2025-06-20 16:04:34 +05:30
IceS2
5bac5f2509
MINOR: Fix Airflow API Test Connection (#21818)
* Fix Airflow API Test Connection

* Fix query_parser_source test_connection

* Already updated all test_connection I could find

* Fix circular dependency

* Fix invalid variable

* Fix wrong import

---------

Co-authored-by: Aniket Katkar <aniketkatkar97@gmail.com>
2025-06-19 17:58:07 +05:30
Himanshu Khairajani
79c3d55128
Fix #21679: Added metadata ingest-dbt CLI Command for Direct DBT Artifacts Ingestion (#21680)
* metadata dbt

* fix:
 - default path to current directory
 - additional warning and exception handling for missing metadata config vars

* test: add unit tests for DBT Ingestion CLI

* refactor

* PR review:
 - using Pydantic to parse and validate the openmetadata config in dbt's .yml
 - extended test-cases
 - giving user more configuration options for ingestion

* py refactoring

* add: dbt-auto ingest docs

* Improvements:
 - using environment variables for loading sensitive variables
 - added docs for auto dbt-ingestion for dbt-core
 - more test cases

* fix:
 - test case for reading JWT token inside the method

* refactor: py code formatting

* refactor: py formatting

* ingest-dbt docs updated

* refined test cases

* Chore:
 - sonar vulnerability issue review
 - using existing URL class for host validation

---------

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2025-06-19 17:57:10 +05:30
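
#21680 mentions loading sensitive values from environment variables and validating the host. A minimal sketch of resolving dbt-style `{{ env_var('NAME') }}` placeholders and checking an `http(s)` host URL follows. `resolve_env_vars` and `validate_host_port` are hypothetical helpers, and it is an assumption that the CLI reads these values through dbt's `env_var` syntax; the actual command validates its config with Pydantic and an existing URL class, which this stdlib-only sketch does not reproduce.

```python
import os
import re
from typing import Mapping, Optional
from urllib.parse import urlparse

# Matches {{ env_var('NAME') }} or {{ env_var("NAME", "fallback") }}.
_ENV_VAR = re.compile(
    r"""\{\{\s*env_var\(\s*['"]([^'"]+)['"]\s*(?:,\s*['"]([^'"]*)['"]\s*)?\)\s*\}\}"""
)


def resolve_env_vars(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace every env_var placeholder; raise KeyError if unset without a default."""
    env = os.environ if environ is None else environ

    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"Environment variable {name!r} is not set")

    return _ENV_VAR.sub(_sub, value)


def validate_host_port(host_port: str) -> str:
    """Hypothetical check: require an http(s) scheme and a hostname."""
    parsed = urlparse(host_port)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Invalid OpenMetadata host: {host_port!r}")
    return host_port.rstrip("/")
```

Passing `environ` explicitly keeps the helper testable without mutating `os.environ`.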
Suman Maharana
7be62f3ed9
Add: Tableau Hierarchy project filter (#21811) 2025-06-19 11:18:52 +05:30
IceS2
040a33117c
MINOR: Fix Profiler Infinite Loop (#21843) 2025-06-19 10:33:45 +05:30
Sriharsha Chintalapani
802438f0ea
Fix default boost score, improve fqn parsing (#21854)
* Fix explain turned on by default, use dfs_query_then_fetch in cases of sharding of search cluster

* Add exact match configs

* Add exact match configs

* Update Logic to build search source builder with exact match priority

* Revert "Update Logic to build search source builder with exact match priority"

This reverts commit 175a2e9c6b67ee90d4b2a35af89bb035e8c45131.

* Revert "Add exact match configs"

This reverts commit 3fd52606610bbb97a676170004cab6d7adc31a0d.

* revert display name change

* make boost mode sum by default

* add more fqnparts for schema and database

* revert DFS_QUERY_THEN_FETCH since sharding wasn't the issue

* use fqn split

* refactor fqn parsing

---------

Co-authored-by: mohitdeuex <mohit.y@deuexsolutions.com>
2025-06-18 18:56:11 -07:00
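
The bullets "use fqn split" and "add more fqnparts for schema and database" refer to breaking a fully qualified name into searchable parts. A sketch under stated assumptions: names are dot-separated and a name containing a dot is wrapped in double quotes. `split_fqn` and `build_fqn_parts` are hypothetical, and which combinations the search index actually stores (here, each name plus every dotted suffix) is assumed.

```python
from typing import List, Set


def split_fqn(fqn: str) -> List[str]:
    """Split on dots outside double quotes and drop the quotes."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in fqn:
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoted section; the quote is not kept
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def _quote(name: str) -> str:
    # Re-quote names that contain a dot so the joined suffix stays parseable.
    return f'"{name}"' if "." in name else name


def build_fqn_parts(fqn: str) -> Set[str]:
    """Each individual name plus every dotted suffix ("schema.table", ...)."""
    names = split_fqn(fqn)
    parts = set(names)
    for start in range(len(names)):
        parts.add(".".join(_quote(n) for n in names[start:]))
    return parts
```

Indexing suffixes lets a query such as `sch.tbl` match a table without the user typing the service and database prefix.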
Sriharsha Chintalapani
8adda4955c
Revert "Issues in Search Relevancy (#21841)" (#21853)
This reverts commit f388e570c1dac5b9eee31364870fb66e42715f18.
2025-06-18 16:43:34 -07:00
Mohit Yadav
f388e570c1
Issues in Search Relevancy (#21841)
* Fix explain turned on by default, use dfs_query_then_fetch in cases of sharding of search cluster

* Add exact match configs

* Add exact match configs

* Update Logic to build search source builder with exact match priority

* Revert "Update Logic to build search source builder with exact match priority"

This reverts commit 175a2e9c6b67ee90d4b2a35af89bb035e8c45131.

* Revert "Add exact match configs"

This reverts commit 3fd52606610bbb97a676170004cab6d7adc31a0d.

* revert display name change

* make boost mode sum by default

* add more fqnparts for schema and database

* revert DFS_QUERY_THEN_FETCH since sharding wasn't the issue

* use fqn split

* Refactor FQN Parts

---------

Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-06-18 16:33:46 -07:00
harshsoni2024
d38ee0ed52
feat-21712: PowerBI internal entities & cross workspace lineage (#21837) 2025-06-18 20:46:17 +05:30
Keshav Mohta
7c0eeef049
Fixes #19692: Implemented Nifi Pipeline Lineage (#21802)
* feat: implemented nifi pipeline lineage

* test: implemented tests for nifi pipeline lineage

* fix: yield_pipeline_bulk_lineage_details output type hinting

* fix: component check in connections

---------

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2025-06-18 07:31:04 +00:00
harshsoni2024
a09a696358
MINOR: Tableau proxy url for sourceurl (#21799) 2025-06-18 10:52:08 +05:30
Teddy
e4dffd281c
fix: preserve BQ struct field casing (#21716) 2025-06-17 23:58:35 +02:00
IceS2
cf288aa5de
Remove useless comment (#21819) 2025-06-17 14:27:41 -07:00
Sriharsha Chintalapani
c90138501f
Fix #21822: OpenSearch by default limits the number of characters it will analyze for highlighting to 1,000,000 characters. If your description field is very large (e.g. Markdown docs, embedded HTML, or verbose documentation), this limit gets exceeded. (#21821)
* Add sample data

* Fix index mappings to optimize the highlighter for OpenSearch
2025-06-17 14:22:11 -07:00
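
The 1,000,000-character cap described in #21821 is the default of OpenSearch's `index.highlight.max_analyzed_offset` setting, which applies when highlighting re-analyzes text that was indexed without offsets or term vectors. One way to avoid it is to store offsets in the postings, sketched below as a Python mapping body. Whether the commit used `index_options: offsets` or another mapping change is an assumption, and `description_field_mapping`, `index_body` and `exceeds_highlight_limit` are hypothetical helpers.

```python
from typing import Any, Dict

# Default of the index.highlight.max_analyzed_offset setting.
DEFAULT_MAX_ANALYZED_OFFSET = 1_000_000


def description_field_mapping() -> Dict[str, Any]:
    """Text field whose postings also store character offsets.

    With offsets in the index, highlighting can read them directly instead of
    re-analyzing the text, so the max_analyzed_offset cap does not apply.
    """
    return {"type": "text", "index_options": "offsets"}


def index_body() -> Dict[str, Any]:
    # Body suitable for an index-creation request (only the field of interest shown).
    return {"mappings": {"properties": {"description": description_field_mapping()}}}


def exceeds_highlight_limit(text: str, limit: int = DEFAULT_MAX_ANALYZED_OFFSET) -> bool:
    """True when re-analysis highlighting would hit the default cap."""
    return len(text) > limit
```

Storing offsets enlarges the index somewhat; the alternative of raising `max_analyzed_offset` keeps the index smaller but makes highlighting large documents slower.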
Mayur Singal
34c43eaea0
MINOR: Fix pytests (#21807) 2025-06-17 23:44:29 +05:30
IceS2
e79c54e6a5
MINOR: Add injection to profiler (#21738)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Add Injection to Metrics in Profiler

* Add Injection to the Profiler

* Fix UnitTests

* Fix Pytests

* Fix Tests

* Fix types
2025-06-17 19:01:00 +02:00
harshsoni2024
0f79d8ea1d
MINOR: pytest opt out flaky test (#21800)
* remove mlflow test until fixed

* alationsink test count fixed

* pylint fix gx
2025-06-17 14:23:28 +05:30
IceS2
49df5fc9de
MINOR: Implement dependency injection on ingestion (#21719)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Implement Dependency Injection for the Ingestion Framework

* Fix Test

* Fix Profile Test Connection

* Fix test, making the injection test run last

* Update connections.py

* Changed NewType to an AbstractClass to avoid linting issues

* remove comment

* Fix bug in service spec

* Update PyTest version to avoid importlib.reader wrong import
2025-06-16 08:03:38 +02:00
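
#21719 adds dependency injection to the ingestion framework and keys dependencies on an abstract class rather than a `NewType`. A minimal sketch of that idea follows: a registry maps an abstract base class to a provider, and a decorator fills any parameter whose type annotation is registered. `DependencyContainer`, `inject`, `MetricsRepository` and the engine classes are hypothetical illustrations, not OpenMetadata's API.

```python
import functools
import inspect
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Type, TypeVar, get_type_hints

T = TypeVar("T")


class DependencyContainer:
    """Maps an abstract type to a zero-argument provider."""

    def __init__(self) -> None:
        self._providers: Dict[type, Callable[[], Any]] = {}

    def register(self, abstract: Type[T], provider: Callable[[], T]) -> None:
        # Registering the same type again replaces the provider (an override).
        self._providers[abstract] = provider

    def has(self, abstract: Any) -> bool:
        return abstract in self._providers

    def resolve(self, abstract: Type[T]) -> T:
        if abstract not in self._providers:
            raise LookupError(f"No provider registered for {abstract.__name__}")
        return self._providers[abstract]()


container = DependencyContainer()


def inject(func: Callable[..., T]) -> Callable[..., T]:
    """Fill parameters whose annotation is registered, unless passed explicitly."""
    signature = inspect.signature(func)
    hints = get_type_hints(func)

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        bound = signature.bind_partial(*args, **kwargs)
        for name in signature.parameters:
            annotation = hints.get(name)
            if name not in bound.arguments and container.has(annotation):
                kwargs[name] = container.resolve(annotation)
        return func(*args, **kwargs)

    return wrapper


class MetricsRepository(ABC):
    @abstractmethod
    def name(self) -> str: ...


class DefaultMetrics(MetricsRepository):
    def name(self) -> str:
        return "default"


class SparkMetrics(MetricsRepository):
    def name(self) -> str:
        return "spark"


@inject
def compute(table: str, metrics: MetricsRepository) -> str:
    return f"{table}:{metrics.name()}"
```

Usage: after `container.register(MetricsRepository, DefaultMetrics)`, calling `compute("orders")` receives a `DefaultMetrics` instance; registering `SparkMetrics` afterwards overrides it. Because the registry is module-level state, a test that overrides a provider should restore it afterwards.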
Sriharsha Chintalapani
074329418f
Fix #17244: Pagination for columns in UI (#21508) 2025-06-15 21:30:31 +05:30
Mayur Singal
64626dd4fd
MINOR: Implement Lineage Filter for UC (#21761)
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2025-06-13 22:14:56 +05:30
Pere Menal-Ferrer
44e09e41a2
Revert "FIX #1464 (#21520)" (#21726)
This reverts commit 1e86f9870fd663122b9bbb64f3cf17cf32619c7f.
2025-06-13 17:27:32 +02:00
IceS2
891ff4184d
MINOR: Initial implementation for our Connection Class (#21581)
* Initial implementation for our Connection Class

* Implement the Initial Connection class

* Add Unit Tests

* Fix Test

* Fix Profile Test Connection

* Remove unit test

* Remove comment

* Fix tests and missing changes
2025-06-13 14:52:29 +02:00
IceS2
f44d81ddf2
Update test_deltalake.py (#21735)
* Update test_deltalake.py

* Ignore it while collecting the tests
2025-06-13 08:27:38 +02:00
Mayur Singal
93b5cec8f9
Fix #21099: fix Superset ingestion bad query (#21650) 2025-06-13 08:32:15 +05:30
Mayur Singal
f4e9d69930
Fix #21109: Unable to connect to Opensearch using AWS Credentials (#21441) 2025-06-13 08:30:44 +05:30
Keshav Mohta
cd24c0a69a
Feature: Microstrategy Lineage (#21678) 2025-06-13 08:28:29 +05:30
Mohit Tilala
2803e62f0b
Add missing Data space type in qlikcloud (#21698)
* Add missing `Data` space type in qlikcloud

* Fix broken json files
2025-06-12 14:49:10 -07:00
Mayur Singal
d20d278c4b
Minor: Improve UC owner ingestion (#21741)
* Minor: Improve UC owner ingestion

* lint
2025-06-12 14:48:29 -07:00
harshsoni2024
6a6180b2e3
powerbi change owner condition (#21724) 2025-06-12 16:11:43 +05:30
Suman Maharana
18f9f2cdb6
Fix: Tableau project id should always be a string (#21700) 2025-06-12 11:21:53 +05:30