354 Commits

Author SHA1 Message Date
Keshav Mohta
cde3a7dd1e
Feature: Cassandra Connector (#18943) 2024-12-12 15:12:55 +05:30
Mayur Singal
6d21dd12a4
MINOR: Snowflake UDF Lineage Support - main (#18886) 2024-12-05 00:19:40 +05:30
Imri Paran
c5171139c3
chore: added data diff to base requirements (#18789) 2024-11-26 17:28:22 +00:00
Suman Maharana
b220bdb891
Fix: mstr removed dependency issues (#18732)
* Fix: mstr removed dependency issues

* fix session still active error

* py_format

* fix tests

* Addressed Comments

* Addressed Comments

* addressed comments

* Addressed comments

* Add constants

* Fix pytests
2024-11-22 21:19:21 +05:30
Imri Paran
089fa785a8
build(setup-py): update pydantic version (#18541)
Update pydantic version to ">=2.7.0" in order to include `IncEx` that was introduced in 3d1355f168
2024-11-13 10:14:06 +01:00
Imri Paran
cdaa5c10af
[GEN-1996] feat(data-quality): use sampling config in data diff (#18532)
* feat(data-quality): use sampling config in data diff

- get the table profiling config
- use hashing to sample deterministically the same ids from each table
- use dirty-equals to assert results of stochastic processes

* - reverted missing md5
- added missing database service type

* - use a custom substr sql function

* fixed nounce

* added failure for mssql with sampling because it requires a larger change in the data-diff library

* fixed unit tests

* updated range for sampling
2024-11-11 10:07:23 +01:00
Mayur Singal
66cf003cc3
MINOR: Fix pytest 3.11 taking 2hr (#18533) 2024-11-06 19:28:48 +05:30
Mayur Singal
f813ab730e
MINOR: Airflow dependency Fix (#18530) 2024-11-06 15:51:43 +05:30
Teddy
d579008c99
GEN 1683 - Add Column Value to be At Expected Location Test (#18524)
* feat: added column value to be in expected location test

* fix: renamed value -> values

* doc: added 1.6 documentation entry

* style: ran python linting

* fix: move data packaging to pyproject.yaml

* fix: add init file back for data package

* fix: failing test case
2024-11-06 11:17:13 +01:00
Nicola Coretti
7ebc62dca7
feat: Add support for exasol datasource (#17166)
* Add flake.nix

* Add lockfile for flake

* Update nix environment and document usage

* Add schema for exasol connector

* Add Exasol definitions to databaseService

* Fix error in exasol connector schema

* Add additional connection options/settings to exasol connector

* Add exasol-connector to ui

* Add dependencies for exasol-connector

* Update notes

* Update ingestion code

* Add Basic Documentation for Exasol Connector

* Update flake file

* Add developer notes

* Add python script which can be used as entry point for debugging in ide

* Add config file which can be used for debugging (manual execution)

* Update debug script

* Update developer notes

* Remove old developer notes

* Add .venv to gitignore

* Update dev notes

* Update development notes

* Update ExasolSource

* Establish basic connection to Exasol DB from connector

* Update exasol connector connection settings

* Add service_spec for exasol plugin

* Remove development files

* Remove unused module

* Applied code formatter

* Update exasol dependency constraint(s)

* Add unit test for exasol connection url(s)

* Fixed test expectations for exasol connection url test(s)

* Adjust the test query for the Exasol connection test
2024-10-31 08:11:30 +01:00
Imri Paran
3c7f995677
MINOR: fix(looker): exclude version
https://github.com/looker-open-source/sdk-codegen/issues/1518
2024-10-11 18:47:43 +00:00
IceS2
02d9494e7f
MINOR: Update DB2 dependencies to fix issue about not having ibm-db installed (#18192)
* Update DB2 dependencies to fix issue about not having ibm-db installed

* Fix checkstyle
2024-10-09 18:39:29 +02:00
Prajwal214
bbd159b947
Minor: Updated pyiceberg version to 0.5.1 (#18155)
Co-authored-by: Prajwal Pandit <prajwalpandit@Prajwals-MacBook-Air.local>
2024-10-08 14:55:57 +05:30
sam-mccarty-mavenclinic
0dd3e97170
Fix 17911: Looker parsing improvements for liquid templating and view/model aliasing (#17912)
* Looker parsing improvements for liquid templating and view/model aliasing

* add python-liquid dependency to looker plugin requirements

* move to static method with 'openmetadata' context and add rendering tests

* remove backtick stripping

---------

Co-authored-by: Imri Paran <imri.paran@gmail.com>
2024-09-27 13:55:15 +02:00
Mayur Singal
e373fddbde
MINOR: MSTR connector import fix (#18025) 2024-09-27 16:57:09 +05:30
Imri Paran
25284e0232
MINOR: fix snowflake system metrics (#17989)
* fix snowflake system metrics

* format

* add link to logs and commit
fixed the dq cli test

* reverted bad formatting

* fixed models.py

* removed version pinning for data diff in tests
2024-09-26 11:55:17 +00:00
Imri Paran
64f8571a77
pin testcontainers by minor version (#17938) 2024-09-24 09:18:36 +02:00
Teddy
b222de66fe
Minor fix pylint version (#17944)
* fix import issue

* fix: fix pylint version

* style:ran python linting

---------

Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
2024-09-20 18:50:28 +02:00
Imri Paran
d09bca26f6
MINOR: fix mssql integration test (#17923)
* change tag for sql server due to https://github.com/microsoft/mssql-docker/issues/441 (or some similar issue)

* use 2022-latest

* fixed mssql tests

* format

* used new columns

* use the custom sql server
2024-09-20 08:52:40 +02:00
Imri Paran
760b8eb742
fix(ingestion): deltalake (#17910)
add pin for deltalake<0.20 to avoid ad35eda798 (diff-f81b8b91e9721dc0a3235b6674a57c0700153b92b59f3988d357950a9c8c4760R1142-R1144)
2024-09-19 08:25:19 +02:00
Imri Paran
3282b057d8
ci: pin spacy version (#17814)
spacy 3.8 requires numpy 2.0 which is not compatible with openmetadata-ingestion requirements:
184e508d9c
2024-09-12 15:12:53 +05:30
Imri Paran
a3d6c1dd20
MINOR: tests(datalake): use minio (#17805)
* tests(datalake): use minio

1. use minio instead of moto for mimicking s3 behavior.
2. removed moto dependency as it is not compatible with aiobotocore (https://github.com/getmoto/moto/issues/7070#issuecomment-1828484982)

* - moved test_datalake_profiler_e2e.py to datalake/test_profiler
- use minio instead of moto

* fixed tests

* fixed tests

* removed default name for minio container
2024-09-12 07:13:01 +02:00
Pere Miquel Brull
c309906a1b
MINOR - Bump Presidio Analyzer and validate support for legal entities (#17750) 2024-09-06 16:07:08 +02:00
Imri Paran
5133c31d31
MINOR: kafka integration tests (#17457)
* tests: kafka integration

kafka integration tests with schema registry

* added ignore kafka for python 3.8

* fixed tests
2024-08-21 16:05:09 +05:30
Imri Paran
a59eb2a3cd
fix: pin numpy version (#17487) 2024-08-20 10:19:05 +00:00
Mayur Singal
8b9d8aad91
MINOR: Fix postgres conftest import (#17210) 2024-07-27 18:08:42 +05:30
IceS2
14e475cefe
MINOR: Add PyRight TypeCheck to our Python Project (#17060)
* Add PyRight TypeCheck to our Python Project

* Change pyright for basedpyright

* Fix PyRight
2024-07-18 11:52:56 +02:00
Imri Paran
0fee79b200
MINOR: fix sample data issue with Pydantic v2 and refactor python integration tests (#16943)
* tests: refactor

refactor tests and consolidate common functionality in integrations.conftest

this enables writing tests more concisely.
demonstrated with postgres and mssql.
will migrate more

* format

* removed helpers

* changed scope of fixtures

* changed scope of fixtures

* added profiler test for mssql

* fixed import in data_quality test

* json safe serialization

* format

* set MARS_Connection

* use SerializableTableData instead of TableData

* deleted file test_postgres.py

* fixed tests

* added more test cases

* format

* changed name test_models.py

* removed the logic for serializing table data

* wip

* changed mapping in common type map

* changed mapping in common type map

* reverted TableData imports

* reverted TableData imports

* reverted TableData imports
2024-07-17 08:11:34 +02:00
Teddy
3bcfdfe014
MINOR - enable dynamic assertion dl (#17008)
* fix: refactor runtime param setter + add dynamic assertion support for datalake

* chore: add missing test dependencies

* fix: centralize object constructor in interface

* fix: remove abstract decorator in interface
2024-07-16 11:01:43 +02:00
Matt Chamberlin
d757aa9d77
Fixes 16652: add GCS storage service (#16917)
* FEAT-16652: add GCS storage service

* reformat

* update connection tests

* fix tests

* relax google-cloud-storage version constraint

* fix GCP config in tests

---------

Co-authored-by: Matthew Chamberlin <mchamberlin@ginkgobioworks.com>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2024-07-10 14:03:28 +02:00
Ayush Shah
421d191bae
Fixes 16562: Modify HiveCompiler to compile column names properly (#16954)
* Modify HiveCompiler to compile column names properly
2024-07-09 12:59:23 +05:30
k.nakagaki
9a31a35296
Fixes 9875: supporting gcp secret manager (#16505)
* Split ExternalSecretsManagerTest to new ExternalSecretsManagerTest and AWSBasedSecretsManagerTest

* implement SecretsManagerFactory to create GCPSecretsManager

* implement GCPSecretsManager

* implements gcp secret manager.

* Fix it for the GCP's rule.

* create a template of GCP

* fix compile error

* implements to use project_id

* add library for the google cloud secret manager

* add test code for using google credential in the docker container

* modify docker-compose.yml for GCP

* add google_crc32c module

* modify ways to get project id

* create a new docker-compose.yml for Google Cloud

* create a new document

* create compose file for gcp secret manager

* fix invalid styles and formats

* downgrade google library to avoid conflicting protoc versions
2024-06-28 21:09:02 -07:00
Imri Paran
5e5c811ef2
moved int_admin_ometa to a dedicated module (#16768) 2024-06-25 11:21:22 +05:30
Imri Paran
b960b60965
Fix #16421: add tableDiff test case (#16554)
* feat: add tableDiff test case

This change introduces a "table diff" test case which
compares two tables and fails if they are not identical.
The comparison is based on a specific "key" (because the test only makes sense when performed on ordered collections).

1. Added the `tableDiff` test definition.
2. Implemented a "runtime" parameters feature which injects additional parameters for the test at runtime.
3. Integration tests (because of course).

This feature was not tested end-to-end yet because "array" data

* pydantic v2

* format

* format

* format and added data diff to setup.py

* format

* fixed param issue which has type ARRAY

* fixed runtime_parameter_setter

* moved models to parent directory

* handle errors in table diff

* fixed issue with edit test case

* format

* added more details to pytest skip

* format

* refactor: Improve createTestCaseParameters function in DataQualityUtils

* fixed unit test

* removed unused fixture

* removed validator.py

* fixed tests

* added validate kwarg to tests_mixin

* removed "postgres" data diff extra as they interfere with psycopg2-binary

* fixed tests

* pinned tenacity for tests

* reverted tenacity pinning

* added ui support for test diff

* fixed dq cypress and added edit flow

* organized the test case

* added dialect support

* fixed tests

* option style fix

* fixed calculation for passing/failing rows

* restrict the tableDiff test to limited services

* set where to None if blank string

* fixed where clause

* fixed tests for where clause

* use displayName in place of name in edit form

* added docs for RuntimeParameterSetter

* fixed cypress

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2024-06-20 16:54:12 +02:00
IceS2
f0049853ec
FIXES 14885: Initial deltalake implementation for s3 (#16665)
* Initial deltalake implementation for s3

* Fix styles

* Fix test_amundsen

* Fix UnitTests

* Fix Checkstyle

* Fix integration tests due to datalake client refactor

* Fix unit tests

* Fix tests

* Fix Integration DeltaLake Storage test

* Skip delta storage integration test for python 3.8

* DeltaLake JSONSchema changes migrations

* Update import name

* Add some comments based on sonarcloud suggestions

* Update DeltaLake documentation

* Resolve some comments
2024-06-20 12:08:21 +05:30
Mayur Singal
12362a9dea
MINOR: Fix numpy version (#16696) 2024-06-18 17:03:35 +05:30
Mayur Singal
e3fa340c8f
MINOR: Pydantic fixes for redshift & kafka (#16638) 2024-06-14 14:08:59 +05:30
Pere Miquel Brull
d8e2187980
#15243 - Pydantic V2 & Airflow 2.9 (#16480)
* pydantic v2

* pydanticv2

* fix parser

* fix annotated

* fix model dumping

* mysql ingestion

* clean root models

* clean root models

* bump airflow

* bump airflow

* bump airflow

* optionals

* optionals

* optionals

* jdk

* airflow migrate

* fab provider

* fab provider

* fab provider

* some more fixes

* fixing tests and imports

* model_dump and model_validate

* model_dump and model_validate

* model_dump and model_validate

* union

* pylint

* pylint

* integration tests

* fix CostAnalysisReportData

* integration tests

* tests

* missing defaults

* missing defaults
2024-06-05 21:18:37 +02:00
gpby
d909a3141e
Teradata Connector (#16373)
* [WIP] add teradata connector

* [WIP] add teradata ingestion

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* Reformat code

* Remove unused databaseName property
2024-05-28 06:40:22 +02:00
Pere Miquel Brull
17aed8a9e9
MINOR - Fix GX version (#16394) 2024-05-22 19:25:42 +00:00
Imri Paran
d5bf30ccd3
MINOR: trino integration test (#16291)
* added trino integration test

* - removed warnings for classes which are not real tests
- removed "helpers" as it's being used

* use a docker network instead of host

* print logs for hive failure

* removed superset unit tests

* try pinning requests for test

* try pinning requests for test

* wait for hive to be ready

* fix trino fixture

* - reduced testcontainers_config.max_tries to 5
- remove intermediate containers

* print with logs

* disable capture logging

* updated db host

* removed debug stuff

* removed debug stuff

* removed version pin for requests

* reverted superset

* ignore trino integration on python 3.8
2024-05-22 15:12:00 +00:00
harshsoni2024
a1a68ae73b
restrict requests version on setup (#16365) 2024-05-21 18:13:37 +05:30
Mayur Singal
1798b647c3
MINOR: Bump Collate Sqllineage Version (#16293) 2024-05-17 08:39:37 +02:00
Pere Miquel Brull
263afbeb5c
MINOR - pkg_resources is deprecated (#16316) 2024-05-17 07:56:07 +02:00
Pere Miquel Brull
53185fd30b
MINOR - Add Integration Test for S3 Storage (#16277)
* MINOR - Add Integration Test for S3 Storage

* MINOR - Add Integration Test for S3 Storage

* MINOR - Add Integration Test for S3 Storage

* format

* format
2024-05-16 10:03:27 +02:00
Pere Miquel Brull
f1f15cfc07
MINOR - Remove setuptools req (#16276)
* MINOR - Remove setuptools req

* relax system req

* fix
2024-05-16 10:03:15 +02:00
Suman Maharana
8dc623e280
Added KafkaConnect Connector (#16217) 2024-05-10 14:29:45 +05:30
Prajwal214
e191034c18
Minor: Updated Python Dependency for GreenPlum (#16139) 2024-05-09 08:57:25 +05:30
Onkar Ravgan
ceaa9d3e8a
Fix #15611 Parse PowerBI Dax files for lineage (#15975) 2024-04-29 14:55:06 +05:30
Ayush Shah
a15da7ec98
Issue #14812: Add support for empty string as missing count (#16017) 2024-04-25 09:45:26 +05:30