317 Commits

Author SHA1 Message Date
Pere Miquel Brull
d8e2187980
#15243 - Pydantic V2 & Airflow 2.9 (#16480)
* pydantic v2

* pydanticv2

* fix parser

* fix annotated

* fix model dumping

* mysql ingestion

* clean root models

* clean root models

* bump airflow

* bump airflow

* bump airflow

* optionals

* optionals

* optionals

* jdk

* airflow migrate

* fab provider

* fab provider

* fab provider

* some more fixes

* fixing tests and imports

* model_dump and model_validate

* model_dump and model_validate

* model_dump and model_validate

* union

* pylint

* pylint

* integration tests

* fix CostAnalysisReportData

* integration tests

* tests

* missing defaults

* missing defaults
2024-06-05 21:18:37 +02:00
gpby
d909a3141e
Teradata Connector (#16373)
* [WIP] add teradata connector

* [WIP] add teradata ingestion

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* [WIP] add teradata connector

* Reformat code

* Remove unused databaseName property
2024-05-28 06:40:22 +02:00
Pere Miquel Brull
17aed8a9e9
MINOR - Fix GX version (#16394) 2024-05-22 19:25:42 +00:00
Imri Paran
d5bf30ccd3
MINOR: trino integration test (#16291)
* added trino integration test

* - removed warnings for classes which are not real tests
- removed "helpers" as it's being used

* use a docker network instead of host

* print logs for hive failure

* removed superset unit tests

* try pinning requests for test

* try pinning requests for test

* wait for hive to be ready

* fix trino fixture

* - reduced testcontainers_config.max_tries to 5
- remove intermediate containers

* print with logs

* disable capture logging

* updated db host

* removed debug stuff

* removed debug stuff

* removed version pin for requests

* reverted superset

* ignore trino integration on python 3.8
2024-05-22 15:12:00 +00:00
harshsoni2024
a1a68ae73b
restrict requests version on setup (#16365) 2024-05-21 18:13:37 +05:30
Mayur Singal
1798b647c3
MINOR: Bump Collate Sqllineage Version (#16293) 2024-05-17 08:39:37 +02:00
Pere Miquel Brull
263afbeb5c
MINOR - pkg_resources is deprecated (#16316) 2024-05-17 07:56:07 +02:00
Pere Miquel Brull
53185fd30b
MINOR - Add Integration Test for S3 Storage (#16277)
* MINOR - Add Integration Test for S3 Storage

* MINOR - Add Integration Test for S3 Storage

* MINOR - Add Integration Test for S3 Storage

* format

* format
2024-05-16 10:03:27 +02:00
Pere Miquel Brull
f1f15cfc07
MINOR - Remove setuptools req (#16276)
* MINOR - Remove setuptools req

* relax system req

* fix
2024-05-16 10:03:15 +02:00
Suman Maharana
8dc623e280
Added KafkaConnect Connector (#16217) 2024-05-10 14:29:45 +05:30
Prajwal214
e191034c18
Minor: Updated Python Dependency for GreenPlum (#16139) 2024-05-09 08:57:25 +05:30
Onkar Ravgan
ceaa9d3e8a
Fix #15611 Parse PowerBI Dax files for lineage (#15975) 2024-04-29 14:55:06 +05:30
Ayush Shah
a15da7ec98
Issue #14812: Add support for empty string as missing count (#16017) 2024-04-25 09:45:26 +05:30
Imri Paran
93ec391f5c
MINOR: Dynamodb sample data (#15264)
* feat(nosql-profiler): row count

1. Implemented the NoSQLProfilerInterface as an entrypoint for the nosql profiler.
2. Added the NoSQLMetric as an abstract class.
3. Implemented the interface for the MongoDB database source.
4. Implemented an e2e test using testcontainers.

* added profiler support for mongodb connection

* doc

* use int_admin_ometa in test setup

* - fixed linting issue in gx
- removed unused inheritance

* moved the nosql function into the metric class

* feat(profiler): add dynamodb row count

* feat(profiler): add dynamodb row count

* formatting

* validate_compose: raise exception for bad status code.

* fixed import

* format

* feat(nosql-profiler): added sample data

1. Implemented the NoSQL sampler.
2. Some naming changes to the NoSQL adaptor to avoid fixing names with the profiler interface.
3. Tests.

* added default sample limit

* formatting

* fixed import

* feat(profiler): dynamodb sample data

* tests for dynamo db sample data

* format

* format

* use service connection for nosql adaptor factory

* fixed tests

* format

* fixed after merge
2024-04-22 17:46:40 +02:00
IceS2
08c114c340
FIXES 15626: Fix issue with not url model store (#15974)
* Changed the MLModelStore storage type to string

* fix checkstyle

* remove unused files

* Update requirements

* fix checkstyle

* Skipping MLFlow integration on python 3.8

* Hack to allow pytest to parse the mlflow integrations test on python 3.8

* Fix checkstyle
2024-04-22 15:50:44 +02:00
Imri Paran
0a1018648c
Fixes #15566: add dynamodb row count (#15204)
* feat(nosql-profiler): row count

1. Implemented the NoSQLProfilerInterface as an entrypoint for the nosql profiler.
2. Added the NoSQLMetric as an abstract class.
3. Implemented the interface for the MongoDB database source.
4. Implemented an e2e test using testcontainers.

* added profiler support for mongodb connection

* doc

* use int_admin_ometa in test setup

* - fixed linting issue in gx
- removed unused inheritance

* moved the nosql function into the metric class

* feat(profiler): add dynamodb row count

* feat(profiler): add dynamodb row count

* formatting

* fixed import

* format

* added dynamodb row count

* format

* removed unused factory file

* removed "validate"

* migrations

* removed validations

* format

* linting

* fixed: test_amundsen.py

* Update schemaChanges.sql
2024-04-22 09:14:52 +02:00
IceS2
fd51df25fa
Removed stale examples. Update dependencies (#15951) 2024-04-18 17:41:09 +02:00
Imri Paran
29cd58b628
MINOR: added integration test for SQL SERVER (#15919)
* adventure works mssql test case

* adventure works mssql test case

* fixed tests

* fixed tests

* fixed tests

* fixed tests
2024-04-17 12:19:37 +02:00
Mayur Singal
9439f58bef
MINOR: Add Databricks ssl dependencies (#15895) 2024-04-15 15:37:07 +05:30
Pere Miquel Brull
a1404e6b4a
MINOR - Clean ingestion dependencies (#15679)
* WIP - MINOR - Clean ingestion dependencies

* test

* test

* Clean imports

* add pyiceberg for test

* Revert "add pyiceberg for test"

This reverts commit ab26942736586f089a57a644ffd727aca200db62.

* add pyiceberg for test

* Remove docker dep

* clean local docker sh

* MINOR - AKS Airflow troubleshooting docs

* Fix action

* clean local docker sh
2024-04-11 14:30:40 +02:00
Pere Miquel Brull
9d7bfa363e
MINOR - Clean metadata CLI (#15631)
* Docs

* MINOR - Clean metadata CLI

* remove tests
2024-03-26 16:36:47 +01:00
mgorsk1
98850ab5cc
feat: OpenLineage integration (#15317)
* 🎉 Init OpenLineage connector

Co-authored-by: dechoma <dominik.choma@gmail.com>

* MLH - make linter happy

* review fixes

* 🐛 Fix path for ol event in tests

* 🐛 Fix path for ol event in tests

* Update ingestion/setup.py

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>

* Update ingestion/src/metadata/ingestion/source/pipeline/openlineage/metadata.py

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>

* Update ingestion/src/metadata/ingestion/source/pipeline/openlineage/models.py

Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>

* review fixes 2

* linter

* review

* review

* make linter happy

* fix test_yield_pipeline_lineage_details test

* make linter happy

* fix tests

* fix tests 2

---------

Co-authored-by: dechoma <dominik.choma@gmail.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2024-03-12 08:39:25 +01:00
IceS2
86a2930cfa
Minor: Fix E2E Ingestion Tests (#15462)
* Fix E2E Tests

* Fix E2E Tests

* Update mysql count, schema changes

* Addition to vertica e2e

* Temporary Github Action modification to test

* Fix Redshift round issue post 10 digits

* modify e2e gh file

* fix gh error

* fix matrix syntax

* Fix Redash counts

* Update py-cli-e2e-tests.yml

* Fix Redshift referenced before assignment error

* Revert Py tests e2e

* Modify Elasticsearch configuration

* Modify Elasticsearch configuration

* Update docker-compose.yml

* Test only running the python tests as e2e

* Comment side effects

* Test

* Test

* Fix name

* Add missing shell property

* Add bigquery to e2e

* Uncomment needed step

* test

* test

* test

* test

* Add control ci pipeline

* Add new e2e tests

* test

* fix

* fix

* fix

* Uncomment needed steps

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2024-03-05 16:00:22 +01:00
Teddy
16fdc249b7
fix: pin pandas version to 2.1.x (#15333) 2024-02-24 23:12:22 +05:30
Imri Paran
18c22c4178
Fixes #10013: Implement first stage of NoSQL profiler (#15189)
* feat(nosql-profiler): row count

1. Implemented the NoSQLProfilerInterface as an entrypoint for the nosql profiler.
2. Added the NoSQLMetric as an abstract class.
3. Implemented the interface for the MongoDB database source.
4. Implemented an e2e test using testcontainers.

* added profiler support for mongodb connection

* doc

* use int_admin_ometa in test setup

* - fixed linting issue in gx
- removed unused inheritance

* moved the nosql function into the metric class

* formatting

* validate_compose: raise exception for bad status code.

* fixed import

* format
2024-02-22 11:46:19 +01:00
Pere Miquel Brull
62c0cc7563
#13985 - Azure KV Secrets Manager (#15192)
* #13985 - Azure KV Secrets Manager

* Format

* #13985 - Azure KV Secrets Manager

* #13985 - Azure KV Secrets Manager

* Simplify credentials loading

* Simplify credentials loading

* Simplify credentials loading
2024-02-20 07:18:35 +01:00
Imri Paran
aeb5fbe303
fixes #12591: add BigTable (#15122)
* feat(connector): add BigTable

* bigtable work

1. docstrings
2. tests
3. created a Row BaseModel
4. implemented a ClassConverter

* docs moved to separate PR

* format files

* minor cosmetic

- removed TODO
- changed headers' year to 2024 for new files
- fixed typos

* format

* formatting and comments

1. added missing docstrings.
2. abstracted the _find_instance method.
3. aliased the IDs used in the BigTable connection

* added comment regarding private key

* added comments regarding column families

* enclose get_schema_name_list in `try/except/else`

* format

* streamlined get_schema_name_list to include all logic in the try block
2024-02-13 08:28:01 +01:00
Mayur Singal
d76809801d
MINOR: Fix Databricks SDK Breaking Change (#15037) 2024-02-06 10:42:53 +05:30
Mayur Singal
a9fc51ec8b
MINOR: Change sqllineage import to collate_sqllineage (#14870) 2024-02-05 19:44:08 +05:30
IceS2
373cafcda2
Fixes #5448: Implement initial Iceberg Connector using PyIceberg (#14825)
* Create the iceberg connection schema

* Link the IcebergConnection configuration with the forms on the UI

* Add the pyiceberg dependency on the ingestion package

* Create the get_connection and test_connection functions

* First iteration on the iceberg ingestion logic

* Add a more comprehensive implementation of the Iceberg Source

* Add UnitTests

* Update icebergConnection definition

* Update the iceberg source code based on new schema

* Updated icebergConnection schema for simplicity and to be able to configure Converters

* Updated setup dependencies to be more flexible

* Updated get_owner_ref logic

* Fix formatting

* Changed the icebergConnection json schema structure to enable the ClassConverters

* Add the IcebergCatalog and IcebergFileSystem ClassConverters

* Refactor the code to take into account the new jsonSchema structure

* Fix formatting

* Add Documentation for the Iceberg Connector

* Fix Menu order for Iceberg

* ui: add Iceberg service icon and constant

* Fix DynamoDb Catalog issue due to how PyIceberg instantiates it

* Changed uri title to URI

* Fix ClassConverter for Iceberg

* Fix GetSecretValue for password types

* Fix formatting

* Fix formatting

* Add Iceberg Connector Images for the docs

* Add pylint disable for Hacky super() call

* Add Iceberg.md for the UI docs

* Fix pylint complaint

* Fix pylint complaint

* Fix UnitTests

* fix type error and unit tests

* update pipeline type checks

* Fix Sonar Cloud complaints

---------

Co-authored-by: Sachin Chaurasiya <sachinchaurasiyachotey87@gmail.com>
2024-01-29 06:32:58 +01:00
Teddy
c90a86b8ad
chore: remove typing-extension dependency (#14757) 2024-01-17 09:58:10 +00:00
Vijay Ravi
abe716d7fa
MINOR: Update jsonpatch package version (#14740)
* MINOR: Update jsonpatch package version

* Format

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2024-01-17 07:17:11 +01:00
Shiyang Xiao
241f3c68d7
Fixes #14413: Add SAS connector (#14415)
* feat: SAS Viya connector

* refactor SASCatalog to SAS

* add SAS logo to UI and connection documentation

* doc changes

* modify ingestion logic

* revert original changes

* added support for dataflow & perfect logic for reports/datatables

* add filter doc

* more updates to perfect ingestion for each asset type

* fix a bug with table lineage not created properly

* Delete ingestion/pipelines/sasCatalog.yaml

* pre-commit fix

* Conversion to database connector

* minor fixes

* make custom properties type generic

* Add SAS javaEnum

* add dummy variable for sas.yaml

---------

Co-authored-by: lizmc <liz.mcintosh@sas.com>
Co-authored-by: Shiyang Xiao <Shiyang.Xiao@sas.com>
2024-01-11 06:46:57 -08:00
Ayush Shah
9c6d202555
Add Sample data, modify regex pattern (#14467) 2024-01-11 14:23:33 +05:30
Mayur Singal
190212c8ac
Fix #11556: Add support for Db2 for IBM i (#14680) 2024-01-11 12:35:52 +05:30
Onkar Ravgan
ecdb7b9f41
Fixes 14109 and 14325: Optimised Tableau Connector (#14548)
* Optimised tableau conn

* Added comment
2024-01-08 06:33:05 +01:00
Pere Miquel Brull
0e92a975e3
#14425 - Create ingestion-base-slim image (#14426)
* #14425 - Create ingestion-base-slim image

* Format

* Bump airflow

* Bump constraints
2023-12-19 11:09:38 +01:00
Pere Miquel Brull
eaacc693bd
#12027 - Add support for Python 3.11 (#14385)
* Fix datamodel codegen and bump versions

* Add 3.11 tests

* Update hive

* pandas

* pandas
2023-12-14 15:46:58 +01:00
Pere Miquel Brull
7fcdf08ca4
#11626 & #14131 - Lineage with other Entities & attr-based xlets (#14191)
* Add OMEntity model

* Test OMEntity

* Update repr

* Fix __str__

* Add entity ref map

* Test serializer for backend

* Fix tests

* Fix serializer

* Test runner

* Add runner tests

* Update docs

* Format
2023-12-01 06:29:44 +01:00
chyueyi
b6b337e09a
feat: add support for doris datasource (#14087)
* feat: add support for doris datasource

* fix: fix python style check

* fix: add pydoris dependency

* fix: add pydoris dependency

* fix: py_format_check

* fix: parse error when doris view column is VARCHAR(*), check data length if not digit then return 1

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2023-11-28 13:57:52 +05:30
VolkovGeoPhy
c95de19a09
great-expectations~=0.18.0 (#14056) 2023-11-23 12:33:07 +01:00
Pere Miquel Brull
c7e758eccc
Fix pyproject - TypeError: 'list' object is not a mapping (#14064)
* Fix pyproject - TypeError: 'list' object is not a mapping

* Add dynamic optional-dependencies

* Add dynamic optional-dependencies

* Bump datamodel

* Bump datamodel
2023-11-22 08:38:47 +01:00
Pere Miquel Brull
caaf0e7a1d
Fix #12436 - Migrate to pyproject.toml (#14025)
* test

* Use pyproject.toml

* Fix pylint

* makefile

* makefile

* Fix pylint

* isort

* pyproject

* Airflow apis pyproject

* Remove ingestion core

* isort

* Fix makefile help
2023-11-22 07:10:37 +01:00
Mohit Yadav
3f8a931e39
Bump Pom Version to 1.3.0-SNAPSHOT (#14008)
* Bump Pom Version to 1.3.0-SNAPSHOT

* chore: Fix Makefile recipe

* fix: Prepare Main Branch for Next Feature Release

* fix: Syntax issue

---------

Co-authored-by: Akash-Jain <Akash.J@deuexsolutions.com>
2023-11-17 11:33:47 +05:30
Ayush Shah
f94e2dbb47
Fix Hive Bytes issue, add athena yaml, fix bigquery multiple project id token issue (#13640) 2023-10-18 23:48:21 +05:30
Onkar Ravgan
115cd3506d
Enable pymssql python library (#13489)
* enabled dep

* review comments
2023-10-10 12:51:52 +02:00
Pere Miquel Brull
f6a87ee02a
Fix #12082 - Bump PyAthena version (#13464) 2023-10-09 20:47:19 +02:00
Mayur Singal
f879656f0a
Fix #12047: Clean commonregex package from setup (#13439) 2023-10-05 13:41:31 +05:30
Teddy
c4a3de6a85
fix: handle tableConfig for profiler CLI (#13437)
* fix: handle tableConfig for profiler CLI

* fix: empty commit for CI
2023-10-05 10:02:57 +02:00
Nguyen Huu Loc
ef1974edd6
Support LookML multi repos (#13140)
* Draft: Support LookML multi repos

* [Looker] manually create Dashboard datamodel

* [Looker] Support remote import & lineage for looker view

* Rollback parser.py

* refactor code

* Update code

* Remove logs & add comments

* Remove Middle & Nothing

* - Fix yield datamodel error
- Remove logs

* Support clone repo from Bitbucket

* Fix typo

* Optimize imports

* Fix pylint

---------

Co-authored-by: Loc Nguyen <loc.nguyenhuu@xendit.co>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-10-04 15:16:21 +02:00