317 Commits

Author SHA1 Message Date
NiharDoshi99
1ff76f5e65
pii tagging using spacy (#10256)
* WIP: pii tagging using spacy

* added test cases and changes as per comment

* fix python checkstyle

* fix python checkstyle

* added score, test_cases and docs update

* solved merge conflict

* fix python checkstyle

* remove pii tagging using regex

* fix python test

* lib changes and added some test cases

* changed as per comment

* fix: python test

* fix: changes to get source_config

* fix: changes as per comment
2023-03-03 18:33:18 +05:30
Teddy
754074f1be
Fixes #7758 - Added Column value and Integer Range Partitioning (#10350)
* feat(profiler): renamed  module to

* feat(profiler): added dbt-artifacts-parser to test setup.py

* feat(profiler): refactor workflow and interface

* feat(profiler): linting

* feat(profiler): removed old profiler modules

* feat(profiler): added support for value and integer range partition

* feat(profiler): fixed linting

* feat(profiler): added partitioning support for datalake profiler

* feat(profiler): removed `ProfilerInterfaceArgs` class

* feat(profiler): address comments

* feat(profiler): Added `OTHER` as an `IntervalType` for UI type generation
2023-03-01 08:20:38 +01:00
Mayur Singal
cd4461397d
Add impyla as scheme for hive connector (#10270) 2023-02-22 16:54:56 +05:30
Teddy
83be5d933b
Fixes #9301 - Refactor TestSuite and Remove Pandas from Base Requirements (#10244)
* feat(testSuite): extracted out column test for SQA type

* refactor(testSuite): extracted SQA column and table tests into their own classes

* refactor(testSuite): Added pkutil namespace package style for test suite classes

* refactor(testSuite): added dynamic importer function for test cases

* refactor(testSuite): black formatting

* refactor(testSuite): fixed linting issues

* refactor(testSuite): refactor metrics for dataframe

* refactor(testSuite): Added Mixins and base methods

* refactor(testSuite): extracted out get bound for floats

* refactor(testSuite): Added pandas column test cases

* refactor(testSuite): Deleted old column tests

* refactor(testSuite): Added table tests for datalake

* refactor(testSuite): Removed old tests definition

* refactor(testSuite): changed registry to dynamic class import

* refactor(testSuite): renamed dl_fn to df_fn

* refactor(testSuite): updated registry unit test

* refactor(testSuite): updated import path to sqa like column

* refactor(testSuite): cleaned up imports in old files

* refactor(testSuite): harmonized SQALikeColumn object to replicate SQA Column object

* refactor(testSuite): linting

* refactor(testSuite): linting

* refactor(testSuite): raise exception on DQ exception

* refactor(testSuite): linting

* refactor(testSuite): removed pandas from base requirements

* refactor(testSuite): Added __future__ for py3.7 type hint

* refactor(testSuite): added `df` to good-names

* refactor(testSuite): renamed Handler to Validator

* refactor(testSuite): Added test inheritance for column tests

* refactor(testSuite): cleaned up column type check

* refactor(testSuite): cleaned up typo

* refactor(testSuite): extracted main table test logic into parent class

* refactor(testSuite): linting

* refactor(testSuite): linting fixes

* refactor(testSuite): address doc string and linting issues
2023-02-22 09:42:34 +01:00
VolkovGeoPhy
7a59bc7676
>= grpc-tools 1.47.2 (Done) (#10218) 2023-02-20 18:07:27 +05:30
Nahuel
b9a3c06104
Bump main branch to version 1.0.0 (#10040)
* Bump to version 0.13.2

* Bump mvn projects to 1.0.0-SNAPSHOT

* Bump python projects to 1.0.0.dev0

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-02-02 12:56:14 +01:00
Pere Miquel Brull
f0f3f0be6a
Add looker unit tests (#9691)
* Add looker tests

* Empty-Commit

* Install GE for tests

* Fix usage details python name

* Add missing test requirement
2023-02-01 09:20:26 +00:00
Ayush Shah
747fcf569b
Add docs - quicksight, lineage... (#10023) 2023-01-31 15:17:40 +00:00
Onkar Ravgan
949989fb1c
Added dbt parser (#9982)
* Added dbt parser

* Added library dependency

* format and final fixes

* Addressed review comments

* Fixed typo
2023-01-29 20:47:39 +01:00
Pere Miquel Brull
f6d59f599e
Pin SQLAlchemy lower than 2 (#9952) 2023-01-27 15:26:30 +01:00
Nahuel
254ee9a186
Fix#9460: Avoid reuse inspector to get view definition (#9821)
* Avoid reuse inspector to get view definition

* Update openmetadata-sqllineage version
2023-01-20 13:54:41 +00:00
Nahuel
ddff6e2875
Fix: Replace sqllineage with openmetadata-sqllineage (#9800)
* Replace sqllineage with openmetadata-sqllineage

* Fix checkstyle and failing test

* Move logic to retrieve dialect of a service type into a class

* Improve py-check message when it fails

* Updated mapper

* Update code after merge
2023-01-19 14:56:29 +01:00
Pere Miquel Brull
294277708b
Fix #9558 - Add a greater range for boto3 dependency (#9778)
* add boto3 wiggle room

* add boto3 wiggle room
2023-01-18 08:20:40 +01:00
Sriharsha Chintalapani
2a314809c1
Keep elasticsearch version to be 7.13.1 (#9756)
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-01-17 19:12:49 -08:00
NiharDoshi99
029dbe892e
Fix: added test case for atlas (#9678)
* Fix: added test case for atlas

* Fix: resolved conflict

* Fix: changing back neo4j to old version

* Fix: changing back neo4j to old version

* Fix: changes as per comment

* Fix: changes as per comment

* Fix: python checkstyle
2023-01-13 16:07:29 +05:30
NiharDoshi99
1ec324e43e
Fix: neo4j version bump (#9680) 2023-01-11 18:28:25 +05:30
Pere Miquel Brull
bf753a4dee
Fix #7768 - Update and organize versions (#9664)
2023-01-11 07:05:12 +01:00
Pere Miquel Brull
84348d4748
Fix #8866 - bump datamodel-codegen (#9623)
* Fix #8866 - bump datamodel-codegen

* Update connection options and arguments structure

* Add builders test

* Format

* Allow Any values in componentConfig

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2023-01-09 13:20:32 +01:00
Ayush Shah
1d930ad14b
Fix security vulnerability (#9580) 2023-01-05 12:36:00 +05:30
Pere Miquel Brull
7f21a7bced
Fix #8088 - Restructure source connections & clients (#9545) 2023-01-02 13:52:27 +01:00
Chirag Madlani
bf6fc5f93a
prepare(release) next release (#9479)
* prepare(release) next release

* airflow typo

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2022-12-27 20:15:46 +05:30
Nahuel
2c43ebba6f
Fix#9448: Add ES volumes (#9506)
* Add ES volumes

* Fix run_local_docker script

* Fix error run_local_docker script

* Update Es volumes in docker-compose files
2022-12-23 17:33:30 +01:00
Ayush Shah
2bf5eb9051
fix 7995: profileSample % and row number (#9104) 2022-12-20 14:55:11 +05:30
Nahuel
a2b34dd0f4
Fix: Update Ingestion docker images and fix python libraries dependencies (#9342)
* Update Ingestion docker images and fix python libraries dependencies

* Also install apache-airflow-providers-http
2022-12-16 14:46:25 +00:00
Nahuel
819001182f
Fix#9251: DB2 connection config and ingestion update (#9322)
* DB2 connection config and ingestion update

* Update ingestion/src/metadata/ingestion/source/database/common_db_source.py

Co-authored-by: Ayush Shah <ayush@getcollate.io>

* Update ingestion/src/metadata/ingestion/source/database/common_db_source.py

Co-authored-by: Ayush Shah <ayush@getcollate.io>

* Update bootstrap/sql/com.mysql.cj.jdbc.Driver/v007__create_db_connection_info.sql

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2022-12-16 07:43:18 +01:00
Ayush Shah
a633befe2a
Fix: Snyk Severity for Delta Spark (#9323) 2022-12-16 12:12:44 +05:30
Onkar Ravgan
b539b299ee
Integrated schema parsers (#9305)
* Integrated schema parsers

* Addressed review comments

* fixed pytests
2022-12-15 16:54:55 +05:30
Mayur Singal
099853ab10
Fix Clickhouse Comments (#9295) 2022-12-14 21:42:06 +05:30
Sriharsha Chintalapani
38074f763b
Fix #8509: Add schema fields (#9209)
* Fix #8509: Add schema fields

* Fix #4675: Ingestion deployment from UI is broken

* Added sample data for topics

* Fixed pytests

* Address comments

* Refactored sampledata according to new schema

* Added return type

* Feat(ui)  : Add support for Avro editor (#9224)

* Feat(ui)  : Add support for Avro editor

* chore : add mock data for nested fields

* style: add group and opacity css

* add schema fields component to topic details and unit test

* Add locale keys

* Add support for edit description.

* refactor: expandableConfig

* test: add unit test for util method

* chore : make changes according to the schema change

* test : fix unit test and add new mock data

* chore : rename files

* Add row key

* chore : add default value for tags

* chore : update util method

* chore : add support for editing field tags

* chore : rename util files

* test : add unit test

* add comments

* addressing comments

* Address comments

* Added avro requirements

* Added requirement versions

* fixed versions

* protobuf version fix

* chore : rename util test file

* Fixed Datatype

* test: add unit test for schema component

Co-authored-by: Onkar Ravgan <onkar.10r@gmail.com>
Co-authored-by: Sachin Chaurasiya <sachinchaurasiyachotey87@gmail.com>
2022-12-14 19:56:37 +05:30
NiharDoshi99
b959709275
Fix: added datalake gen2 (#9245)
* Fix: added datalake gen2

* Fix: pytest

* Fix: pytest

* Fix: made changes to incorporate azure in datalake

* Fix: adlfs version

* Fix: adlfs version

* Fix: add tsv file type

* Fix: refactor for reading tsv file

* Fix: refactor for reading tsv file
2022-12-14 17:05:59 +05:30
Milan Bariya
c6b98751af
Logic improved for BYTES column (#9261)
* Logic improved for BYTES column

* Try Except block added

* change based on comments
2022-12-13 20:10:45 +05:30
Pere Miquel Brull
c75ba751b7
Fix #9116 & #8284 - Clean tableau source, fix ownership, add description and SSL verification (#9241)
2022-12-13 06:36:55 +01:00
Pere Miquel Brull
1b3ff505c2
Fix #8858 - Add chart description and add lineage flexibility (#9124)
2022-12-02 16:22:09 +01:00
Nahuel
76773e69de
Fix#6203: Refactor LineageRunner use (#9073)
* Refactor LineageRunner use

* Address PR comments

* Address pylint errors

* Fix failing test
2022-11-30 16:02:21 +01:00
Ayush Shah
91536044e7
Fix Grpcio failure (#8938) 2022-11-22 13:54:53 +00:00
Pere Miquel Brull
fe16dea584
Fix #8794 - Separate DL requirements and lazy imports (#8806) 2022-11-17 09:11:54 +00:00
Ayush Shah
44613b1532
Fix Profiler issue (#8796) 2022-11-16 17:13:34 +05:30
Teddy
3dbaa69978
Data insight workflow (#8729) 2022-11-15 05:44:25 +01:00
Mayur Singal
01bc9f1cfe
Fix PyMSSQL Version (#8696) 2022-11-14 08:40:41 +01:00
Pere Miquel Brull
34ba9d95c5
Ingestion Pipeline deployed, Athena tests and pydantic extras (#8682)
* Always run python tests

* Fix athena tests and types

* Update deployed prop in IngestionPipeline

* Fix #8554

* Format

* Use true as default deployed migration

* Remove repeated req

* Pydantic wiggle room
2022-11-13 11:59:43 +01:00
Onkar Ravgan
eee3f9ffec
Fix:#8553 Parse Avro/Protobuf/Json schemas (#8654)
* Added topic parsers

* Fixed pylint

* Addressed review comments

Co-authored-by: Onkar Ravgan <onkarravgan@Onkars-MacBook-Pro.local>
2022-11-11 16:35:09 +05:30
Teddy
199b342288
Fixes #8135 - Implement partitioning config for profiler (#8623)
* Added logic to handle partitioning config in profiler

* extracted get_partition_details out of workflow classes
2022-11-10 10:54:31 +01:00
Teddy
b44972ef60
Fixes #8470 - Implements refinement functions for web analytics events (#8528)
* Moved webanalytics type in its own folder

* Added data insight chart api endpoint

* Java formatting

* Added resource descriptor

* Added metadata entity endpoint

* Added aggregation endpoint for dataInsight

* Fix tag name

* Added logic to ingestion pipeline resource to add ES config info if pipeline type is dataInsight

* added domo to test subpackage

* cleaned up branch by removing commits from issue-8353 that were not merged into main

* Added web analytics data refinement

* Added get_status function

* Added from __future__ for typing

* Fixed typos brought up during reviews
2022-11-07 17:08:20 +01:00
Tushar Mittal
6f2c93089c
feat: add SageMaker connector (#8435)
* feat: add sagemaker connector

Signed-off-by: Tushar Mittal <chiragmittal.mittal@gmail.com>

* fix: fix linting errors and update imports

Signed-off-by: Tushar Mittal <chiragmittal.mittal@gmail.com>

* test: add unit tests for sagemaker source

Signed-off-by: Tushar Mittal <chiragmittal.mittal@gmail.com>

Signed-off-by: Tushar Mittal <chiragmittal.mittal@gmail.com>
2022-11-03 18:19:20 +01:00
Tushar Mittal
2a65df5f36
feat: add Kinesis connector (#8452)
Signed-off-by: Tushar Mittal <chiragmittal.mittal@gmail.com>
2022-11-02 16:12:45 +05:30
Pere Miquel Brull
119763afc4
Bump datamodel-code-generator (#8492) 2022-11-02 10:31:44 +01:00
Ayush Shah
2d7d89754c
Remove Click and Add ArgParse (#8182) 2022-10-31 18:12:26 +05:30
amymareemc
12bc9df0b2
Issue 4886: Add support for Azure Blob (#8334)
* Issue 4686: Add support for Azure Blob

* ISSUE-4868: make changes as suggested in PR review

* ISSUE-4868: run py_format

* ISSUE-4868: Make changes to enum and formatting

* ISSUE-4868: fix linting issues and update setup.py
2022-10-26 16:14:51 +02:00
Nahuel
36b12bd6f1
Fix lineage issues with merge_into and copy grants queries (#8335) 2022-10-24 17:22:22 +02:00
Pere Miquel Brull
d576540cb6
Bump Snowflake version (#8300) 2022-10-21 09:41:43 +02:00