38 Commits

Author SHA1 Message Date
Mayur Singal
7760663b22
MINOR: Change ingestion licence header (#20549) 2025-04-03 10:39:47 +05:30
Pere Miquel Brull
c68a45e7d8
Create new Auto Classification Workflow (#18610) 2024-11-19 08:10:45 +01:00
Teddy
45d27a377d
GEN 1184 - Added Workflow Classification and Metric LevelConfig (#18572) 2024-11-11 15:59:42 +01:00
Pere Miquel Brull
7012e73d75
GEN-1166 - Improve Ingestion Workflow Error Summary (#18280)
* GEN-1166 - Improve Ingestion Workflow Error Summary

* fix test

* docs

* comments
2024-10-16 18:15:50 +02:00
Teddy
449a5f2de3
FIX #11951 - ingestion logic for global profiler config (#15948)
* feat: add global metric configuration for the profiler

* style: ran java linting

* fix: renamed disable to disabled

* style: ran java linting

* feat: ometa sdk for profiler setting

* test: ingestion profiler global config tests

* fix: update metric name to use MetricType Enum

* fix: allow bot to retrieve settings

* fix: exclude GX artifacts

* feat: implement global profiler setting logic for ingestion side

* fix: exclude metrics if Metric is empty

* style: ran python linting

* style: ran python linting

* fix: skip empty metrics

* style: ran python linting

* fix: moved GET profiler config to separate endpoint in system resource

* fix: moved compute metric filter to MetricFilter + renamed container

* fix: test failures

* fix: profiler test case
2024-04-22 22:35:37 +02:00
Ayush Shah
831fce5b7e
Fixes 10709: Add useFqnForFiltering to profiler workflow (#14717) 2024-01-18 18:52:43 +05:30
Pere Miquel Brull
a3bfd4e696
Part of #11968 - Restructure Profiler Workflow and PII Processor (#13059)
* Structure PII

* Restructure Profiler Workflow

* Update signature for abc

* remove profiler sink

* Fix tests

* Fix lint

* Fix test

* Fix test
2023-09-04 11:02:57 +02:00
Ayush Shah
ab1ec50c2c
Fixes Mssql Ntext, text and Image (#12490) 2023-07-20 13:34:35 +05:30
Teddy
b89cf64f14
Clean up profiler (#12369)
* ref: implemented interface for profiler components + removed struct logic

* ref: ran python linting

* ref: added UML diagram to readme.md

* ref: empty commit for labeler check

* ref: remove multiple context manager for 3.7 3.8 compatibility

* ref: remove
2023-07-12 17:02:32 +02:00
Ayush Shah
cb6e42941a
Fix 12025: Clickhouse NaN issue (#12079) 2023-06-22 12:51:56 +05:30
Teddy
ddbc7fe14d
Fixes #11570 - Add support for BQ Multi-project Profiler (#11692)
* fix: extracted profiler object from workflow and implemented factory to allow service base logic

* fix: ran python linting

* fix: renamed `base` to `base_profiler_source`

* fix: add logic to set correct database for BQ multi project ID connections

* fix: ran python linting
2023-05-20 14:22:53 -07:00
Teddy
9b4e9132ae
fixed #9656 - Add support for date type to column values to be between (#10890)
* fix: renamed  to  submodule

* fix: linting

* fix: columnValuesToBeBetween test for date column type
2023-04-04 17:16:44 +02:00
NiharDoshi99
1ff76f5e65
pii tagging using spacy (#10256)
* WIP: pii tagging using spacy

* added test cases and changes as per comment

* fix python checkstyle

* fix python checkstyle

* added score, test_cases and docs update

* solved merge conflict

* fix python checkstyle

* remove pii tagging using regex

* fix python test

* lib changes and added some test case

* changed as per comment

* fix: python test

* fix: changes to get source_config

* fix: changes as per comment
2023-03-03 18:33:18 +05:30
Teddy
754074f1be
Fixes #7758 - Added Column value and Integer Range Partitioning (#10350)
* feat(profiler): renamed  module to

* feat(profiler): added dbt-artifacts-parser to test setup.py

* feat(profiler): refactor workflow and interface

* feat(profiler): linting

* feat(profiler): removed old profiler modules

* feat(profiler): added support for value and integer range partition

* feat(profiler): fixed linting

* feat(profiler): added partitioning support for datalake profiler

* feat(profiler): removed `ProfilerInterfaceArgs` class

* feat(profiler): address comments

* feat(profiler): Added `OTHER` as an `IntervalType` for UI type generation
2023-03-01 08:20:38 +01:00
Ayush Shah
5be0f8ee76
Dl Profiler (#8694)
* DQ commit

* Add DL Profiler

* Fix Ingestion and Profiling pylint checks

* Fix Tests

* PyFormat files

* Fix Tests

* Resolve Comments

* Fix Tests and Format Files

* Resolve Comments

* Fix Pylint and Code smells

* Resolve Comments

* Fix S3 parquet

* Fix Metrics Code Smell
2022-11-15 16:01:10 +01:00
Teddy
f883863b8a
Fixes #7490 - Split Profiler and TestSuite Interface (#8032)
* Clean up test suite workflow and interface

* Fixed tests

* Split profiler and testSuite interfaces

* Cleaned up workflows and runners

* Fixed code formatting

* - remove old code
- remove `table` attribute used for testing and used mock instead

* Fixed execution bugs from refactor

* Fixed static type checking for profiler/api/workflow.py

* Fixed linting

* Added __init__ files
2022-10-11 15:57:25 +02:00
Onkar Ravgan
35efd49256
Added control for DBT descriptions (#7653)
* Added control for DBT descriptions

* Fixed tests

* Added UI changes

* fixed maven ci tests

* Java formatting changes

* ui review fixes

* Fixed pytests

* Fixed python integration tests

* fixed airflow tests

Co-authored-by: Onkar Ravgan <onkarravgan@Onkars-MacBook-Pro.local>
2022-09-26 16:19:47 +05:30
Teddy
ce578e73d4
Fixes #5831 by implementing testSuite workflow logic (#6911)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs

* Updated profiler workflow to:
- support only running profiler (removed test run)
- support column inclusion and exclusion
- added back support for partitioned table and sample

* moved status to workflow

* Fixed tests

* removed test logic from profiler sink

* Added logic to return sample from workflow sample value

* Added profiler examples

* Updated documentation for profiler

* Fixed code smells

* committed changes to profiler

* initial commit of the revamp workflow

* Fixed python formatting

* cleaned up profiler submodule by removing test related files and functions

* Added airflow DAG logic for testSuite workflow

* Fixed code smells + added airflow ingestion tests + fixed comments
2022-08-25 10:01:28 +02:00
Ayush Shah
383f4497cc
Update Entity Reference parameter fields (#6841) 2022-08-22 19:37:24 +05:30
Teddy
78b5f8c8e2
Part 1 of #5831 -- Profiler workflow implementation (#6809)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs

* Updated profiler workflow to:
- support only running profiler (removed test run)
- support column inclusion and exclusion
- added back support for partitioned table and sample

* moved status to workflow

* Fixed tests

* removed test logic from profiler sink

* Added logic to return sample from workflow sample value

* Added profiler examples

* Updated documentation for profiler

* Fixed code smells
2022-08-19 10:52:08 +02:00
Teddy
abaf8a84e9
Fixes #5661 by removing association between profiler and data quality (#6715)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs
2022-08-17 12:53:16 +02:00
Ayush Shah
a6db2e8a84
Fix for profiler: modified filter patterns and added error handling (#6608) 2022-08-08 10:43:17 +05:30
Sriharsha Chintalapani
1a42428e42
Add time series extension (#6416)
Co-authored-by: Vivek Ratnavel Subramanian <vivekratnavel90@gmail.com>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2022-08-04 07:22:47 -07:00
Teddy
d097199d2f
Added validation in profiler workflow to ensure service name exists and raise more explicit error (#6036)
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2022-07-13 14:43:48 +02:00
Teddy
28336b0a43
Fix #3575 Data Quality - Partitioned Tables (#5441)
Co-authored-by: Teddy Crepineau <teddycrepineau@Teddys-MacBook-Pro.local>
Co-authored-by: Vivek Ratnavel Subramanian <vivekratnavel90@gmail.com>
2022-06-14 12:37:44 -07:00
Pere Miquel Brull
2444d3de3d
Fix #4235 - Run data profiler workflows from Airflow REST (#4325)
2022-04-21 17:53:29 +02:00
Sriharsha Chintalapani
be836e5404
Fix #4071: PUT IngestionPipeline missing property & error message (#4085)
2022-04-13 08:40:21 +02:00
Pere Miquel Brull
bd4071bd64
Fix #3826 & #3886 - Profiler workflow & filter pattern (#3893)
2022-04-06 17:05:00 +02:00
Pere Miquel Brull
63533eb388
Fix for connectors based on refactoring of schemas V2 (#3870)
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2022-04-05 18:33:25 -07:00
Pere Miquel Brull
b3480693e4
Fix #3824 - OMeta Schema and JSON Connections (#3861)
2022-04-05 21:20:39 +02:00
Pere Miquel Brull
e2539c5e83
Fix #3844 - First iteration for deprecating MetadataServerConfig (#3853)
* Style

* deprecate MetadataServerConfig

* Remove audience from Okta
2022-04-05 18:02:45 +05:30
Pere Miquel Brull
b3087d08b9
Fix #3522 - Add timeout to profiler (#3707)
2022-03-30 08:54:27 +02:00
Pere Miquel Brull
16e82d45de
Fix #3371 - Run Profiler and Tests on a % of the data (#3424)
2022-03-16 06:05:59 +01:00
Pere Miquel Brull
130bbb0c5c
Fix #3104 - Remove unused imports with pycln (#3370)
2022-03-14 06:59:15 +01:00
Pere Miquel Brull
71207de362
Fix #2875 - Profiler API Sink (#3011)
Fix #2875 - Profiler API Sink
2022-03-02 16:46:28 +01:00
Pere Miquel Brull
990608522a
Fix #2981 - Update Profile to match TableProfile (#2982) 2022-02-25 09:26:30 -08:00
Pere Miquel Brull
1224d20a36
Fix #2894 - Profiler Processor & Metrics (#2900)
2022-02-22 08:09:02 +01:00
Pere Miquel Brull
e55579aaa8
Fix #2845 - Init Profiler Workflow (#2862)
* Fix list typing

* Add sqlite service

* Add sqlite service

* Add sqlite service

* Refactor validation into class

* Refactor validation into class

* Prepare table simple profiler

* Add note

* test ORM conversion

* Prepare workflow config utilities

* Prepare workflow skeleton

* Use new core Validation

* Refactor workflow config parsing

* Add comment

* Simplify workflow validations

* Fix table metric check

* Add init for convenience, otherwise interpreter cries when trying to __call__ the get result

* Fix table metric check

* Format

* Format

* Fix table list and metrics init

* Prepare profiler workflow integration tests

* Bump version

* Fix pycharm imports

* format
2022-02-20 17:55:12 +01:00