36 Commits

Author SHA1 Message Date
Teddy
9dbcb3911b
Fix minor column data quality test bugs (#7111)
* Fixed test name issue + filtered out partition details for non BQ tables

* Exclude non BQ table from partition processing

* Fixed test + formatting
2022-09-01 13:47:00 +02:00
Teddy
a39c4db8e7
Add partial support for BQ partitioned table (#7066)
* Added support for BQ time based partition (not ingestion)

* Fixed minor errors in test suite workflow
2022-08-30 11:39:15 -07:00
Nahuel
7863f040d1
Fix #6027: Improve logging in ORM profiler (#6919)
* Improve logging in ORM profiler

* Fix failing test

* Updated logging in test_suite module to match company's format

Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>
2022-08-25 11:19:31 +02:00
Teddy
ce578e73d4
Fixes #5831 by implementing testSuite workflow logic (#6911)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs

* Updated profiler workflow to:
- support only running profiler (removed test run)
- support column inclusion and exclusion
- added back support for partitioned table and sample

* moved status to workflow

* Fixed tests

* removed test logic from profiler sink

* Added logic to return sample from workflow sample value

* Added profiler examples

* Updated documentation for profiler

* Fixed code smells

* committed changes to profiler

* initial commit of the revamp workflow

* Fixed python formatting

* cleaned up profiler submodule by removing test related files and functions

* Added airflow DAG logic for testSuite workflow

* Fixed code smells + added airflow ingestion tests + fixed comments
2022-08-25 10:01:28 +02:00
Sriharsha Chintalapani
821d70eae4
Fix #6782: Separate TableProfile and ColumnProfile api calls (#6783)
* Fix #6571: Add EntityLink for the testCase to ID columns

* Fix #6782: Separate TableProfile and ColumnProfile api calls

* Fix #6782: Separate TableProfile and ColumnProfile api calls - fix tests

* Fix setFields

* Fix tests

* Update pipeline status endpoint

* updated ui side as per new schema for profiler tab

* updated profiler details with new API

* Fix Profiler tests and validation errors (#6827)

* add profilerSample field in TableProfile

* get columnProfile with field profile

* Fixed sample data and python tests

* fixed date range filter change issue

* handled empty profiler case

* Added column level test case and results

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
Co-authored-by: Ayush Shah <ayush@getcollate.io>
Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>
2022-08-22 21:31:24 +05:30
Teddy
78b5f8c8e2
Part 1 of #5831 -- Profiler workflow implementation (#6809)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs

* Updated profiler workflow to:
- support only running profiler (removed test run)
- support column inclusion and exclusion
- added back support for partitioned table and sample

* moved status to workflow

* Fixed tests

* removed test logic from profiler sink

* Added logic to return sample from workflow sample value

* Added profiler examples

* Updated documentation for profiler

* Fixed code smells
2022-08-19 10:52:08 +02:00
Teddy
abaf8a84e9
Fixes #5661 by removing association between profiler and data quality (#6715)
* Added database filter in workflow

* Removed association between profiler and data quality

* fixed tests with removed association

* Fixed sonar code smells and bugs
2022-08-17 12:53:16 +02:00
Ayush Shah
a6db2e8a84
Fix for profiler: modified filter patterns and added error handling (#6608)
2022-08-08 10:43:17 +05:30
Sriharsha Chintalapani
1a42428e42
Add time series extension (#6416)
Co-authored-by: Vivek Ratnavel Subramanian <vivekratnavel90@gmail.com>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2022-08-04 07:22:47 -07:00
Nahuel
a878aa911c
Fix #6212: Retrieve connection params from secret manager in CLI commands (#6441)
* Retrieve connection params from secret manager for database connectors

* Retrieve connection params from secret manager for all services except database connectors

* Stop retrieving connection from SM in Airflow rest plugin

* Retrieve connection params from secret manager for dashboard services

* Retrieve connection params when initializing Workflow/ProfilerWorkflow objects

* Align services topologies + comment changes in topology runner

* Address SonarCloud bug detected

* Update database service topology

* Address PR comments
2022-08-02 09:13:46 +02:00
Teddy
6397b6a0b1
Fixes #6325 -- Implement multithreading for metrics computation (#6406)
* Added tests for multithreading SQA interface

* Added multithread support for metric computation

* Added thread ID to log debugger

* Cleaned up tests

* Fixed python formatting issues

* Added non-blocking result processing + threadCount in config file to set the number of threads

* Added frontend input field to set number of threads

* Fixed code smell, bug and comments from reviewer
2022-07-29 10:41:53 +02:00
Teddy
aae4410c93
Fixes #6183 - Ability to set profile sample at the profiler workflow level (#6292)
2022-07-25 12:08:20 +02:00
Teddy
e1fac99353
Fixes #5723 and implement interface processor logic (#6219)
* Added datetime for min/max

* Added profiler interface

* Update core.py to work with profiler_interface

* Implement interface logic for orm_profiler object

* Fix unique_ratio logic

* removed changes to table.json

* Added Protocol for type hint

* Changed protocol to abc + fixed sonar code smell

* Fixed py_format
2022-07-20 17:54:10 +02:00
Teddy
d097199d2f
Added validation in profiler workflow to ensure service name exists and raise more explicit error (#6036)
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2022-07-13 14:43:48 +02:00
Teddy
861a5ecf59
Fixes #5800 profiler workflow for engine with catalog + modulo for Presto (#5801)
2022-06-30 14:58:30 +02:00
Teddy
7431c1c226
[Issue-5188] Implement User Query for Sampler and Profiler (#5578)
* Added custom query sample for sample data ingestion

* Added logic to run table profiling against user's query

* Added tests for user query logic in profiler and sampler

* Added user profiling to tableProfile + fixed format

* staging commit

* Added logic to add profileQuery to table entity

* Added limit to sample rows
2022-06-24 14:46:34 +02:00
Teddy
28336b0a43
Fix #3575 Data Quality - Partitioned Tables (#5441)
Co-authored-by: Teddy Crepineau <teddycrepineau@Teddys-MacBook-Pro.local>
Co-authored-by: Vivek Ratnavel Subramanian <vivekratnavel90@gmail.com>
2022-06-14 12:37:44 -07:00
Pere Miquel Brull
8e9d0a73f6
Fix #3573 - Sample Data refactor & ORM converter improvements (#5265)
2022-06-08 16:10:40 +02:00
Pere Miquel Brull
04ede3b05b
Fix #3395 - Profiler linting (#5177)
2022-05-27 09:50:08 +02:00
Pere Miquel Brull
0f5777a9fa
Fix #4483 Airflow REST stage file loc; Fix #4738 disable rest_on_return (#5138)
2022-05-26 09:21:36 +02:00
Pere Miquel Brull
35e67890b8
Fix #5141 - Iterate over all Entities in the profiler workflow (#5146)
2022-05-26 07:35:23 +02:00
Mayur Singal
41ee3a5aaf
Fix #3940: Refactor Sql Source (#5046)
2022-05-25 15:41:38 +02:00
Pere Miquel Brull
0c51ecde63
Fix #2830 - Centralise loggers and update format (#4570)
2022-04-29 06:54:30 +02:00
Mayur Singal
db0e34c709
Fixing Test Connection for Dynamo & Glue (#4316)
* Fixing Test Connection for Dynamo

* Fixed Glue Connector

* renamed engine to connection

* Fixed the return signature

* Added dataclass
2022-04-22 11:30:59 +05:30
Pere Miquel Brull
2444d3de3d
Fix #4235 - Run data profiler workflows from Airflow REST (#4325)
2022-04-21 17:53:29 +02:00
Sriharsha Chintalapani
be836e5404
Fix #4071: PUT IngestionPipeline missing property & error message (#4085)
2022-04-13 08:40:21 +02:00
Pere Miquel Brull
06a3e4c989
Fix #3825 - Schema Name, SQL Source FQDN & ORM Profiler (#3942)
* Fix db schema name

* Fix sqlite connection

* Correctly register scanned tables

* improve sqlite connection

* Adapt schemas on ORM profiler

* Format
2022-04-08 19:28:10 +05:30
Pere Miquel Brull
bd4071bd64
Fix #3826 & #3886 - Profiler workflow & filter pattern (#3893)
2022-04-06 17:05:00 +02:00
Pere Miquel Brull
e2539c5e83
Fix #3844 - First iteration for deprecating MetadataServerConfig (#3853)
* Style

* deprecate MetadataServerConfig

* Remove audience from Okta
2022-04-05 18:02:45 +05:30
Sriharsha Chintalapani
7b3e459eb3
Fix #3659 Refactor Service Connection String to be specific to per service (#3804)
* Fix #3659 Refactor Service Connection String to be specific to per service

* Simplify and centralize Airflow Pipeline info for REST (#3740)

* Remove code

* Modified Configs based on refactoring schema (#3816)

* Clean WorkflowContext


Co-authored-by: pmbrull <peremiquelbrull@gmail.com>
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2022-04-04 12:46:09 -07:00
Pere Miquel Brull
9ced748c4f
Use root for FQDN (#3780)
2022-03-31 12:05:11 +05:30
Pere Miquel Brull
4a752e3ab2
Fix #3151 - Ingestion profiler should use ORM Profiler (#3192)
2022-03-06 15:43:43 -08:00
Pere Miquel Brull
71207de362
Fix #2875 - Profiler API Sink (#3011)
2022-03-02 16:46:28 +01:00
Pere Miquel Brull
a4b383fa83
Fix #2897 - Profiler CLI (#2967)
2022-02-24 08:03:50 +01:00
Pere Miquel Brull
1224d20a36
Fix #2894 - Profiler Processor & Metrics (#2900)
2022-02-22 08:09:02 +01:00
Pere Miquel Brull
e55579aaa8
Fix #2845 - Init Profiler Workflow (#2862)
* Fix list typing

* Add sqlite service

* Refactor validation into class

* Prepare table simple profiler

* Add note

* test ORM conversion

* Prepare workflow config utilities

* Prepare workflow skeleton

* Use new core Validation

* Refactor workflow config parsing

* Add comment

* Simplify workflow validations

* Fix table metric check

* Add init for convenience, otherwise interpreter cries when trying to __call__ the get result

* Fix table metric check

* Format

* Fix table list and metrics init

* Prepare profiler workflow integration tests

* Bump version

* Fix pycharm imports

* format
2022-02-20 17:55:12 +01:00