Ayush Shah
f94e2dbb47
Fix Hive Bytes issue, add athena yaml, fix bigquery multiple project id token issue ( #13640 )
2023-10-18 23:48:21 +05:30
Ayush Shah
ac9e8c9e89
Add E2E - Oracle, Athena. Remove Duplicated code ( #13563 )
2023-10-18 16:57:06 +05:30
Pere Miquel Brull
899cd7e1fe
Fix DQ Workflow ( #13631 )
...
* Fix DQ Workflow
* Fix DQ Workflow
2023-10-18 11:49:38 +02:00
Onkar Ravgan
d70cf2ea7a
Fixed status class pydantic model ( #13627 )
2023-10-18 12:21:39 +05:30
Sriharsha Chintalapani
e1900d4ec1
Fix #13555 : Long column names considered repeated ( #13620 )
2023-10-17 10:29:22 -07:00
Onkar Ravgan
84a41a6fbf
fixed dm column names ( #13615 )
2023-10-17 09:01:00 -07:00
Onkar Ravgan
0307a59388
Added fixes ( #13589 )
2023-10-17 19:56:03 +05:30
Mayur Singal
6578383827
Fix incorrect ingestion pipeline duration ( #13587 )
2023-10-17 12:37:19 +05:30
Mayur Singal
67c74dc57d
Fix Nifi test connection ( #13528 )
2023-10-13 18:32:11 +05:30
Teddy
31d2595e4f
fix: pass rnd table bound columns to sample query ( #13561 )
2023-10-13 14:57:28 +05:30
07Himank
6ffe79f793
Fixed ES indexing failure for very large S3 Storage Service buckets ( #13507 )
2023-10-13 10:22:53 +05:30
Teddy
1cbdfb3ae7
Fixes #12601 - column filter for profiler workflow ( #13535 )
...
* fix: sample data ingestion to match entity profiler column setting
* fix: python linting
* fix: updated fn call
* fix: added logic to handle json field in datalake connector
* fix: handle NA values in parsing
* fix: reverted sampler changes from #13338
* fix: reverted metric changes from #13338
* fix: added datalake profiler ingestion test
* fix: python linting
* fix: removed normalization of json blob in NoSQL db
2023-10-12 14:51:38 +02:00
Mayur Singal
f63881b8b6
Fix mysql E2E test count ( #13529 )
2023-10-12 11:25:14 +05:30
Onkar Ravgan
6e013246a7
dbt fixed null sql updates and source descriptions ( #13467 )
2023-10-12 11:07:58 +05:30
Teddy
e57849b732
Fixes #12298 - Update report data type to camel case ( #13505 )
...
* fix: updated DI to camelCase
* fix: ran linting
* fix: added migration
* fix: remove extra parenthesis in migration file
* fix: psql migration query
* fix: OS compose host
* fix: removed commented code block
2023-10-11 08:14:21 +02:00
Onkar Ravgan
115cd3506d
Enable pymssql python library ( #13489 )
...
* enabled dep
* review comments
2023-10-10 12:51:52 +02:00
Mayur Singal
f69cd9f54a
Fix hive e2e test count ( #13497 )
2023-10-10 00:21:23 -07:00
Teddy
eefce68015
fix: updated DI cost analysis aggregated report ( #13498 )
2023-10-10 07:04:40 +02:00
Pere Miquel Brull
d3da2d1b9f
Register Ingestion pipelines just from YAML ( #13501 )
...
* Register Ingestion pipelines just from YAML
* Format
2023-10-10 07:04:04 +02:00
Pere Miquel Brull
f6a87ee02a
Fix #12082 - Bump PyAthena version ( #13464 )
2023-10-09 20:47:19 +02:00
Pere Miquel Brull
d31db4e862
metadata CLI accepts tilde for relative paths ( #13487 )
...
* metadata CLI accepts tilde for relative paths
* [Docs] - Extracting MWAA details
2023-10-09 09:45:50 +02:00
Pere Miquel Brull
f5e10c4a5f
Fix #7272 - BaseWorkflow docs and cleanup ( #13471 )
...
* DQ BaseWorkflow
* Test suite runner
* test Suite workflow
* Refactor DQ for BaseWorkflow
* Lint
* Fix source
* Fix source
* Fix source
* Fix source
* Fix test
* Prepare docs
* Clean sink
* Clean legacy classes
* typo
* ProcessorStatus
2023-10-09 07:05:05 +02:00
Ayush Shah
08d7ee6d55
Fixes #13052 : Datalake Nested Columns Sample Data ingestion ( #13338 )
2023-10-08 20:08:51 +05:30
Mayur Singal
ec94eb0113
Fix #12952 : Fix nifi json decode error ( #13465 )
2023-10-07 15:59:29 -07:00
Pere Miquel Brull
aed9e3875f
DQ base workflow ( #13454 )
...
* DQ BaseWorkflow
* Test suite runner
* test Suite workflow
* Refactor DQ for BaseWorkflow
* Lint
* Fix source
* Fix source
* Fix source
* Fix source
* Fix test
* Fix test
* Fix test
2023-10-06 18:29:18 +02:00
Mayur Singal
c0ababd8ad
Fix #13336 : Clean Mark All Deleted Table Flag ( #13344 )
2023-10-06 16:04:54 +05:30
Onkar Ravgan
3b7f023bdc
ca DI processor update ( #13453 )
2023-10-06 14:35:23 +05:30
Mayur Singal
2986d616b7
Fix superset owner issue for db ( #13451 )
2023-10-06 12:31:46 +05:30
Nguyen Huu Loc
7ff6738527
Fix looker missing git execution on container image ( #13457 )
...
* - Add git execution to ingestion Dockerfile
- [Looker] Update missing function
* Fix pylint
* Add git execution to Dockerfile
* Remove log
---------
Co-authored-by: Loc Nguyen <loc.nguyenhuu@xendit.co>
2023-10-06 06:51:07 +02:00
Onkar Ravgan
44df02010a
Added delete API for Raw Cost Analysis Report Rows ( #13435 )
...
* Added delete API
* review comments
* fixed checkstyle
* fixed naming
* checkstyle
---------
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-05 14:27:23 +02:00
Teddy
f0ab4c942d
Fixes #13267 - Remove maxLen and minLen from profiler default metrics ( #13447 )
...
* fix: change log level to debug
* remove minLength and maxLength from default metrics
2023-10-05 14:11:51 +02:00
Onkar Ravgan
1e48d2ecff
Added sd changes ( #13446 )
...
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-05 12:24:32 +02:00
Mayur Singal
0090286924
Fix Bigquery Test connection for multiproject ( #13380 )
...
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-05 14:50:42 +05:30
Mayur Singal
f879656f0a
Fix #12047 : Clean commonregex package from setup ( #13439 )
2023-10-05 13:41:31 +05:30
Teddy
c4a3de6a85
fix: handle tableConfig for profiler CLI ( #13437 )
...
* fix: handle tableConfig for profiler CLI
* fix: empty commit for CI
2023-10-05 10:02:57 +02:00
Teddy
ddae3d8143
Refactor Data Insight aggregators Classes ( #13433 )
...
* fix: removed legacy OS and ES aggregator classes
* fix: centralized aggregator business logic
* fix: implemented client specific aggregator
* fix: updated client instantiation to use client specific aggregator
* fix: clean up json schema
* fix: updated DI index names
* fix: added searchIndex + storedProcedure
* fix: ran linting
* fix: updated python test to include new entity types
2023-10-05 09:31:27 +02:00
Nguyen Huu Loc
ef1974edd6
Support LookML multi repos ( #13140 )
...
* Draft: Support LookML multi repos
* [Looker] manually create Dashboard datamodel
* [Looker] Support remote import & lineage for looker view
* Rollback parser.py
* refactor code
* Update code
* Remove logs & add comments
* Remove Middle & Nothing
* - Fix yield datamodel error
- Remove logs
* Support clone repo from Bitbucket
* Fix typo
* Optimize imports
* Fix pylint
---------
Co-authored-by: Loc Nguyen <loc.nguyenhuu@xendit.co>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-10-04 15:16:21 +02:00
Ayush Shah
97f4f8fbf3
Fixes 12922: Trino NaN issue + TrinoUserError ( #13244 )
...
* Fix Trino NaN issue + TrinoUserError
2023-10-04 18:39:39 +05:30
RyoAriyama
b2ee1a54ef
fix return type of docstring powerbi ( #13422 )
2023-10-04 15:00:06 +02:00
Anatoliy Shulika
b788061157
fixes #12771 : Added Greenplum Ingestion Connector ( #13128 )
...
* ISSUE-12771: Added Greenplum Ingestion Connector
* fixed python code formatting
---------
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2023-10-04 14:53:53 +02:00
mitchellmann
0ba02a5977
Fixes 13249: Added ingestion support - Presto tbl/col comments ( #13250 )
...
* Added ingestion support - Presto tbl/col comments
* now supports scenario of NONE schema
* PY style fixes
---------
Co-authored-by: Mitchell Mann <mitchell.mann@tideworks.com>
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-04 14:41:12 +02:00
Mayur Singal
3b640b43b7
Fix column lineage nonetype error ( #13432 )
2023-10-04 17:48:56 +05:30
Pere Miquel Brull
0282574bdd
Create ometa client once and pass it around & improve pycln config ( #13310 )
...
* Create ometa client once and pass it around & improve pycln config
* Fix
* Fix
* Fix tests
* Fix maven ci
* Fix tests
* Fix tests
* Fix tests
* Format
* Fix DI
2023-10-04 09:14:03 +02:00
Pere Miquel Brull
31b827585b
Allow ometa to create services without storing the connection ( #13400 )
...
* Allow ometa to create services without storing the connection
* Allow ometa to create services without storing the connection
* Fix backend tests with null connection
2023-10-04 07:48:49 +02:00
Mayur Singal
4f4d1c725c
Fix failing E2Es ( #13419 )
2023-10-04 10:56:34 +05:30
Teddy
9ef3ff7a58
Cost analysis agg ( #13408 )
...
* feat: updated DI workflow to inherit from BaseWorkflow + split processor and producer classes
* feat: __init__.py files creation
* feat: updated workflow import classes in code and doc
* feat: moved kpi runner from runner to processor folder
* fix: skip failure on list entities
* feat: deleted unused files
* feat: updated status reporter
* feat: ran linting
* feat: fix test error with typing and fqn
* feat: updated test dependencies
* feat: ran linting
* feat: move execution order up
* feat: updated cost analysis report to align with new workflow
* feat: fix entity already exists for pipeline entity status
* feat: ran python linting
* feat: move skip_on_failure to method
* feat: added unusedReport to DI
* feat: added aggregated unused report
* feat: ran linting
* feat: reverted compose file changes
---------
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
2023-10-03 09:27:18 +02:00
Onkar Ravgan
bc491be5ad
Fixed dbt optional files for local config ( #13242 )
2023-10-03 12:44:06 +05:30
Ayush Shah
462b2f9445
Fix Latest Pylint 3.0.0 issues ( #13413 )
...
* Fix Latest Pylint issues
* add compatible bound to pylint version
2023-10-03 07:43:09 +02:00
Pere Miquel Brull
b5596a4640
Batch PII tagging ( #13385 )
...
* Batch PII tagging
* Batch PII tagging
* Fix tests
* Fix tests
2023-10-02 14:44:41 +02:00
Pere Miquel Brull
d915254fac
Prepare Storage Connector for ADLS & Docs ( #13376 )
...
* Prepare Storage Connector for ADLS & Docs
* Format
* Fix test
2023-10-02 12:15:09 +02:00