2826 Commits

Author SHA1 Message Date
Ayush Shah
57cb72c26f
Fix Checkstyle (#13683) 2023-10-23 15:51:40 +05:30
Iaroslav Frolikov
420da29841
Fixes #13607: BigQuery lineage ingestion fails when using GcpCredentialsPath authentication config (#13608) 2023-10-23 15:42:06 +05:30
Keagan O'Donoghue
74aef36b1e
ISSUE-13517: Added option to explicitly specify backup filename (#13661)
* ISSUE-13517: Added option to explicitly specify output filename for metadata backup

* format

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-10-22 13:26:12 +02:00
Onkar Ravgan
0f0bccdd45
Converted and fixed pipelinestatus timestamps to milliseconds (#13670)
* fixed pipelinestatus timestamps in millis

* Added migrations
2023-10-20 09:39:24 -07:00
Teddy
feb52647d2
fix: conditioned call to getColumnTags to columns fields (#13652) 2023-10-20 17:50:52 +02:00
Pere Miquel Brull
8cf8720a9d
Clean Airflow Lineage Backend and migrate status to millis (#13666)
* Clean Airflow Lineage Backend and migrate status to millis

* Format

* chore(ui): update executions startTs and endTs to millis

* Remove lineage providers

---------

Co-authored-by: Sachin Chaurasiya <sachinchaurasiyachotey87@gmail.com>
2023-10-20 15:42:38 +02:00
Pere Miquel Brull
660bf01a5b
Fix Stored Procedures Lineage for multi-db processes (#13655) 2023-10-20 09:14:08 +02:00
Teddy
fc335f2aff
fix: limit fields to only required ones (#13647) 2023-10-19 16:16:20 +02:00
Ayush Shah
ad86d8969f
Fix E2E failures (#13648) 2023-10-19 17:49:02 +05:30
Pere Miquel Brull
255bfb95b1
Remove duplicates from entity_extension_time_series and add the const… (#13626)
* Remove duplicates from entity_extension_time_series and add the constraint if missing

* Add sort buffer and work mem

* Revert "Add sort buffer and work mem"

This reverts commit fcfff5feb60c9212bb7c1cad34b524dc8c03bfc5.
2023-10-19 12:15:02 +02:00
Ayush Shah
f94e2dbb47
Fix Hive Bytes issue, add athena yaml, fix bigquery multiple project id token issue (#13640) 2023-10-18 23:48:21 +05:30
Ayush Shah
ac9e8c9e89
Add E2E - Oracle, Athena. Remove Duplicated code (#13563) 2023-10-18 16:57:06 +05:30
Pere Miquel Brull
899cd7e1fe
Fix DQ Workflow (#13631)
* Fix DQ Workflow

* Fix DQ Workflow
2023-10-18 11:49:38 +02:00
Onkar Ravgan
d70cf2ea7a
Fixed status class pydantic model (#13627) 2023-10-18 12:21:39 +05:30
Sriharsha Chintalapani
e1900d4ec1
Fix #13555: Long column names considered repeated (#13620) 2023-10-17 10:29:22 -07:00
Onkar Ravgan
84a41a6fbf
fixed dm column names (#13615) 2023-10-17 09:01:00 -07:00
Onkar Ravgan
0307a59388
Added fixes (#13589) 2023-10-17 19:56:03 +05:30
Mayur Singal
6578383827
Fix incorrect ingestion pipeline duration (#13587) 2023-10-17 12:37:19 +05:30
Mayur Singal
67c74dc57d
Fix Nifi test connection (#13528) 2023-10-13 18:32:11 +05:30
Teddy
31d2595e4f
fix: pass rnd table bound columns to sample query (#13561) 2023-10-13 14:57:28 +05:30
07Himank
6ffe79f793
fixed ES Indexing for very large S3 Storage Service buckets fails (#13507) 2023-10-13 10:22:53 +05:30
Teddy
1cbdfb3ae7
Fixes #12601 - column filter for profiler workflow (#13535)
* fix: sample data ingestion to match entity profiler column setting

* fix: python linting

* fix: updated fn call

* fix: added logic to handle json field in datalake connector

* fix: handle NA values in parsing

* fix: reverted sampler changes from #13338

* fix: reverted metric changes from #13338

* fix: added datalake profiler ingestion test

* fix: python linting

* fix: removed normalization of json blob in NoSQL db
2023-10-12 14:51:38 +02:00
Mayur Singal
f63881b8b6
Fix mysql E2E test count (#13529) 2023-10-12 11:25:14 +05:30
Onkar Ravgan
6e013246a7
dbt fixed null sql updates and source descriptions (#13467) 2023-10-12 11:07:58 +05:30
Teddy
e57849b732
Fixes #12298 - Update report data type to camel case (#13505)
* fix: updated DI to camelCase

* fix: ran linting

* fix: added migration

* fix: remove extra parenthesis in migration file

* fix: psql migration query

* fix: OS compose host

* fix: removed commented code block
2023-10-11 08:14:21 +02:00
Onkar Ravgan
115cd3506d
Enable pymssql python library (#13489)
* enabled dep

* review comments
2023-10-10 12:51:52 +02:00
Mayur Singal
f69cd9f54a
Fix hive e2e test count (#13497) 2023-10-10 00:21:23 -07:00
Teddy
eefce68015
fix: updated DI cost analysis aggregated report (#13498) 2023-10-10 07:04:40 +02:00
Pere Miquel Brull
d3da2d1b9f
Register Ingestion pipelines just from YAML (#13501)
* Register Ingestion pipelines just from YAML

* Format
2023-10-10 07:04:04 +02:00
Pere Miquel Brull
f6a87ee02a
Fix #12082 - Bump PyAthena version (#13464) 2023-10-09 20:47:19 +02:00
Pere Miquel Brull
d31db4e862
metadata CLI accepts tilde for relative paths (#13487)
* metadata CLI accepts tilde for relative paths

* [Docs] - Extracting MWAA details
2023-10-09 09:45:50 +02:00
Pere Miquel Brull
f5e10c4a5f
Fix #7272 - BaseWorkflow docs and cleanup (#13471)
* DQ BaseWorkflow

* Test suite runner

* test Suite workflow

* Refactor DQ for BaseWorkflow

* Lint

* Fix source

* Fix source

* Fix source

* Fix source

* Fix test

* Prepare docs

* Clean sink

* Clean legacy classes

* typo

* ProcessorStatus
2023-10-09 07:05:05 +02:00
Ayush Shah
08d7ee6d55
Fixes #13052: Datalake Nested Columns Sample Data ingestion (#13338) 2023-10-08 20:08:51 +05:30
Mayur Singal
ec94eb0113
Fix #12952: Fix nifi json decode error (#13465) 2023-10-07 15:59:29 -07:00
Pere Miquel Brull
aed9e3875f
DQ base workflow (#13454)
* DQ BaseWorkflow

* Test suite runner

* test Suite workflow

* Refactor DQ for BaseWorkflow

* Lint

* Fix source

* Fix source

* Fix source

* Fix source

* Fix test

* Fix test

* Fix test
2023-10-06 18:29:18 +02:00
Mayur Singal
c0ababd8ad
Fix #13336: Clean Mark All Deleted Table Flag (#13344) 2023-10-06 16:04:54 +05:30
Onkar Ravgan
3b7f023bdc
ca DI processor update (#13453) 2023-10-06 14:35:23 +05:30
Mayur Singal
2986d616b7
Fix superset owner issue for db (#13451) 2023-10-06 12:31:46 +05:30
Nguyen Huu Loc
7ff6738527
Fix looker missing git execution on container image (#13457)
* - Add git execution to ingestion Dockerfile
- [Looker] Update missing function

* Fix pylint

* Add git execution to Dockerfile

* Remove log

---------

Co-authored-by: Loc Nguyen <loc.nguyenhuu@xendit.co>
2023-10-06 06:51:07 +02:00
Onkar Ravgan
44df02010a
Added delete API for Raw Cost Analysis Report Rows (#13435)
* Added delete API

* review comments

* fixed checkstyle

* fixed naming

* checkstyle

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-05 14:27:23 +02:00
Teddy
f0ab4c942d
Fixes #13267 - Remove maxLen and minLen from profiler default metrics (#13447)
* fix: change log level to debug

* remove minLength and maxLength from default metrics
2023-10-05 14:11:51 +02:00
Onkar Ravgan
1e48d2ecff
Added sd changes (#13446)
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-05 12:24:32 +02:00
Mayur Singal
0090286924
Fix Bigquery Test connection for multiproject (#13380)
Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-10-05 14:50:42 +05:30
Mayur Singal
f879656f0a
Fix #12047: Clean commonregex package from setup (#13439) 2023-10-05 13:41:31 +05:30
Teddy
c4a3de6a85
fix: handle tableConfig for profiler CLI (#13437)
* fix: handle tableConfig for profiler CLI

* fix: empty commit for CI
2023-10-05 10:02:57 +02:00
Teddy
ddae3d8143
Refactor Data Insight aggregators Classes (#13433)
* fix: removed legacy OS and ES aggregator classes

* fix: centralized aggregator business logic

* fix: implemented client specific aggregator

* fix: updated client instantiation to use client specific aggregator

* fix: clean up json schema

* fix: updated DI index names

* fix: added searchIndex + storedProcedure

* fix: ran linting

* fix: updated python test to include new entity types
2023-10-05 09:31:27 +02:00
Nguyen Huu Loc
ef1974edd6
Support LookML multi repos (#13140)
* Draft: Support LookML multi repos

* [Looker] manually create Dashboard datamodel

* [Looker] Support remote import & lineage for looker view

* Rollback parser.py

* refactor code

* Update code

* Remove logs & add comments

* Remove Middle & Nothing

* - Fix yield datamodel error
- Remove logs

* Support clone repo from Bitbucket

* Fix typo

* Optimize imports

* Fix pylint

---------

Co-authored-by: Loc Nguyen <loc.nguyenhuu@xendit.co>
Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-10-04 15:16:21 +02:00
Ayush Shah
97f4f8fbf3
Fixes 12922: Trino NaN issue + TrinoUserError (#13244)
* Fix Trino NaN issue + TrinoUserError
2023-10-04 18:39:39 +05:30
RyoAriyama
b2ee1a54ef
fix return type of docstring powerbi (#13422) 2023-10-04 15:00:06 +02:00
Anatoliy Shulika
b788061157
fixes #12771: Added Greenplum Ingestion Connector (#13128)
* ISSUE-12771: Added Greenplum Ingestion Connector

* fixed python code formatting

---------

Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2023-10-04 14:53:53 +02:00