755 Commits

Author SHA1 Message Date
Teddy
93d132de5c
fix: removed SampleDataConnection class from package (#12229) 2023-07-01 11:21:21 -07:00
Pere Miquel Brull
1ecf5607c7
Looker - Fix file extension and blob import (#12232)
* Fix file extension and blob import

* Fix file extension and blob import
2023-06-29 16:14:17 +02:00
Mayur Singal
b44d4f1e5e
Fix SQLLineage Test (#12152) 2023-06-26 17:09:49 +05:30
Mayur Singal
a3fd6e9522
Fix #11659: Add support for filter patterns in dbt workflow (#12063) 2023-06-26 11:30:35 +05:30
Mayur Singal
7d1b123efe
Fix atlas lineage custom db name issue (#12108)
* Fix atlas lineage custom db name issue

* Fix test
2023-06-25 19:16:00 -07:00
Pere Miquel Brull
97e08ee25c
Fix #12106 - Fix looker chart sourceUrl & Array datatype handling (#12113)
* Fix looker chart url

* Handle array datatype

* Handle array datatype
2023-06-25 08:18:36 -07:00
Onkar Ravgan
d9d3f6895b
Fix 6874: Added Support for lineage from dbt ephemeral nodes (#12101)
* fixed dbt ephemeral nodes lin

* fixed dbt tests
2023-06-23 10:01:22 +02:00
Ayush Shah
cb6e42941a
Fix 12025: Clickhouse NaN issue (#12079) 2023-06-22 12:51:56 +05:30
Onkar Ravgan
5197682921
Fixed dagster bugs and Added Pydantic Models (#12048) 2023-06-22 10:59:09 +05:30
Teddy
1e86b6533c
Fixes #11743 - Remove SQLParse dependency for System Metrics (#12072)
* fix: removed sqlparse dependency for system metrics

* fix: update sample query

* fix: move system test os retrieval to `.get()`

* fix: move os.environ to `get`
2023-06-22 06:51:24 +02:00
Mayur Singal
c3cec54be9
Fix for #11807 Part 2: Add SourceUrl for table entity (#12013)
* Fix #11807 Part 2: Add SourceUrl for table entity

* address review comments: centralize sourceurl

* remove qlick

* pytest fix

* fix typo
2023-06-20 11:46:45 +02:00
Ayush Shah
83e9b6c310
Fixes 10395: Validation of yaml workflow configs (#11985) 2023-06-20 11:20:59 +05:30
Onkar Ravgan
f07c421264
Removed Empty Description Assignment to entities and added database name logic to tableau (#12031) 2023-06-19 19:19:42 +05:30
Teddy
76f5d3d571
Fixes #11994 - Update dbt and GE integration with new DQ flow (#12018)
* feat: updated GE integration to match new test workflow

* feat: updated unit tests to match new signature

* feat: added GE integration tests

* feat: ran python linting

* feat: updated dbt ingestion to match new TestSuite workflow

* feat: ran python linting

* feat: remove testSuite from Elasticsearch event test case update

* feat: ran java linting
2023-06-19 15:05:51 +02:00
Ayush Shah
f80eaf3a26
Fixes 11068: mysql & postgres iam auth (#11937) 2023-06-16 13:18:12 +05:30
Onkar Ravgan
d08c928801
Added project property to dashboards (#11986)
* Added projects to dashboards

* Added powerbi proj

* merge conflicts after source url

* fixed mongo pytest
2023-06-15 21:23:43 +05:30
Mayur Singal
82a0222257
SourceUrl changes for dashboard, pipeline & chart entities (#11991) 2023-06-15 14:44:48 +05:30
Mayur Singal
7fa963eec3
Fix #1076: Add mongodb support (#11943) 2023-06-15 11:14:22 +05:30
07Himank
62af9bb633
fixed issue for lineage description (#11500)
* fixed issue for lineage description

* fixed issue while ingesting

* fixed issue while ingesting

* added test case for Lineage with description

* addressing comments .. enhancement

* addressing comments .. enhancement

* modified py test case and removed description from addLineage as we are not using it.

* add support for topic entity and description in lineage details

* fix pylint & test

* pytest fix

* fix column lineage null issue

---------

Co-authored-by: Himank Mehta <himankmehta@Himanks-MacBook-Air.local>
Co-authored-by: ulixius9 <mayursingal9@gmail.com>
Co-authored-by: Mayur Singal <39544459+ulixius9@users.noreply.github.com>
2023-06-12 11:17:32 +05:30
Mayur Singal
05dc42bdb8
Fix #11808: Handle lineage for single db sources in superset (#11933) 2023-06-09 12:43:06 +05:30
Onkar Ravgan
caabe89f9c
Centralize tags ingestion logic (#11880) 2023-06-09 10:45:53 +05:30
Teddy
4b9f213dbf
Fixes Issue #11863 - Add Status to DQ (#11893)
* feat: added entityReference field in testSuite to link testSuite to an entity when the testSuite is executable.

* feat: added `executableEntityReference` as an entity reference for executable test suite to their entity

* feat: add status object to test case results

* feat: ran python linting

* feat: fixed  update to
2023-06-06 10:09:16 +00:00
Teddy
721869428e
Revert "Fixe Issue #11863 - Add Status logic for test case results (#11881)" (#11892)
This reverts commit 06735fe8dbaac5b267c9a2cf744ca154f88a9247.
2023-06-06 09:56:12 +02:00
Teddy
06735fe8db
Fixe Issue #11863 - Add Status logic for test case results (#11881)
* feat: added entityReference field in testSuite to link testSuite to an entity when the testSuite is executable.

* feat: added `executableEntityReference` as an entity reference for executable test suite to their entity

* feat: add status object to test case results

* feat: ran python linting
2023-06-06 09:45:49 +02:00
Ayush Shah
65f370e4aa
Rename GCS to GCP (#11812) 2023-06-06 11:57:00 +05:30
Teddy
d0cffdcd66
Fixes Issue #11438 - Implement threshold and strategy for custom SQL (#11847)
* feat: Add threshold and strategy logic on the custom SQL object test

* feat: ran python linting

* feat: added safety checks for custom sql query

* feat: ran python linting
2023-06-02 09:41:31 +02:00
Teddy
c98a15ca19
Fixes #11705 - Update ingestion and backend to match new DQ flow (#11836)
* feat: refactor ingestion flow logic

* feat: ran python linting

* feat: update tests to match new workflow

* feat: ran python linting

* feat: update sample data test suite name

* feat: Added backend logic to support logical and executable test suites

* feat: clean up java and json code

* feat: added sample data for logical and executable test suites

* feat: remove executable from CreateTestSuite

* feat: ran python and java linting

* feat: added README info for data quality structure

* skipping cypress to keep main green

* fixed typescript type issue

---------

Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
2023-06-01 23:19:13 -07:00
Pere Miquel Brull
fdeea71671
Fix Looker explore git link & Add BitBucket reader (#11837)
* Add looker test connection step

* Add looker test connection step

* Update Credentials

* Fix explore link and add bitbucket reader

* Format

* Fix test

* Fix spline linting

* Fix import
2023-06-02 07:19:32 +02:00
Mayur Singal
b57bbf833f
Fix #11572: Glue Support Partition Columns & Use Pydantic Models (#11776) 2023-05-31 12:03:34 +00:00
Chirag Madlani
7adc291364
fix(ui): circular deps for entityReference.json (#11760)
* fix(ui): circular deps for entityReference.json

* Fix circular Dependency python

* Cap Delta Spark version

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-05-26 18:02:21 +05:30
Sriharsha Chintalapani
6509a3670a
Fix #11664: Refactor patch_mixin to use jsonpatch lib (#11696)
* Fix #11664: Refactor patch_mixin to use jsonpatch lib

* Migrate to jsonpatch

* Fix nested cols

* Format

* Update patch_description

* Table constraints

* tag

* owner

* column tag

* column desc

* Format

* Format

* Fix log

* Update dbt patch

* Update column fqn

* Fix test

* Fix tests

---------

Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com>
2023-05-23 15:47:11 +02:00
Teddy
8c50d1af52
Fixes #4565 - Fetch Metrics from System tables (#11645)
* feat: fetch metrics from system tables

* feat: add permission doc for fetching metrics from system tables

* feat: fix E2E tests to reflect full table row count after table metric update

* feat: ran linting

* feat: fix doc string engine name + function typing

* feat: ran python linting
2023-05-22 09:04:18 +02:00
Teddy
ddbc7fe14d
Fixes #11570 - Add support for BQ Multi-project Profiler (#11692)
* fix: extracted profiler object from workflow and implemented factory to allow service base logic

* fix: ran python linting

* fix: renamed `base` to `base_profiler_source`

* fix: add logic to set correct database for BQ multi project ID connections

* fix: ran python linting
2023-05-20 14:22:53 -07:00
Pere Miquel Brull
0eb2201f94
Restructure NER Scanner internals (#11690)
* Simplify col name scanner

* Restructure NER Scanner internals
2023-05-19 18:21:01 +02:00
Ayush Shah
ad7258e7be
Fixes 10949: return Chunks for file formats & Centralize logic for different auth configs (#11639)
* Centralize Auth and File formats datalake
2023-05-19 18:54:28 +05:30
Mayur Singal
e9992a52a8
Fix #1604: Add Spline Pipeline Connector (#11562)
* Fix #1604: Add Spline Connector

* Add tests & grammar validation

* Spline UI Changes & Docs

* fix pipeline workflow doc

* chore: use common field for dbService name

* chore: use const for beta services

* chore: add service icon

* Update ingestion/src/metadata/ingestion/source/pipeline/spline/metadata.py

Co-authored-by: Onkar Ravgan <onkar.10r@gmail.com>

---------

Co-authored-by: Sachin Chaurasiya <sachinchaurasiyachotey87@gmail.com>
Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>
Co-authored-by: Onkar Ravgan <onkar.10r@gmail.com>
2023-05-19 14:46:32 +05:30
Pere Miquel Brull
50ad38ea0f
Fix #11548 - Secrets Managers comms with OMeta (#11602)
* Remove secretsManagerCredentials from backend

* Remove secretsManagerCredentials from backend

* Add secrets manager loader

* Load SM in the ometa client

* Fix tests

* Fix tests

* Fix Lint

* Mock AWS region

---------

Co-authored-by: Ayush Shah <ayush@getcollate.io>
2023-05-19 09:43:11 +02:00
Pere Miquel Brull
4626363fd8
Fix parsing for Storage (#11663) 2023-05-19 09:36:44 +02:00
Pere Miquel Brull
8795337f88
Clean NER Scanner imports (#11653) 2023-05-18 12:53:22 +02:00
Mayur Singal
e4997c3749
Fix #11571: Support custom database name for glue (#11631) 2023-05-18 14:16:56 +05:30
Pere Miquel Brull
1b90badd0e
Restructure PII processor (#11640)
* Restructure PII processor

* Restructure PII processor

* Format
2023-05-17 15:58:17 +02:00
Onkar Ravgan
3d9d4416b7
Fixed incompatible column name for Postgres version 11.6 (#11536)
* postgres col name on version

* Added dependency

* Added parenthesis validation

* review comments and tests
2023-05-15 11:48:03 +05:30
Onkar Ravgan
cff403a05a
Validate if tags are created before attaching them to CreateRequest (#11554)
* Added tags validation

* typo fixed
2023-05-11 16:04:55 +00:00
Teddy
60de33d7cf
Fixes #11384 - Implement mem. optimization for sys. metrics (#11460)
* fix: optimize system metrics retrieval for memory

* fix: ran python linting

* fix: logic to retrieve unique system metrics operations

* fix: added logic to clean up query before parsing it

* fix: added E2E tests for rds, bq, snflk system metrics

* fix: ran python linting

* fix: fix postgres query + add default byte size to env var

* fix: ran python linting
2023-05-09 12:05:35 +02:00
Keith Sirmons
65c5b44eaa
Impala Connection Profiler is_nan rollback; Histogram fix. (#11388) 2023-05-05 21:45:30 +02:00
Teddy
f8c667b504
Fix median for concatenable types (#11382)
* fix: median/fq/tq for concatenable types

* fix: ran linting
2023-05-02 10:45:26 +00:00
Keith Sirmons
ad9b5a0cb5
Impalaconnection 0.2.1 + string datatypes enabled in profile (#11364)
* updated metadata to work with the impala query engine.
Uses the describe function to grab column names, data types, and comments.

* added the ordinalPosition data point into the Column constructor.

* renamed variable to better describe its usage.

* updated profile errors.
Hive connections now comment columns by default.

* removed print statements

* Cleaned up code by pulling check into its own function

* Updated median function to return null when it is being used for first and third quartiles.

* updated metadata to work with the impala query engine.
Uses the describe function to grab column names, data types, and comments.

* added the ordinalPosition data point into the Column constructor.

* renamed variable to better describe its usage.

* updated profile errors.
Hive connections now comment columns by default.

* removed print statements

* Cleaned up code by pulling check into its own function

* Updated median function to return null when it is being used for first and third quartiles.

* removed print statements and ran make py_format

* updated to fix some pylint errors.
imported Dialects to remove string compare to "impala" engine

* moved huge comment into function docstring.
This comment shows us the sql to get quartiles in Impala

* added cast to decimal for column when running average in mean.py

* fixed lint error

* fixed ui ordering of precision and scale.
Precision should be ordered in front of scale since the precision is set first in decimal data types

* Fixed overflow error when converting large numbers to bigint

Fixed error for CHAR datatype missing.

* Fixed NaN issues with Impala Profile

* py formatting

* Fixed warnings from SqlAlchemy
  The GenericFunction 'max' is already registered and is going to be overridden.
  The GenericFunction 'min' is already registered and is going to be overridden.

Updated Min/Max to handle strings by getting their length.

* Updated profiler to handle strings by using the string length as the parameter to compute the profile

* py_format updates

* fix: ran linting

* fix: Mysql hardcoded table alias

---------

Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>
2023-04-30 10:03:56 +02:00
Pere Miquel Brull
d3d523e96d
Ingestion md docs review (#11219)
* Update workflow docs

* Remove duplicate key

* Update Custom connector docs

* Update Domo connector docs

* Dashboard docs updates

* Some databases docs updates

* Finish db docs updates

* Remove Pulsar

* Messaging docs

* Metadata docs

* ML docs

* S3 docs

* Fix rendering

* Update title and description of the databaseSchema

* Pipeline Service docs

* remove pulsar from tests

* Format

* Fix test

* Remove pulsar

* Remove pulsar
2023-04-23 18:43:46 +02:00
Mayur Singal
da2f03ca50
Salesforce docs & remove unnecessary fields (#11207) 2023-04-22 18:32:32 +02:00
Nahuel
ed1388827e
Doc: Add ElasticsearchReindex and Data Insight docs in UI (#11201) 2023-04-21 11:34:55 -07:00