12 Commits

Author SHA1 Message Date
Ayush Shah
246bf15476
Add Clickhouse profiler fix (#12531) 2023-07-21 10:19:56 +05:30
mgorsk1
cd347299d7
Fixes 12439: Remove braces from approx_percentile method (Trino/Presto) in profiler (#12440)
* 🐛 Remove braces from approx_percentile method (Trino/Presto)
* 👌 Updating code due to code review changes.
2023-07-20 20:20:24 +05:30
Ayush Shah
cb6e42941a
Fix 12025: Clickhouse NaN issue (#12079) 2023-06-22 12:51:56 +05:30
Keith Sirmons
65c5b44eaa
Impala Connection Profiler is_nan rollback; Histogram fix. (#11388) 2023-05-05 21:45:30 +02:00
Teddy
f8c667b504
Fix median for concatenable types (#11382)
* fix: median/fq/tq for concatenable types

* fix: ran linting
2023-05-02 10:45:26 +00:00
Keith Sirmons
ad9b5a0cb5
Impalaconnection 0.2.1 + string datatypes enabled in profile (#11364)
* updated metadata to work with the impala query engine.
Uses the describe function to grab column names, data types, and comments.

* added the ordinalPosition data point into the Column constructor.

* renamed variable to better describe its usage.

* updated profile errors.
Hive connections now comment columns by default.

* removed print statements

* Cleaned up code by pulling check into its own function

* Updated median function to return null when it is being used for first and third quartiles.

* removed print statements and ran make py_format

* updated to fix some pylint errors.
imported Dialects to remove string compare to "impala" engine

* moved huge comment into function docstring.
This comment shows us the sql to get quartiles in Impala

* added cast to decimal for column when running average in mean.py

* fixed lint error

* fixed ui ordering of precision and scale.
Precision should be ordered in front of scale since the precision is set first in decimal data types

* Fixed overflow error when converting large numbers to bigint

Fixed error for CHAR datatype missing.

* Fixed NaN issues with Impala Profile

* py formatting

* Fixed warnings from SQLAlchemy
  The GenericFunction 'max' is already registered and is going to be overridden.
  The GenericFunction 'min' is already registered and is going to be overridden.

Updated Min/Max to handle strings by getting their length.

* Updated profiler to handle strings by using the string length as the parameter to compute the profile

* py_format updates

* fix: ran linting

* fix: Mysql hardcoded table alias

---------

Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>
2023-04-30 10:03:56 +02:00
Schlameel
4c3f142a2c
Fixes #6340: Implement Median support for MySQL (#10962)
* ISSUE #6340: Implement Median support for MySQL
- Added code to existing function that previously returned None
- Important - Tested only external to OM.
- Performance tested - 1/7th the speed of other solutions. Not impacted by index.

* ISSUE #6340 - Implement median support for MySQL
Changed to remove setting user variable in expression per Teddy

* ISSUE #6340 - Implement median support for MySQL
Formatting
2023-04-12 08:07:36 +02:00
Keith Sirmons
42000053aa
Fixed Issue #10943: Impala query engine metadata ingestion and median function profiler (#10944)
* updated metadata to work with the impala query engine.
Uses the describe function to grab column names, data types, and comments.

* added the ordinalPosition data point into the Column constructor.

* renamed variable to better describe its usage.

* updated profile errors.
Hive connections now comment columns by default.

* removed print statements

* Cleaned up code by pulling check into its own function

* Updated median function to return null when it is being used for first and third quartiles.

* removed print statements and ran make py_format

* updated to fix some pylint errors.
imported Dialects to remove string compare to "impala" engine

* moved huge comment into function docstring.
This comment shows us the sql to get quartiles in Impala
2023-04-06 18:07:42 +02:00
Teddy
d03b06daf6
feat: Added logic to handle MERGE statement for bigquery (#10522) 2023-03-13 11:34:40 +01:00
Pere Miquel Brull
a05e56feba
Pyimpala fix colnames, comments and dialect sql compilation (#10470)
* Fix col names and comments for impala hive

* Fix cols, comments and impala sql compilation

* Handle hive types

* Format
2023-03-08 14:13:06 +01:00
Teddy
5208b6f684
Fixes #4368 - Add Histogram Metric (#10422) 2023-03-03 21:56:32 +01:00
Teddy
754074f1be
Fixes #7758 - Added Column value and Integer Range Partitioning (#10350)
* feat(profiler): renamed  module to

* feat(profiler): added dbt-artifacts-parser to test setup.py

* feat(profiler): refactor workflow and interface

* feat(profiler): linting

* feat(profiler): removed old profiler modules

* feat(profiler): added support for value and integer range partition

* feat(profiler): fixed linting

* feat(profiler): added partitioning support for datalake profiler

* feat(profiler): removed `ProfilerInterfaceArgs` class

* feat(profiler): address comments

* feat(profiler): Added `OTHER` as an `IntervalType` for UI type generation
2023-03-01 08:20:38 +01:00