* updated metadata to work with the impala query engine.
Uses the describe function to grab column names, data types, and comments.
* added the ordinalPosition data point into the Column constructor.
* renamed variable to better describe its usage.
* updated profile errors.
Hive connections now comment columns by default.
* removed print statements
* Cleaned up code by pulling check into its own function
* Updated median function to return null when it is being used for first and third quartiles.
* removed print statements and ran make py_format
* updated to fix some pylint errors.
imported Dialects to remove string compare to "impala" engine
* moved huge comment into function docstring.
This comment shows us the sql to get quartiles in Impala
* added cast to decimal for column when running average in mean.py
* fixed lint error
* fixed ui ordering of precision and scale.
Precision should be ordered before scale, since precision is set first in decimal data types.
* Fixed overflow error when converting large numbers to bigint
Fixed error caused by the missing CHAR datatype.
* Fixed NaN issues with Impala Profile
* py formatting
* Fixed warnings from SqlAlchemy
The GenericFunction 'max' is already registered and is going to be overridden.
The GenericFunction 'min' is already registered and is going to be overridden.
Updated Min/Max to handle strings by using their length.
* Updated profiler to handle strings by using the string length as the parameter to compute the profile
* py_format updates
* fix: ran linting
* fix: Mysql hardcoded table alias
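
The string handling described above for Min/Max (profiling a string by its length) can be illustrated with a minimal pure-Python sketch. This is not the SQLAlchemy implementation from this change; the helper names `as_measurable` and `min_max` are hypothetical and exist only for illustration.

```python
from typing import Iterable, Optional, Tuple, Union

Number = Union[int, float]
Value = Union[int, float, str, None]


def as_measurable(value: Value) -> Optional[Number]:
    """Map a raw value to a number: strings are measured by their length."""
    if value is None:
        return None
    if isinstance(value, str):
        return len(value)
    return value


def min_max(values: Iterable[Value]) -> Tuple[Optional[Number], Optional[Number]]:
    """Return (min, max) over the measurable values, ignoring nulls."""
    measured = [m for m in (as_measurable(v) for v in values) if m is not None]
    if not measured:
        # Hypothetical convention: no usable values yields nulls.
        return None, None
    return min(measured), max(measured)
```

For a string column, the reported min and max are therefore the shortest and longest string lengths, not the lexicographic extremes.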
---------
Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>
Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>
* feat(profiler): renamed module to
* feat(profiler): added dbt-artifacts-parser to test setup.py
* feat(profiler): refactor workflow and interface
* feat(profiler): linting
* feat(profiler): removed old profiler modules
* feat(profiler): added support for value and integer range partition
* feat(profiler): fixed linting
* feat(profiler): added partitioning support for datalake profiler
* feat(profiler): removed `ProfilerInterfaceArgs` class
* feat(profiler): address comments
* feat(profiler): Added `OTHER` as an `IntervalType` for UI type generation
* refactor(profiler): integrated getter func.
Removed metric getter functions from their own file.
Added metric getters to their own interface class.
Created a dispatch-by-value method to dispatch the metric getter functions.
* feature(profiler): added systemProfiler schema
* feat(profiler): freshness workflow & Snowflake implementation
* feat(profiler): freshness endpoint for put and get
* feat(profiler): added system metrics for Redshift
* feat(profiler): freshness metrics for BigQuery
* fix(profiler): keyword not found in func
* feat(profiler): Added sample data for freshness
* fix(profiler): fetch previous day for BQ
* fix(profiler): sonar + data fetching logic
* fix: typo in SystemMetric Class
* fix: linting
* fix: extracted out EntityList class into models.py
* Clean up test suite workflow and interface
* Fixed tests
* Split profiler and testSuite interfaces
* Cleaned up workflows and runners
* Fixed code formatting
* - remove old code
- remove `table` attribute used for testing and use a mock instead
* Fixed execution bugs from refactor
* Fixed static type checking for profiler/api/workflow.py
* Fixed linting
* Added __init__ files
* Added database filter in workflow
* Removed association between profiler and data quality
* fixed tests with removed association
* Fixed sonar code smells and bugs
* Updated profiler workflow to:
- support only running profiler (removed test run)
- support column inclusion and exclusion
- restore support for partitioned tables and sampling
* moved status to workflow
* Fixed tests
* removed test logic from profiler sink
* Added logic to return sample from workflow sample value
* Added profiler examples
* Updated documentation for profiler
* Fixed code smells
* committed changes to profiler
* initial commit of the revamp workflow
* Fixed python formatting
* cleaned up profiler submodule by removing test related files and functions
* Added airflow DAG logic for testSuite workflow
* Fixed code smells + added airflow ingestion tests + fixed comments
* Added tests for multithreading SQA interface
* Added multithread support for metric computation
* Added thread ID to log debugger
* Cleaned up tests
* Fixed python formatting issues
* Added non-blocking result processing + threadCount in config file to set the number of threads
* Added frontend input field to set number of threads
* Fixed code smell, bug and comments from reviewer
* Added additional table + test coverage
* Added logic for front end input fields
* Added comment for median metric
* skipping `Update owner and check description` cypress test
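
The multithreaded metric computation with non-blocking result processing mentioned above can be sketched as follows. This is only an illustration built on the standard library's `concurrent.futures`, which is not necessarily what the profiler uses. The function `compute_metrics` and its `thread_count` parameter (standing in for the `threadCount` config value) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def compute_metrics(
    columns: Dict[str, List[float]],
    metric: Callable[[List[float]], float],
    thread_count: int = 5,
) -> Dict[str, float]:
    """Compute one metric per column using a pool of worker threads."""
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # Submit one task per column and remember which future maps to which column.
        futures = {
            pool.submit(metric, values): name for name, values in columns.items()
        }
        # as_completed yields each future as soon as it finishes, so results are
        # handled in completion order rather than submission order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

A call such as `compute_metrics({"a": [1, 2, 3], "b": [4, 5]}, sum, thread_count=2)` computes the sum of each column on up to two threads.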
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>