* Initial plan
* Add support for classification tags in dbt meta field
- Update DbtMetaGlossaryTier model to include tags field
- Add processing logic in process_dbt_meta for classification tags
- Add unit test for classification tags functionality
- Support tag FQNs in format 'classification.tag'
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
* Add comprehensive test for combined dbt meta tags
- Add test_dbt_combined_meta_tags to verify all tag types work together
- Test glossary terms, tier, and classification tags in one meta field
- Verify all three types of tags are processed correctly
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
* Add edge case tests for classification tags
- Test empty tags list returns empty result
- Test invalid tag format (missing classification) is skipped
- Test None tags is handled gracefully
- Ensure robust error handling
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
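A minimal sketch of the tag handling described in the bullets above: entries in the 'classification.tag' format are split into two parts, entries with no classification are skipped, and an empty or None list gives an empty result. The function name `split_classification_tags` is hypothetical and used only for illustration; it is not the code in `process_dbt_meta`.

```python
from typing import List, Optional, Tuple


def split_classification_tags(tags: Optional[List[str]]) -> List[Tuple[str, str]]:
    """Split 'classification.tag' strings into (classification, tag) pairs.

    Hypothetical helper: assumes every entry is a string and splits on the
    first dot only.
    """
    pairs: List[Tuple[str, str]] = []
    # `tags or []` covers both None and an empty list.
    for fqn in tags or []:
        classification, separator, tag = fqn.partition(".")
        # Skip entries with no dot, or with an empty part on either side.
        if not separator or not classification or not tag:
            continue
        pairs.append((classification, tag))
    return pairs
```

For example, `split_classification_tags(["PII.Sensitive", "Sensitive"])` keeps only the first entry, because the second has no classification prefix.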
* Fix pycheck
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harshach <38649+harshach@users.noreply.github.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Co-authored-by: Keshav Mohta <68001229+keshavmohta09@users.noreply.github.com>
* Brief documentation of installation requirements
* Minor fix to run tests only defined in OpenMetadata
* Add full example to Data Quality as Code
* Install `griffe2md` and fix docstrings
* Remove local openmetadata reference
* Fix writing, grammar and typos
* Fix test
* Fix formatting
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Implement dimensionality for Column Values not In Set
* Implement Dimensionality for ColumnValuesToMatchRegex and ColumnValuesToNotMatchRegex
* Implement NotNull and MissingCount dimensionality
* Implement columnValuesToBeBetween dimensionality
* Fix test
* Implement Pandas Dimensionality for ColumnValueStdDevToBeBetween
* Implement Dimensionality for ColumnValuesStdDevToBeBetween
* Implement dimensionality for column values to be at expected location
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Implement dimensionality for Column Values not In Set
* Implement Dimensionality for ColumnValuesToMatchRegex and ColumnValuesToNotMatchRegex
* Implement NotNull and MissingCount dimensionality
* Implement columnValuesToBeBetween dimensionality
* Fix test
* Implement Pandas Dimensionality for ColumnValueStdDevToBeBetween
* Implement Dimensionality for ColumnValuesStdDevToBeBetween
* Fixed tests due to sqlite now supporting stddev
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Implement dimensionality for Column Values not In Set
* Implement Dimensionality for ColumnValuesToMatchRegex and ColumnValuesToNotMatchRegex
* Implement NotNull and MissingCount dimensionality
* Implement columnValuesToBeBetween dimensionality
* Fix test
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Implement dimensionality for Column Values not In Set
* Implement Dimensionality for ColumnValuesToMatchRegex and ColumnValuesToNotMatchRegex
* Implement NotNull and MissingCount dimensionality
* Fix test
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Implement dimensionality for Column Values not In Set
* Implement Dimensionality for ColumnValuesToMatchRegex and ColumnValuesToNotMatchRegex
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Implement dimensionality for Column Values not In Set
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
* Implement Dimensionality SumToBeBetween
* Update columnValueLengthsToBeBetween.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
* Implement Dimensionality for ColumnMedianToBeBetween in Pandas
* Implement Median Dimensionality for SQL
* Add database tests
* Fix median metric
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix FQN parsing problem in ClickHouse and add more logging
* Run Python format commands
* Fix Python formatting issues
---------
Co-authored-by: Nancy Amandi <nancy.amandi@moniepoint.com>
Co-authored-by: Teddy <teddy.crepineau@gmail.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
* Implement Dimensionality to ColumnLengthToBeBetween
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Refactor previous tests for shared resources
* Add validation result models
This also includes a method for merging them, useful when running validation in batches.
* Added `DataFrameValidationEngine` for running tests
This also includes a registry for mapping test names to pandas test classes.
* Implement the DataFrameValidator facade
This includes the logic to load tests from different sources (OpenMetadata or code) and pass them down to the engine.
It also includes tests for the integration with OpenMetadata
* Add examples for the API
* Apply comments
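A rough sketch of what merging validation results across batches could look like, assuming a result only tracks row counts. The `BatchResult` class, its fields, and `merge` are hypothetical names for illustration and do not reflect the actual result models added here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchResult:
    """Hypothetical per-batch outcome of a single test."""

    rows_checked: int
    rows_failed: int

    @property
    def success(self) -> bool:
        # A result is successful only if no row failed.
        return self.rows_failed == 0

    def merge(self, other: "BatchResult") -> "BatchResult":
        # Counts are additive, so merging two batches sums each field.
        return BatchResult(
            rows_checked=self.rows_checked + other.rows_checked,
            rows_failed=self.rows_failed + other.rows_failed,
        )


# Usage: fold the results of several batches into one overall result.
batches = [BatchResult(100, 0), BatchResult(100, 3), BatchResult(50, 0)]
overall = batches[0]
for batch in batches[1:]:
    overall = overall.merge(batch)
```

Under this assumption a single failing batch makes the merged result fail, which is the behavior one would usually want when a DataFrame is validated in chunks.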
* Implement Ingestion side to return a flag when all values are unique
* Update generated TypeScript types
* feat: Enhance CardinalityDistributionChart to display messages when all values are unique
- Added logic to check if all values are unique for both first day and current day data.
- Implemented a placeholder message when all values are unique, indicating no distribution available.
- Updated tests to cover scenarios for unique values and ensure correct rendering of charts and messages.
- Added localization for the new message in multiple languages.
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>
* refactor: use hashing to reduce API calls, replace DISTINCT with GROUP BY to optimize lineage queries, and make minor code optimizations
* Update generated TypeScript types
* fix: self.job_table_lineage defaultdict function
* refactor: improved hashing
* fix: added _table_lookup_cache and _dlt_table_cache in tests
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Refactor to reduce code repetition and complexity
* Fix conflict
* Rename method
* Refactor some metrics
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
* Fix Unittests
* Fix Issue with counting total rows on mean
* Improve code
* Fix Merge
* Removed unused type
* Fix Tests
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Initial implementation for Dimensionality on Data Quality Tests
* Fix ColumnValuesToBeUnique and create TestCaseResult API
* Refactor dimension result
* Initial E2E Implementation without Impact Score
* Dimensionality Thin Slice
* Update generated TypeScript types
* Update generated TypeScript types
* Removed useless method to use the one we already had
* Fix Pandas Dimensionality checks
* Remove useless comments
* Implement PR comments, fix Tests
* Improve the code a bit
* Fix imports
* Implement Dimensionality for ColumnMeanToBeBetween
* Removed useless comments and improved minor things
* Implement UnitTests
* Fixes
* Moved import pandas to type checking
* Fix Min/Max being optional
* Fix Unittests
* small fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>