OpenMetadata

mirror of https://github.com/open-metadata/OpenMetadata.git synced 2025-07-23 09:22:18 +00:00

Author	SHA1	Message	Date
Pere Menal-Ferrer	44e09e41a2	Revert "FIX #1464 (#21520 )" (#21726 ) This reverts commit 1e86f9870fd663122b9bbb64f3cf17cf32619c7f.	2025-06-13 17:27:32 +02:00
Pere Menal-Ferrer	1e86f9870f	FIX #1464 (#21520 ) * Add PIICategoryTags and some utilities on top of them. * Fix static-check * Add test for fqn representation * Add NEREntityGeneralTags.json from Collate * Add test to check PIICategoryTags agree with the ones used by OM server * Add LabelExtractor * Fix style * Add ignore superflous-parens for pylint * Ass comment as per PR review * Fix not-updated PII-IT * Remove duplicated IT test for PII --------- Co-authored-by: Pere Menal <pere.menal@getcollate.io> Co-authored-by: Sriharsha Chintalapani <harshach@users.noreply.github.com>	2025-06-09 16:05:35 -07:00
Pere Menal-Ferrer	5d2dfa712a	feature/pii-processor-improvement (#21248 ) * Add PII Tag and Sensitivity Level enums. * Add feature-extraction for PII classification tasks * Add faker as test dependency * Add unit tests for presidio tag extractor * Add PIISensitivityTags enum and update sensitivity mapping logic * Add Presidio utility functions for PII analysis * Extend column name regexs for PII * Add tests for PAN, NIF, SSN entities * Fix version of faker to prevent flaky tests. Fix failing tests. * Add Generated to State enum * Integrate PIISensitive classifier to PIIProcessor	2025-05-19 17:52:17 +00:00
Pere Miquel Brull	3186937cc2	MINOR - Update Auto Classification defaults for sample data & classif… (#20587 ) * MINOR - Update Auto Classification defaults for sample data & classification * fix tests	2025-04-07 15:56:57 +02:00
Mayur Singal	7760663b22	MINOR: Change ingestion licence header (#20549 )	2025-04-03 10:39:47 +05:30
Teddy	28bd01c471	MINOR: Remove default 100 when `profileSample` is None (#19672 ) * fix: remove default 100% percent * fix: use get_dataset * fix: orm_profiler tests	2025-02-05 19:14:31 +01:00
Teddy	58699063db	MINOR -- Fix DQ Partition Issue (#18641 ) * fix: renamed `random_sample` to `get_dataset` and change dunder method access for SQA Table object * fix: removed handle_partition decorator * fix: fixed DQ partition issue + moved to `tablesample` method * style: ran python linting * style: fix python format check issues * feat: added postgres tablesample * style: ran python linting * fix: sampling delta * fix: merge conflicts * fix: resolved conflicts * style: ran python linting * fix: patch orm call in test case * fix: mock build_table_orm call in tests * fix: test case failures and errors * fix: removed unused import * fix: patch typo * fix: trino table schema retrieval * fix: remove tuple context manager for 3.8 test support	2024-11-27 08:50:54 +01:00
Pere Miquel Brull	c68a45e7d8	Create new Auto Classification Workflow (#18610 )	2024-11-19 08:10:45 +01:00
Pere Miquel Brull	4cccaae446	GEN-996 - Allow PII Processor without storing Sample Data (#17927 ) * GEN-996 - Allow PII Processor without storing Sample Data * fix import * fix import	2024-09-20 16:05:29 +02:00
Imri Paran	a3d6c1dd20	MINOR: tests(datalake): use minio (#17805 ) * tests(datalake): use minio 1. use minio instead of moto for mimicking s3 behavior. 2. removed moto dependency as it is not compatible with aiobotocore (https://github.com/getmoto/moto/issues/7070#issuecomment-1828484982) * - moved test_datalake_profiler_e2e.py to datalake/test_profiler - use minio instead of moto * fixed tests * fixed tests * removed default name for minio container	2024-09-12 07:13:01 +02:00
IceS2	c522f14178	MINOR: Refactor output_handlers to a WorkflowOutputHandler class (#17149 ) * Refactor output_handlers to a WorkflowOutputHandler class * Add old methods as deprecated to avoid breaking changes * Extract WorkflowInitErrorHandler from workflow_output_handler * Fix static checks * Fix tests * Fix tests * Update code based on comments from PR * Update comment	2024-07-29 09:20:34 +02:00
Pere Miquel Brull	cb72a22b59	Fix - e2e tests for pydantic V2 (#16551 ) * Fix - e2e tests for pydantic V2 * add correct default * add correct default * revert datetime aware * revert datetime aware * revert datetime aware * revert datetime aware * revert datetime aware * revert datetime aware * revert datetime aware * revert datetime aware * fix apis * format	2024-06-06 19:36:17 -07:00
Pere Miquel Brull	d8e2187980	#15243 - Pydantic V2 & Airflow 2.9 (#16480 ) * pydantic v2 * pydanticv2 * fix parser * fix annotated * fix model dumping * mysql ingestion * clean root models * clean root models * bump airflow * bump airflow * bump airflow * optionals * optionals * optionals * jdk * airflow migrate * fab provider * fab provider * fab provider * some more fixes * fixing tests and imports * model_dump and model_validate * model_dump and model_validate * model_dump and model_validate * union * pylint * pylint * integration tests * fix CostAnalysisReportData * integration tests * tests * missing defaults * missing defaults	2024-06-05 21:18:37 +02:00
juntao	8dd613caa5	Fixes #16235 : need quote fullyQualifiedName in Ingestion Framework (#16273 ) * Fixes #16235: need quote fullyQualifiedName in Ingestion Framework * MINOR: fix UT issue * revert: fix UT issue * revert code * revert code * format code	2024-05-23 17:45:47 +02:00
Pere Miquel Brull	b786064bc2	#11857 - Store workflow status in the Ingestion Pipeline Status (#14462 ) * Register StackTraceError in spec * Register StackTraceError in spec * Register StackTraceError in spec * Add todos * Update status * docs * format * Fix tests * Fix tests * Fix tests * Ignore generated * Fix tests * Fix tests * Tests * Try constants * Try constants * Print * Print * Print * order * Fix service name * fix ui error --------- Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>	2023-12-22 15:43:50 +01:00
Teddy	31d2595e4f	fix: pass rnd table bound columns to sample query (#13561 )	2023-10-13 14:57:28 +05:30
Teddy	1cbdfb3ae7	Fixes #12601 - column filter for profiler workflow (#13535 ) * fix: sample data ingestion to match entity profiler column setting * fix: python linting * fix: updated fn call * fix: added logic to handle json filed in datalake connector * fix: handle NA values in parsing * fix: reverted sampler changes from #13338 * fix: reverted metric changes from #13338 * fix: added datalake profiler ingestion test * fix: python linting * fix: removed normalization of json blob in NoSQL db	2023-10-12 14:51:38 +02:00
Pere Miquel Brull	0282574bdd	Create ometa client once and pass it around & improve pycln config (#13310 ) * Create ometa client once and pass it around & improve pycln config * Fix * Fix * Fix tests * Fix maven ci * Fix tests * Fix tests * Fix tests * Format * Fix DI	2023-10-04 09:14:03 +02:00
Pere Miquel Brull	b5596a4640	Batch PII tagging (#13385 ) * Batch PII tagging * Batch PII tagging * Fix tests * Fix tests	2023-10-02 14:44:41 +02:00
Pere Miquel Brull	de7e06d024	Update structure for PII processing (#13079 ) * Update structure for PII processing * Fix tests * Fix tests * Lint * Remove typo	2023-09-06 11:30:46 +02:00
Pere Miquel Brull	a3bfd4e696	Part of #11968 - Restructure Profiler Workflow and PII Processor (#13059 ) * Structure PII * Restructure Profiler Workflow * Update signature for abc * remove profiler sink * Fix tests * Fix lint * Fix test * Fix test	2023-09-04 11:02:57 +02:00
Pere Miquel Brull	6c0e9f5061	Part of #7272 - Centralize Workflows, Status, and Exception Management (#13029 ) * Prep changes * Prep changes * prep changes * Update imports * Format * Prep delete * Prep delete * Fix sink * Prep test * Commit * passing either * passing either * Prep Either * Metadata source with Either * Update status * Merge remote-tracking branch 'upstream/main' into issue-7272 * Format * Linting * Linting * Linting * Linting * Fix tests * Fix tests * Fix tests * Fix tests * Fix tests * Fix tests * Fix tests * Comments	2023-08-30 15:49:42 +02:00
Teddy	101cd0ebac	Issue 8930 - Update profiler timestamp from seconds to milliseconds (#12948 )	2023-08-25 08:47:16 +02:00
Suresh Srinivas	28b5e00c0c	Clean up documentation typos and grammar issues (#12930 )	2023-08-20 20:08:30 -07:00
Teddy	bfa0cc7598	fix: python tests failure after PR #12865 (#12927 ) * fix: python tests failure after https://github.com/open-metadata/OpenMetadata/pull/12865 * fix: test in ometa_table_api * fix: skip is None test temporarly	2023-08-18 18:11:47 +02:00
Ayush Shah	ab1ec50c2c	Fixes Mssql Ntext, text and Image (#12490 )	2023-07-20 13:34:35 +05:30
Teddy	1e86b6533c	Fixes #11743 - Remove SQLParse dependency for System Metrics (#12072 ) * fix: removed sqlparse dependency for system metrics * fix: update sample query * fix: move system test os retrieval to `.get()` * fix: move os.environ to `get`	2023-06-22 06:51:24 +02:00
Ayush Shah	f80eaf3a26	Fixes 11068: mysql & postgres iam auth (#11937 )	2023-06-16 13:18:12 +05:30
Teddy	8c50d1af52	Fixes #4565 - Fetch Metrics from System tables (#11645 ) * feat: fetch metrics from system tables * feat: add permission doc for fetching metrics from system tables * feat: fix E2E tests to reflect full table row count after table metric update * feat: ran linting * feat: fix doc string engine name + function typing * feat: ran python linting	2023-05-22 09:04:18 +02:00
Pere Miquel Brull	1b90badd0e	Restructure PII processor (#11640 ) * Restructure PII processor * Restructure PII processor * Format	2023-05-17 15:58:17 +02:00
Ayush Shah	2c9ba537eb	Fix min max on rowversion/timestamp mssql (#11455 )	2023-05-08 14:52:53 +05:30
Teddy	754074f1be	Fixes #7758 - Added Column value and Integer Range Partitionning (#10350 ) * feat(profiler): renamed module to * feat(profiler): added dbt-artifacts-parser to test setup.py * feat(profiler): refactor workflow and interface * feat(profiler): linting * feat(profiler): removed old profiler modules * feat(profiler): added support for value and integer range partition * feat(profiler): fixed linting * feat(profiler): added partitionning support for datalake profiler * feat(profiler): removed `ProfilerInterfaceArgs` class * feat(profiler): address comments * feat(profiler): Added `OTHER` as an `IntervalType` for UI type generation	2023-03-01 08:20:38 +01:00
Suresh Srinivas	afad0a4769	Fixes #10123 - Change entityReference in createRequests to fullyQualifiedName (#10124 ) * Change entityReference to entity name or fullyQualifiedName * Change backend code and tests to use FQN * UI change for using fqns instead of EntityReference * Ingestion framework changes for using fqns instead of EntityReference * Fix test failures * Fixed python tests and sample data new * fix: minor ui changes for fqn * Fixed python integration tests * Fixed superset tests * fix UI tests * fix type issue * fix cypress * fix name for testcase --------- Co-authored-by: Onkar Ravgan <onkar.10r@gmail.com> Co-authored-by: karanh37 <karanh37@gmail.com> Co-authored-by: Chirag Madlani <12962843+chirag-madlani@users.noreply.github.com>	2023-02-13 13:38:55 +05:30
Pere Miquel Brull	7f21a7bced	Fix #8088 - Restructure source connections & clients (#9545 )	2023-01-02 13:52:27 +01:00
Ayush Shah	2bf5eb9051	fix 7995: profileSample % and row number (#9104 )	2022-12-20 14:55:11 +05:30
Teddy	ac77f33b08	Fixes #7447 -- Add freshness metrics to profiler (#9159 ) * refactor(profiler): integrated getter func. Removed metric getter function from their own file. Added metric getter to their own interface classs. created dispatch by value methdo to dispatch metric getter func. * feature(profiler): added systemProfiler schema * feat(profiler): workflow fresh. & snflk impl. * feat(profiler): freshness endpoint for put and get * feat(profiler): added system met. for redshift * feat(profiler): freshness met. for bigquery * fix(profiler): keyword not found in func * feat(profiler): Added sample data for freshness * fix(profiler): fetch previous day for BQ * fix(profiler): sonar + data fetching logic * fix: typo in SystemMetric Class * fix: linting * fix: extracted out EntityList class into models.py	2022-12-07 14:33:30 +01:00
Sriharsha Chintalapani	25449001ca	Fix #9040 : Remove fields such as tableQueries, tableProfile, tests, sample data as part of table fields (#9041 )	2022-12-06 21:07:04 -08:00
Ayush Shah	5be0f8ee76	Dl Profiler (#8694 ) * DQ commit * Add DL Profiler * Fix Ingestion and Profliing pylint checks * Fix Tests * PyFormat files * Fix Tests * Resolve Comments * Fix Tests and Format Files * Resolve Comments * Fix Pylint and Code smells * Resolve Comments * Fix S3 parquet * Fix Metrics Code Smell	2022-11-15 16:01:10 +01:00
Onkar Ravgan	35efd49256	Added control for DBT descriptions (#7653 ) * Added control for DBT descriptions * Fixed tests * Added UI changes * fixed maven ci tests * Java formatting changes * ui review fixes * Fixed pytests * Fixed python integration tests * fixed airflow tests Co-authored-by: Onkar Ravgan <onkarravgan@Onkars-MacBook-Pro.local>	2022-09-26 16:19:47 +05:30
Nahuel	2a6c6134f4	Fix#7272: Improve logging when initializing workflow from CLI (#7522 ) * Improve logging when initializing workflow from CLI * Fix broken tests	2022-09-19 08:00:00 -07:00
Sriharsha Chintalapani	821d70eae4	Fix #6782 : Separate TableProfile and ColumnProfile api calls (#6783 ) * Fix #6571: Add EntityLink for the testCase to ID columns * Fix #6571: Add EntityLink for the testCase to ID columns * Fix #6782: Separate TableProfile and ColumnProfile api calls * Fix #6782: Separate TableProfile and ColumnProfile api calls - fix tests * Fix #6782: Separate TableProfile and ColumnProfile api calls - fix tests * Fix setFields * Fix tests * Update pipeline status endpoint * updated ui side as per new schema for profiler tab * updated profiler details with new API * Fix Profiler tests and validation errors (#6827) * add profilerSample field in TableProfile * add profilerSample field in TableProfile * get columnProfile with field profile * get columnProfile with field profile * Fixed sample data and python tests * fixed date range filter change issue * handled empty profiler case * Added column level test case and results Co-authored-by: Pere Miquel Brull <peremiquelbrull@gmail.com> Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com> Co-authored-by: Ayush Shah <ayush@getcollate.io> Co-authored-by: Teddy Crepineau <teddy.crepineau@gmail.com>	2022-08-22 21:31:24 +05:30
Ayush Shah	383f4497cc	Update Entity Reference parameter fields (#6841 )	2022-08-22 19:37:24 +05:30
Teddy	78b5f8c8e2	Part 1 of #5831 -- Profiler workflow implementation (#6809 ) * Added database filter in workflow * Removed association between profiler and data quality * fixed tests with removed association * Fixed sonar code smells and bugs * Updated profiler workflow to: - support only running profiler (removed test run) - support column inclusion and exclusion - added back support for partitioned table and sample * moved status to workflow * Fixed tests * removed test logic from profiler sink * Added logic to return sample from workflow sample value * Added profiler examples * Updated documentation for profiler * Fixed code smells	2022-08-19 10:52:08 +02:00
Ayush Shah	a6db2e8a84	Fix for profiler: modified filter patterns and added error handling (#6608 )	2022-08-08 10:43:17 +05:30
Sriharsha Chintalapani	1a42428e42	Add time series extention (#6416 ) Co-authored-by: Vivek Ratnavel Subramanian <vivekratnavel90@gmail.com> Co-authored-by: Teddy <teddy.crepineau@gmail.com> Co-authored-by: Shailesh Parmar <shailesh.parmar.webdev@gmail.com>	2022-08-04 07:22:47 -07:00
Teddy	818736e2ca	Fix SQLite same thread error (#6486 )	2022-08-01 17:33:53 +02:00
Teddy	6397b6a0b1	Fixes #6325 -- Implement multithreading for metrics computation (#6406 ) * Added tests for multithreading SQA interface * Added multithread support for metric computation * Added thread ID to log debuger * Cleaned up tests * Fixed python formatting issues * Added non blocking result processing + threadCount in config file to set numbers of threads * Added frontend input field to set number of threads * Fixed code smell, bug and comments from reviewer	2022-07-29 10:41:53 +02:00
Teddy	aae4410c93	Fies #6183 - Ability to set profile sample at the profilier workflow level (#6292 )	2022-07-25 12:08:20 +02:00
Teddy	5067e24374	[ISSUE-4723] Fix Snowflake Case Sensitive Error with Profiler (#5533 ) * Fixed snowflake profiler + enabled profiler integration tests * Fixed py formating	2022-06-20 22:23:17 +02:00
Pere Miquel Brull	8e9d0a73f6	Fix #3573 - Sample Data refactor & ORM converter improvements (#5265 )	2022-06-08 16:10:40 +02:00

66 Commits