diff --git a/openmetadata-docs/content/v1.6.x/collate-menu.md b/openmetadata-docs/content/v1.6.x/collate-menu.md index fc22c10aba7..36dc0171a4d 100644 --- a/openmetadata-docs/content/v1.6.x/collate-menu.md +++ b/openmetadata-docs/content/v1.6.x/collate-menu.md @@ -860,8 +860,6 @@ site_menu: url: /how-to-guides/data-quality-observability/profiler/metrics - category: How-to Guides / Data Quality and Observability / Data Profiler / Custom Metrics url: /how-to-guides/data-quality-observability/profiler/custom-metrics - - category: How-to Guides / Data Quality and Observability / Data Profiler / Sample Data - url: /how-to-guides/data-quality-observability/profiler/external-sample-data - category: How-to Guides / Data Quality and Observability / Data Profiler / External Workflow url: /how-to-guides/data-quality-observability/profiler/external-workflow - category: How-to Guides / Data Quality and Observability / Data Observability @@ -957,6 +955,8 @@ site_menu: url: /how-to-guides/data-governance/classification/auto-classification/external-workflow - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Auto PII Tagging url: /how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging + - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Sample Data + url: /how-to-guides/data-governance/classification/auto-classification/external-sample-data - category: How-to Guides / Data Governance / Classification / What are Tiers url: /how-to-guides/data-governance/classification/tiers - category: How-to Guides / Data Governance / Classification / Best Practices for Classification diff --git a/openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/sample_data.md b/openmetadata-docs/content/v1.6.x/how-to-guides/data-governance/classification/Auto Classification/sample_data.md similarity index 99% rename from openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/sample_data.md rename to openmetadata-docs/content/v1.6.x/how-to-guides/data-governance/classification/Auto Classification/sample_data.md index 4cbbed1804c..82caa47949e 100644 --- a/openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/sample_data.md +++ b/openmetadata-docs/content/v1.6.x/how-to-guides/data-governance/classification/Auto Classification/sample_data.md @@ -1,6 +1,6 @@ --- title: External Storage for Sample Data -slug: /how-to-guides/data-quality-observability/profiler/external-sample-data +slug: /how-to-guides/data-governance/classification/auto-classification/external-sample-data --- # External Storage for Sample Data diff --git a/openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md b/openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md index 89f2ba8fe78..e5a2a560b52 100644 --- a/openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md +++ b/openmetadata-docs/content/v1.6.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md @@ -83,10 +83,7 @@ This Flag is useful in scenarios when you have different schemas with same name **Compute Metrics** Set the Compute Metrics toggle off to not perform any metric computation during the profiler ingestion workflow. Used in combination with Ingest Sample Data toggle on allows you to only ingest sample data. -**Advanced Configuration** - -**PII Inference Confidence LevelConfidence (Optional)** -If `Auto PII Tagging` is enable, this confidence level will determine the threshold to use for OpenMetadata's NLP model to consider a column as containing PII data. +**Advanced Configuration** **Sample Data Rows Count** Set the number of rows to ingest when Ingest Sample Data toggle is on. Defaults to 50. @@ -124,9 +121,6 @@ Set the sample to be use by the profiler for the specific table. ⚠️ This option is currently not support for Druid. Sampling leverage `RANDOM` functions in most database (some have specific sampling functions) and Druid provides neither of these option. We recommend using the partitioning or sample query option if you need to limit the amount of data scanned. -**Profile Sample Query** -Use a query to sample data for the profiler. This will overwrite any profle sample set. - **Enable Column Profile** This setting allows user to exclude or include specific columns and metrics from the profiler. @@ -198,15 +192,6 @@ This is a sample config for the profiler: {% codeInfoContainer %} -{% codeInfo srNumber=10 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - {% codeInfo srNumber=22 %} **computeMetrics**: Option to turn on/off computing profiler metrics. This flag is useful when you want to only ingest the sample data with the profiler workflow and not any other information. @@ -226,19 +211,6 @@ You can find all the definitions and types for the `sourceConfig` [here](https: {% /codeInfo %} -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - {% codeInfo srNumber=15 %} **timeoutSeconds**: Profiler Timeout in Seconds @@ -305,9 +277,6 @@ source: type: Profiler ``` -```yaml {% srNumber=10 %} - generateSampleData: true -``` ```yaml {% srNumber=22 %} computeMetrics: true ``` @@ -317,12 +286,6 @@ source: ```yaml {% srNumber=12 %} # threadCount: 5 ``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` ```yaml {% srNumber=15 %} # timeoutSeconds: 43200 ``` @@ -363,7 +326,6 @@ processor: # profileSample: # default # profileSample: # default will be 100 if omitted - # profileQuery: # columnConfig: # excludeColumns: # - diff --git a/openmetadata-docs/content/v1.6.x/menu.md b/openmetadata-docs/content/v1.6.x/menu.md index e28d6eca65d..523becbe312 100644 --- a/openmetadata-docs/content/v1.6.x/menu.md +++ b/openmetadata-docs/content/v1.6.x/menu.md @@ -993,8 +993,6 @@ site_menu: url: /how-to-guides/data-quality-observability/profiler/metrics - category: How-to Guides / Data Quality and Observability / Data Profiler / Custom Metrics url: /how-to-guides/data-quality-observability/profiler/custom-metrics - - category: How-to Guides / Data Quality and Observability / Data Profiler / Sample Data - url: /how-to-guides/data-quality-observability/profiler/external-sample-data - category: How-to Guides / Data Quality and Observability / Data Profiler / External Workflow url: /how-to-guides/data-quality-observability/profiler/external-workflow - category: How-to Guides / Data Quality and Observability / Data Observability @@ -1076,6 +1074,8 @@ site_menu: url: /how-to-guides/data-governance/classification/auto-classification/external-workflow - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Auto PII Tagging url: /how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging + - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Sample Data + url: /how-to-guides/data-governance/classification/auto-classification/external-sample-data - category: How-to Guides / Data Governance / Classification / What are Tiers url: /how-to-guides/data-governance/classification/tiers - category: How-to Guides / Data Governance / Classification / Best Practices for Classification diff --git a/openmetadata-docs/content/v1.7.x/collate-menu.md b/openmetadata-docs/content/v1.7.x/collate-menu.md index 444bc4cd217..3cd84f3192d 100644 --- a/openmetadata-docs/content/v1.7.x/collate-menu.md +++ b/openmetadata-docs/content/v1.7.x/collate-menu.md @@ -902,8 +902,6 @@ site_menu: url: /how-to-guides/data-quality-observability/profiler/metrics - category: How-to Guides / Data Quality and Observability / Data Profiler / Custom Metrics url: /how-to-guides/data-quality-observability/profiler/custom-metrics - - category: How-to Guides / Data Quality and Observability / Data Profiler / Sample Data - url: /how-to-guides/data-quality-observability/profiler/external-sample-data - category: How-to Guides / Data Quality and Observability / Data Profiler / External Workflow url: /how-to-guides/data-quality-observability/profiler/auto-pii-tagging - category: How-to Guides / Data Quality and Observability / Data Observability @@ -1001,6 +999,8 @@ site_menu: url: /how-to-guides/data-governance/classification/auto-classification/external-workflow - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Auto PII Tagging url: /how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging + - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Sample Data + url: /how-to-guides/data-governance/classification/auto-classification/external-sample-data - category: How-to Guides / Data Governance / Classification / What are Tiers url: /how-to-guides/data-governance/classification/tiers - category: How-to Guides / Data Governance / Classification / Best Practices for Classification diff --git a/openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/sample_data.md b/openmetadata-docs/content/v1.7.x/how-to-guides/data-governance/classification/Auto Classification/sample_data.md similarity index 99% rename from openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/sample_data.md rename to openmetadata-docs/content/v1.7.x/how-to-guides/data-governance/classification/Auto Classification/sample_data.md index 7dd4102a1d6..632717de675 100644 --- a/openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/sample_data.md +++ b/openmetadata-docs/content/v1.7.x/how-to-guides/data-governance/classification/Auto Classification/sample_data.md @@ -1,6 +1,6 @@ --- title: External Storage for Sample Data -slug: /how-to-guides/data-quality-observability/profiler/external-sample-data +slug: /how-to-guides/data-governance/classification/auto-classification/external-sample-data --- # External Storage for Sample Data diff --git a/openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md b/openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md index d1d803191ad..fc686f63262 100644 --- a/openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md +++ b/openmetadata-docs/content/v1.7.x/how-to-guides/data-quality-observability/profiler/profiler-workflow.md @@ -83,10 +83,7 @@ This Flag is useful in scenarios when you have different schemas with same name **Compute Metrics** Set the Compute Metrics toggle off to not perform any metric computation during the profiler ingestion workflow. Used in combination with Ingest Sample Data toggle on allows you to only ingest sample data. -**Advanced Configuration** - -**PII Inference Confidence LevelConfidence (Optional)** -If `Auto PII Tagging` is enable, this confidence level will determine the threshold to use for OpenMetadata's NLP model to consider a column as containing PII data. +**Advanced Configuration** **Sample Data Rows Count** Set the number of rows to ingest when Ingest Sample Data toggle is on. Defaults to 50. @@ -124,9 +121,6 @@ Set the sample to be use by the profiler for the specific table. ⚠️ This option is currently not support for Druid. Sampling leverage `RANDOM` functions in most database (some have specific sampling functions) and Druid provides neither of these option. We recommend using the partitioning or sample query option if you need to limit the amount of data scanned. -**Profile Sample Query** -Use a query to sample data for the profiler. This will overwrite any profle sample set. - **Enable Column Profile** This setting allows user to exclude or include specific columns and metrics from the profiler. @@ -198,15 +192,6 @@ This is a sample config for the profiler: {% codeInfoContainer %} -{% codeInfo srNumber=10 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - {% codeInfo srNumber=22 %} **computeMetrics**: Option to turn on/off computing profiler metrics. This flag is useful when you want to only ingest the sample data with the profiler workflow and not any other information. @@ -226,19 +211,6 @@ You can find all the definitions and types for the `sourceConfig` [here](https: {% /codeInfo %} -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - {% codeInfo srNumber=15 %} **timeoutSeconds**: Profiler Timeout in Seconds @@ -305,9 +277,6 @@ source: type: Profiler ``` -```yaml {% srNumber=10 %} - generateSampleData: true -``` ```yaml {% srNumber=22 %} computeMetrics: true ``` @@ -317,12 +286,6 @@ source: ```yaml {% srNumber=12 %} # threadCount: 5 ``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` ```yaml {% srNumber=15 %} # timeoutSeconds: 43200 ``` @@ -362,8 +325,7 @@ processor: # - fullyQualifiedName: # profileSample: # default - # profileSample: # default will be 100 if omitted - # profileQuery: + # profileSample: # default will be 100 if omitted # columnConfig: # excludeColumns: # - diff --git a/openmetadata-docs/content/v1.7.x/menu.md b/openmetadata-docs/content/v1.7.x/menu.md index 6836fd60237..0c5acecc510 100644 --- a/openmetadata-docs/content/v1.7.x/menu.md +++ b/openmetadata-docs/content/v1.7.x/menu.md @@ -1019,8 +1019,6 @@ site_menu: url: /how-to-guides/data-quality-observability/profiler/metrics - category: How-to Guides / Data Quality and Observability / Data Profiler / Custom Metrics url: /how-to-guides/data-quality-observability/profiler/custom-metrics - - category: How-to Guides / Data Quality and Observability / Data Profiler / Sample Data - url: /how-to-guides/data-quality-observability/profiler/external-sample-data - category: How-to Guides / Data Quality and Observability / Data Profiler / External Workflow url: /how-to-guides/data-quality-observability/profiler/external-workflow - category: How-to Guides / Data Quality and Observability / Data Observability @@ -1104,6 +1102,8 @@ site_menu: url: /how-to-guides/data-governance/classification/auto-classification/external-workflow - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Auto PII Tagging url: /how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging + - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Sample Data + url: /how-to-guides/data-governance/classification/auto-classification/external-sample-data - category: How-to Guides / Data Governance / Classification / What are Tiers url: /how-to-guides/data-governance/classification/tiers - category: How-to Guides / Data Governance / Classification / Best Practices for Classification diff --git a/openmetadata-docs/content/v1.8.x-SNAPSHOT/collate-menu.md b/openmetadata-docs/content/v1.8.x-SNAPSHOT/collate-menu.md index 444bc4cd217..3cd84f3192d 100644 --- a/openmetadata-docs/content/v1.8.x-SNAPSHOT/collate-menu.md +++ b/openmetadata-docs/content/v1.8.x-SNAPSHOT/collate-menu.md @@ -902,8 +902,6 @@ site_menu: url: /how-to-guides/data-quality-observability/profiler/metrics - category: How-to Guides / Data Quality and Observability / Data Profiler / Custom Metrics url: /how-to-guides/data-quality-observability/profiler/custom-metrics - - category: How-to Guides / Data Quality and Observability / Data Profiler / Sample Data - url: /how-to-guides/data-quality-observability/profiler/external-sample-data - category: How-to Guides / Data Quality and Observability / Data Profiler / External Workflow url: /how-to-guides/data-quality-observability/profiler/auto-pii-tagging - category: How-to Guides / Data Quality and Observability / Data Observability @@ -1001,6 +999,8 @@ site_menu: url: /how-to-guides/data-governance/classification/auto-classification/external-workflow - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Auto PII Tagging url: /how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging + - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Sample Data + url: /how-to-guides/data-governance/classification/auto-classification/external-sample-data - category: How-to Guides / Data Governance / Classification / What are Tiers url: /how-to-guides/data-governance/classification/tiers - category: How-to Guides / Data Governance / Classification / Best Practices for Classification diff --git a/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/sample_data.md b/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-governance/classification/Auto Classification/sample_data.md similarity index 99% rename from openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/sample_data.md rename to openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-governance/classification/Auto Classification/sample_data.md index 666def01763..8bf69cfce57 100644 --- a/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/sample_data.md +++ b/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-governance/classification/Auto Classification/sample_data.md @@ -1,6 +1,6 @@ --- title: External Storage for Sample Data -slug: /how-to-guides/data-quality-observability/profiler/external-sample-data +slug: /how-to-guides/data-governance/classification/auto-classification/external-sample-data --- # External Storage for Sample Data diff --git a/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/profiler-workflow.md b/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/profiler-workflow.md index b496a85729b..79da0b9d802 100644 --- a/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/profiler-workflow.md +++ b/openmetadata-docs/content/v1.8.x-SNAPSHOT/how-to-guides/data-quality-observability/profiler/profiler-workflow.md @@ -85,9 +85,6 @@ Set the Compute Metrics toggle off to not perform any metric computation during **Advanced Configuration** -**PII Inference Confidence LevelConfidence (Optional)** -If `Auto PII Tagging` is enable, this confidence level will determine the threshold to use for OpenMetadata's NLP model to consider a column as containing PII data. - **Sample Data Rows Count** Set the number of rows to ingest when Ingest Sample Data toggle is on. Defaults to 50. @@ -124,9 +121,6 @@ Set the sample to be use by the profiler for the specific table. ⚠️ This option is currently not support for Druid. Sampling leverage `RANDOM` functions in most database (some have specific sampling functions) and Druid provides neither of these option. We recommend using the partitioning or sample query option if you need to limit the amount of data scanned. -**Profile Sample Query** -Use a query to sample data for the profiler. This will overwrite any profle sample set. - **Enable Column Profile** This setting allows user to exclude or include specific columns and metrics from the profiler. @@ -198,15 +192,6 @@ This is a sample config for the profiler: {% codeInfoContainer %} -{% codeInfo srNumber=10 %} -#### Source Configuration - Source Config - -You can find all the definitions and types for the `sourceConfig` [here](https://github.com/open-metadata/OpenMetadata/blob/main/openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceProfilerPipeline.json). - -**generateSampleData**: Option to turn on/off generating sample data. - -{% /codeInfo %} - {% codeInfo srNumber=22 %} **computeMetrics**: Option to turn on/off computing profiler metrics. This flag is useful when you want to only ingest the sample data with the profiler workflow and not any other information. @@ -226,19 +211,6 @@ You can find all the definitions and types for the `sourceConfig` [here](https: {% /codeInfo %} -{% codeInfo srNumber=13 %} - -**processPiiSensitive**: Optional configuration to automatically tag columns that might contain sensitive information. - -{% /codeInfo %} - -{% codeInfo srNumber=14 %} - -**confidence**: Set the Confidence value for which you want the column to be marked - -{% /codeInfo %} - - {% codeInfo srNumber=15 %} **timeoutSeconds**: Profiler Timeout in Seconds @@ -305,9 +277,6 @@ source: type: Profiler ``` -```yaml {% srNumber=10 %} - generateSampleData: true -``` ```yaml {% srNumber=22 %} computeMetrics: true ``` @@ -317,12 +286,6 @@ source: ```yaml {% srNumber=12 %} # threadCount: 5 ``` -```yaml {% srNumber=13 %} - processPiiSensitive: false -``` -```yaml {% srNumber=14 %} - # confidence: 80 -``` ```yaml {% srNumber=15 %} # timeoutSeconds: 43200 ``` @@ -363,7 +326,6 @@ processor: # profileSample: # default # profileSample: # default will be 100 if omitted - # profileQuery: # columnConfig: # excludeColumns: # - diff --git a/openmetadata-docs/content/v1.8.x-SNAPSHOT/menu.md b/openmetadata-docs/content/v1.8.x-SNAPSHOT/menu.md index da90555eb82..cb56a351953 100644 --- a/openmetadata-docs/content/v1.8.x-SNAPSHOT/menu.md +++ b/openmetadata-docs/content/v1.8.x-SNAPSHOT/menu.md @@ -1025,8 +1025,6 @@ site_menu: url: /how-to-guides/data-quality-observability/profiler/metrics - category: How-to Guides / Data Quality and Observability / Data Profiler / Custom Metrics url: /how-to-guides/data-quality-observability/profiler/custom-metrics - - category: How-to Guides / Data Quality and Observability / Data Profiler / Sample Data - url: /how-to-guides/data-quality-observability/profiler/external-sample-data - category: How-to Guides / Data Quality and Observability / Data Profiler / External Workflow url: /how-to-guides/data-quality-observability/profiler/external-workflow - category: How-to Guides / Data Quality and Observability / Data Observability @@ -1110,6 +1108,8 @@ site_menu: url: /how-to-guides/data-governance/classification/auto-classification/external-workflow - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Auto PII Tagging url: /how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging + - category: How-to Guides / Data Governance / Classification / Auto-Classification Workflow / Sample Data + url: /how-to-guides/data-governance/classification/auto-classification/external-sample-data - category: How-to Guides / Data Governance / Classification / What are Tiers url: /how-to-guides/data-governance/classification/tiers - category: How-to Guides / Data Governance / Classification / Best Practices for Classification