Docs: Removing PII Sensitive and Generate Sample Data from Profiler Workflow (#19414)

RounakDhillon 2025-01-17 09:58:13 +05:30 committed by GitHub
parent 7fea955338
commit 9111f0c0c5
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 2 additions and 14 deletions


@@ -444,4 +444,4 @@ By default, the profiler will compute all the metrics against all the columns. T
For example, excluding `id` columns will reduce the number of columns against which the metrics are computed.
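Because the profiler computes every metric against every included column, the work per table grows roughly with the number of (column, metric) pairs. The minimal sketch below counts those pairs before and after excluding two ID columns. The column names, metric names and helper functions are hypothetical and only illustrate the arithmetic; actual cost also depends on row count and on the metric itself.

```python
# Hypothetical sketch: not OpenMetadata code. Assumes each metric is
# computed once per included column.


def columns_to_profile(columns: list[str], excluded: set[str]) -> list[str]:
    """Return the columns left after removing the excluded ones, preserving order."""
    return [c for c in columns if c not in excluded]


def metric_computations(columns: list[str], metrics: list[str]) -> int:
    """Count one computation per (column, metric) pair."""
    return len(columns) * len(metrics)


table_columns = ["id", "customer_id", "name", "created_at"]
metrics = ["nullCount", "distinctCount", "min", "max"]  # illustrative names only

all_cols = columns_to_profile(table_columns, excluded=set())
kept_cols = columns_to_profile(table_columns, excluded={"id", "customer_id"})

print(metric_computations(all_cols, metrics))   # 16
print(metric_computations(kept_cols, metrics))  # 8
```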
### 4. Set Up Multiple Workflows
If you have a large number of tables to profile, setting up multiple workflows will help distribute the load. However, it is important to monitor your instance's CPU and memory, as running many workflows simultaneously requires correspondingly more resources.
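One way to split the work is to partition the table list into groups and give each workflow its own table filter. The minimal sketch below assumes you already have the list of table names; it distributes them round-robin into a fixed number of groups and builds an anchored regex per group. The helper names and the regex-based filter format are assumptions for illustration, not OpenMetadata's API.

```python
import re


def split_tables(tables: list[str], n_workflows: int) -> list[list[str]]:
    """Round-robin the sorted table names into at most n_workflows non-empty groups."""
    if n_workflows < 1:
        raise ValueError("n_workflows must be at least 1")
    groups: list[list[str]] = [[] for _ in range(n_workflows)]
    for i, table in enumerate(sorted(tables)):
        groups[i % n_workflows].append(table)
    # Drop empty groups so no workflow is created with nothing to profile.
    return [group for group in groups if group]


def include_pattern(group: list[str]) -> str:
    """Build an anchored regex that matches exactly the table names in one group."""
    return "^(" + "|".join(re.escape(name) for name in group) + ")$"


tables = ["orders", "customers", "payments", "events", "sessions"]
for n, group in enumerate(split_tables(tables, 2), start=1):
    print(f"workflow {n}: {include_pattern(group)}")
# workflow 1: ^(customers|orders|sessions)$
# workflow 2: ^(events|payments)$
```

Round-robin by name ignores table size. If a few large tables dominate the load, grouping by row count would balance CPU and memory usage more evenly.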


@@ -80,15 +80,9 @@ If activated the profiler will compute metric for view entity types. Note that i
Set this flag when you want to apply the filters to Fully Qualified Names (e.g., service_name.db_name.schema_name.table_name) instead of to the raw name of the asset (e.g., table_name).
This flag is useful when you have schemas with the same name in multiple databases, or tables with the same name in different schemas, and you want to filter out only one of them.
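To see why FQN filtering helps, consider two tables that share the raw name `orders` but live in different schemas. The minimal sketch below mimics "exclude assets whose name matches a regex", applied either to the raw table name or to the FQN. It is not OpenMetadata's filtering code: the FQNs are made up, and the full-match semantics are an assumption chosen for clarity, so the actual matching rules may differ.

```python
import re

# Two tables share the raw name "orders" but live in different schemas.
ASSET_FQNS = [
    "my_service.shop.public.orders",
    "my_service.shop.staging.orders",
]


def is_excluded(pattern: str, fqn: str, use_fqn: bool) -> bool:
    """Match the exclude pattern against the FQN or only the raw table name."""
    target = fqn if use_fqn else fqn.rsplit(".", 1)[-1]
    return re.fullmatch(pattern, target) is not None


def kept(pattern: str, use_fqn: bool) -> list[str]:
    """Return the FQNs that survive the exclude pattern."""
    return [fqn for fqn in ASSET_FQNS if not is_excluded(pattern, fqn, use_fqn)]


# Raw-name filtering cannot tell the two tables apart: both are excluded.
print(kept("orders", use_fqn=False))  # []
# FQN filtering can exclude only the staging copy.
print(kept(r"my_service\.shop\.staging\.orders", use_fqn=True))  # ['my_service.shop.public.orders']
```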
**Generate Sample Data**
Whether the profiler should ingest sample data.
**Compute Metrics**
Turn the Compute Metrics toggle off to skip all metric computation during the profiler ingestion workflow. Combined with the Ingest Sample Data toggle turned on, this lets you ingest only sample data.
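The two toggles combine independently. The minimal sketch below lists what a run would do for each combination. The function and parameter names are hypothetical and are not the real configuration keys; the sketch only restates the behavior described above.

```python
from itertools import product


def profiler_steps(compute_metrics: bool, sample_data: bool) -> list[str]:
    """Return the work a run would do for a given combination of toggles."""
    steps = []
    if compute_metrics:
        steps.append("compute metrics")
    if sample_data:
        steps.append("ingest sample data")
    return steps


for compute_metrics, sample_data in product([True, False], repeat=2):
    print(compute_metrics, sample_data, profiler_steps(compute_metrics, sample_data))
```

The row with metrics off and sample data on is the "only ingest sample data" case.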
**Process Pii Sensitive (Optional)**
Configuration to automatically tag columns that might contain sensitive information. PII is inferred from the column name. If `Generate Sample Data` is toggled on, OpenMetadata will leverage machine learning to infer which columns may contain PII sensitive data.
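Name-based inference amounts to matching column names against known PII patterns. The minimal sketch below shows that idea with a few illustrative regexes. The patterns, category names and helpers are assumptions for illustration; OpenMetadata's actual classifier, including its machine-learning path, is not reproduced here. Name matching alone cannot catch PII stored in columns with generic names such as `value`.

```python
import re
from typing import Optional

# Illustrative column-name patterns, matched case-insensitively (assumptions).
PII_NAME_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "ssn": re.compile(r"(^|_)ssn(_|$)|social_security", re.IGNORECASE),
}


def infer_pii_from_name(column_name: str) -> Optional[str]:
    """Return the first PII category whose pattern matches the column name, if any."""
    for category, pattern in PII_NAME_PATTERNS.items():
        if pattern.search(column_name):
            return category
    return None


def tag_columns(columns: list[str]) -> dict[str, str]:
    """Map each column that looks like PII to its inferred category."""
    tags = {}
    for column in columns:
        category = infer_pii_from_name(column)
        if category is not None:
            tags[column] = category
    return tags


print(tag_columns(["id", "user_email", "Phone_Number", "user_ssn", "created_at"]))
# {'user_email': 'email', 'Phone_Number': 'phone', 'user_ssn': 'ssn'}
```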
**Advanced Configuration**
**PII Inference Confidence Level (Optional)**


@@ -80,15 +80,9 @@ If activated the profiler will compute metric for view entity types. Note that i
Set this flag when you want to apply the filters to Fully Qualified Names (e.g., service_name.db_name.schema_name.table_name) instead of to the raw name of the asset (e.g., table_name).
This flag is useful when you have schemas with the same name in multiple databases, or tables with the same name in different schemas, and you want to filter out only one of them.
**Generate Sample Data**
Whether the profiler should ingest sample data.
**Compute Metrics**
Turn the Compute Metrics toggle off to skip all metric computation during the profiler ingestion workflow. Combined with the Ingest Sample Data toggle turned on, this lets you ingest only sample data.
**Process Pii Sensitive (Optional)**
Configuration to automatically tag columns that might contain sensitive information. PII is inferred from the column name. If `Generate Sample Data` is toggled on, OpenMetadata will leverage machine learning to infer which columns may contain PII sensitive data.
**Advanced Configuration**
**PII Inference Confidence Level (Optional)**
@@ -444,4 +438,4 @@ By default, the profiler will compute all the metrics against all the columns. T
For example, excluding `id` columns will reduce the number of columns against which the metrics are computed.
### 4. Set Up Multiple Workflows
If you have a large number of tables to profile, setting up multiple workflows will help distribute the load. However, it is important to monitor your instance's CPU and memory, as running many workflows simultaneously requires correspondingly more resources.