mirror of
https://github.com/datahub-project/datahub.git
synced 2025-12-17 13:04:04 +00:00
docs(propagation): add README.md for schema Field documentation propagation action (#14180)
parent b94e337210
commit c5a8d04496
@ -0,0 +1,84 @@

# Documentation Propagation Action

The Documentation Propagation Action allows you to automatically propagate documentation from schema fields to related schema fields. For example, when you add or update documentation for a column in a dataset, this action can automatically propagate that documentation to upstream, downstream, or sibling columns.

## Functionality

This action listens for documentation changes on schema fields and propagates those changes to related fields based on your configuration. It supports:

- **Downstream Propagation**: Propagate documentation to columns in downstream datasets
- **Upstream Propagation**: Propagate documentation to columns in upstream datasets
- **Sibling Propagation**: Propagate documentation to columns in sibling datasets (e.g., views of the same table)

## Configuration Options

The Documentation Propagation Action provides several configuration options:

- `enabled`: Controls whether documentation propagation is enabled (default: `true`)
- `columns_enabled`: Controls whether column-level documentation propagation is enabled (default: `true`)
- `datasets_enabled`: Controls whether dataset-level documentation propagation is enabled (default: `false`). Note: dataset-level propagation is not currently implemented.
- `column_propagation_relationships`: Specifies which relationships to use for propagation. Valid values are:
  - `UPSTREAM`: Propagate to upstream columns
  - `DOWNSTREAM`: Propagate to downstream columns
  - `SIBLING`: Propagate to sibling columns
- `max_propagation_depth`: Maximum depth for propagation chains (default: `5`)
- `max_propagation_fanout`: Maximum number of entities to propagate to in a single hop (default: `1000`)
- `max_propagation_time_millis`: Maximum time in milliseconds for a propagation chain (default: `3600000`, i.e. 1 hour)
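
To make the defaults and valid values above easier to check at a glance, here is a minimal Python sketch that mirrors the documented options in a dataclass. The class `DocPropagationOptions`, the enum `Relationship`, and the `from_dict` helper are hypothetical names for illustration only; they are not the action's real config class. The README does not state a default for `column_propagation_relationships`, so the empty list used here is a placeholder assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Relationship(str, Enum):
    """Valid values for column_propagation_relationships, as listed above."""
    UPSTREAM = "UPSTREAM"
    DOWNSTREAM = "DOWNSTREAM"
    SIBLING = "SIBLING"


@dataclass
class DocPropagationOptions:
    """Hypothetical mirror of the documented options (not the real config class)."""
    enabled: bool = True
    columns_enabled: bool = True
    datasets_enabled: bool = False  # documented as not currently implemented
    # Assumption: the README gives no default for this option.
    column_propagation_relationships: List[Relationship] = field(default_factory=list)
    max_propagation_depth: int = 5
    max_propagation_fanout: int = 1000
    max_propagation_time_millis: int = 3_600_000  # 1 hour

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "DocPropagationOptions":
        # Reject option names that are not documented.
        unknown = set(raw) - set(cls.__dataclass_fields__)
        if unknown:
            raise ValueError(f"unknown options: {sorted(unknown)}")
        opts = cls(**raw)
        # Enum lookup raises ValueError for anything other than the three valid values.
        opts.column_propagation_relationships = [
            Relationship(r) for r in opts.column_propagation_relationships
        ]
        for name in ("max_propagation_depth", "max_propagation_fanout",
                     "max_propagation_time_millis"):
            if getattr(opts, name) <= 0:
                raise ValueError(f"{name} must be positive")
        return opts
```

For example, `DocPropagationOptions.from_dict({"max_propagation_depth": 3})` keeps every other option at its documented default.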

## Example Configuration

```yaml
name: "documentation_propagation"
source:
  type: "kafka"
  config:
    connection:
      bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
      schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
    # Topic Routing - which topics to read from.
    topic_routes:
      #mcl: ${METADATA_CHANGE_LOG_VERSIONED_TOPIC_NAME:-MetadataChangeLog_Versioned_v1} # Topic name for MetadataChangeLogEvent_v1 events.
      pe: ${PLATFORM_EVENT_TOPIC_NAME:-PlatformEvent_v1} # Topic name for PlatformEvent_v1 events.
filter:
  event_type: "EntityChangeEvent_v1"
  event:
    entityType: "schemaField"
    category: "DOCUMENTATION"
action:
  type: "doc_propagation"
  config:
    enabled: true
    columns_enabled: true
    max_propagation_depth: 3 # Optional: Limit propagation depth

datahub:
  server: ${DATAHUB_GMS_HOST:-http://localhost:8080}
  token: ${DATAHUB_GMS_TOKEN}
```
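
The configuration relies on shell-style placeholders of the form `${NAME:-default}`: the environment variable is used when it is set and non-empty, and the text after `:-` is used otherwise. The following Python sketch illustrates that substitution rule only; it is not the framework's own expansion code, and `expand` is a hypothetical helper. How the framework itself treats an unset variable without a default (such as `${DATAHUB_GMS_TOKEN}`) is not covered in this README, so raising an error here is a choice of the sketch.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; the default may contain ':' or '/'.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand(text: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} / ${NAME:-default} placeholders using values from env."""

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group("name")
        default = match.group("default")
        value = env.get(name)
        if value:  # like the shell's ':-', an empty value counts as unset
            return value
        if default is not None:
            return default
        # Assumption of this sketch: a missing value without a default is an error.
        raise KeyError(f"{name} is not set and has no default")

    return _PLACEHOLDER.sub(_substitute, text)
```

With an empty environment, `expand("${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}", {})` yields `localhost:9092`; passing `os.environ` as `env` would resolve against the real process environment.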

## Behavior

When a documentation change is detected on a schema field:

1. The action checks if propagation is enabled and if the change should be propagated
2. It determines the appropriate propagation relationships based on your configuration
3. It finds related schema fields based on those relationships
4. It propagates the documentation to those fields, preserving attribution information
5. It respects propagation limits (depth, fanout, time) to prevent excessive propagation
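
The steps above can be pictured as a bounded graph traversal. The sketch below is a simplified, in-memory illustration in Python, not the action's implementation: the real action reacts to events rather than walking a graph in one call. The function `plan_propagation`, the `neighbors` callback, and the exact way each limit is enforced are all assumptions made for the example.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Set, Tuple


def plan_propagation(
    source: str,
    neighbors: Callable[[str], Iterable[str]],
    max_depth: int = 5,
    max_fanout: int = 1000,
    max_time_millis: int = 3_600_000,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, int]:
    """Return {field: depth} for every field the documentation would reach."""
    deadline = clock() + max_time_millis / 1000.0
    reached: Dict[str, int] = {}
    seen: Set[str] = {source}
    queue: Deque[Tuple[str, int]] = deque([(source, 0)])

    while queue:
        current, depth = queue.popleft()
        # Depth and time limits: stop expanding this chain.
        if depth >= max_depth or clock() > deadline:
            continue
        hop_count = 0
        for target in neighbors(current):
            # Fanout limit: cap the number of targets per hop.
            if hop_count >= max_fanout:
                break
            if target in seen:  # never write to the same field twice
                continue
            seen.add(target)
            reached[target] = depth + 1
            queue.append((target, depth + 1))
            hop_count += 1
    return reached
```

Here `neighbors` stands in for steps 2 and 3 (resolving the configured relationships to related fields), and the returned depths correspond to what step 4 would record as attribution.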

## Propagation Attribution

When documentation is propagated, the action adds attribution metadata to track:

- The original source of the documentation
- The propagation path
- The time of propagation
- The depth of propagation

This attribution information is stored with the propagated documentation and can be viewed in the DataHub UI.
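
As a rough illustration of how these four pieces of information could evolve from hop to hop, here is a small Python sketch. The `Attribution` class, its field names, and the `next_hop` helper are hypothetical; the README does not describe the stored format, so treat this as a mental model rather than the actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Attribution:
    """Hypothetical record of the four tracked items (names are assumptions)."""
    origin: str                # field where the documentation was first written
    via: Tuple[str, ...]       # fields the documentation passed through, in order
    propagated_at_millis: int  # time of this propagation step
    depth: int                 # number of hops from the origin


def next_hop(previous: Optional[Attribution], from_field: str, now_millis: int) -> Attribution:
    """Build the attribution to attach when propagating out of from_field."""
    if previous is None:
        # from_field holds the original documentation: this is the first hop.
        return Attribution(from_field, (from_field,), now_millis, 1)
    return Attribution(
        origin=previous.origin,
        via=previous.via + (from_field,),
        propagated_at_millis=now_millis,
        depth=previous.depth + 1,
    )
```

Carrying the depth along with the documentation is what would let a later hop compare it against `max_propagation_depth`.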

## Caveats and Limitations

- **Upstream Propagation**: When propagating upstream, the action only propagates if there is exactly one upstream field. This prevents ambiguous propagation when multiple upstream fields exist.
- **Dataset Documentation**: Dataset-level documentation propagation is not currently supported; only schema field (column) documentation is propagated.
- **Single Upstream Field**: For downstream propagation, the action only propagates to a downstream field if it has exactly one upstream field (the field being propagated from). This ensures that documentation is only propagated in clear one-to-one relationships.
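
The two "exactly one upstream field" rules can be stated compactly in code. The following Python sketch works on plain dictionaries of field-level lineage; the function names and the dictionary shapes are assumptions for illustration and do not reflect how the action queries lineage.

```python
from typing import Dict, List, Optional, Set


def downstream_targets(
    source: str,
    downstreams: Dict[str, List[str]],
    upstreams: Dict[str, Set[str]],
) -> List[str]:
    """Downstream fields whose only upstream field is source."""
    return [
        target
        for target in downstreams.get(source, [])
        if upstreams.get(target, set()) == {source}
    ]


def upstream_target(source: str, upstreams: Dict[str, Set[str]]) -> Optional[str]:
    """The single upstream field of source, or None when there are zero or several."""
    candidates = upstreams.get(source, set())
    return next(iter(candidates)) if len(candidates) == 1 else None
```

For instance, if column `c` is derived from both `a` and `x`, neither direction propagates between `a` and `c`, because the relationship is not one-to-one.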