Fix config typo in stateful ingestion README (#4202)

Jie Qiu 2022-02-22 18:20:53 -05:00 committed by GitHub
parent c6cb549918
commit c372b93804


@ -1,10 +1,10 @@
# Stateful Ingestion
The stateful ingestion feature enables sources to be configured to save custom checkpoint states from their
runs, and query these states back from subsequent runs to make decisions about the current run based on the state saved
from the previous run(s) using a supported ingestion state provider. This is an explicit opt-in feature and is not enabled
by default.
**_NOTE_**: This feature requires the server to be `statefulIngestion` capable. This is a feature of metadata service with version >= `0.8.20`.
To check if you are running a stateful ingestion capable server:
```console
@ -25,11 +25,11 @@ Note that a `.` is used to denote nested fields in the YAML recipe.
| Field | Required | Default | Description |
|--------------------------------------------------------------| -------- |------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `source.config.stateful_ingestion.enabled` | | False | The type of the ingestion state provider registered with datahub. |
| `source.conifg.stateful_ingestion.ignore_old_state` | | False | If set to True, ignores the previous checkpoint state. |
| `source.conifg.stateful_ingestion.ignore_new_state` | | False | If set to True, ignores the current checkpoint state. |
| `source.conifg.stateful_ingestion.max_checkpoint_state_size` | | 2^24 (16MB) | The maximum size of the checkpoint state in bytes. |
| `source.conifg.stateful_ingestion.state_provider` | | The default [datahub ingestion state provider](#datahub-ingestion-state-provider) configuration. | The ingestion state provider configuration. |
| `pipeline_name` | ✅ | | The name of the ingestion pipeline the checkpoint states of various source connector job runs are saved/retrieved against via the ingestion state provider. |
| `source.config.stateful_ingestion.ignore_old_state` | | False | If set to True, ignores the previous checkpoint state. |
| `source.config.stateful_ingestion.ignore_new_state` | | False | If set to True, ignores the current checkpoint state. |
| `source.config.stateful_ingestion.max_checkpoint_state_size` | | 2^24 (16MB) | The maximum size of the checkpoint state in bytes. |
| `source.config.stateful_ingestion.state_provider` | | The default [datahub ingestion state provider](#datahub-ingestion-state-provider) configuration. | The ingestion state provider configuration. |
| `pipeline_name` | ✅ | | The name of the ingestion pipeline the checkpoint states of various source connector job runs are saved/retrieved against via the ingestion state provider. |
NOTE: If either `dry-run` or `preview` mode is set, stateful ingestion will be turned off regardless of the rest of the configuration.
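As a minimal sketch of how the dotted field paths in the table map onto a recipe, the fragment below nests each `.`-separated segment one level deeper in the YAML. The source type `snowflake` is an assumption taken from the pipeline name used in this README's sample recipe, and the values are illustrative rather than recommended settings.

```yaml
# Illustrative fragment only; the source type and all values are assumptions.
source:
  type: "snowflake"
  config:
    stateful_ingestion:
      enabled: true            # source.config.stateful_ingestion.enabled
      ignore_old_state: false  # source.config.stateful_ingestion.ignore_old_state
      ignore_new_state: false  # source.config.stateful_ingestion.ignore_new_state

# Top-level field, not nested under source.config.
pipeline_name: "my_snowflake_pipeline_1"
```

Note that the key directly under `source` is spelled `config`, which is the spelling this commit corrects in the table.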
## Use-cases powered by stateful ingestion.
@ -71,7 +71,7 @@ source:
# type: "datahub" # default value
# This section is needed if the pipeline-level `datahub_api` is not configured.
# config: # default value
# datahub_api:
# server: "http://localhost:8080"
# The pipeline_name is mandatory for stateful ingestion and the state is tied to this.
@ -81,7 +81,7 @@ pipeline_name: "my_snowflake_pipeline_1"
# Pipeline-level datahub_api configuration.
datahub_api: # Optional. But if provided, this config will be used by the "datahub" ingestion state provider.
server: "http://localhost:8080"
sink:
type: "datahub-rest"
config: