| `source.config.stateful_ingestion.enabled` | | False | Whether stateful ingestion is enabled for the source. |
| `source.config.stateful_ingestion.ignore_old_state` | | False | If set to True, ignores the previous checkpoint state. |
| `source.config.stateful_ingestion.ignore_new_state` | | False | If set to True, ignores the current checkpoint state. |
| `source.config.stateful_ingestion.max_checkpoint_state_size` | | 2^24 (16MB) | The maximum size of the checkpoint state in bytes. |
| `source.config.stateful_ingestion.state_provider` | | The default datahub ingestion state provider configuration. | The ingestion state provider configuration. |
| `pipeline_name` | ✅ | | The name of the ingestion pipeline. The ingestion state provider saves and retrieves the checkpoint states of the source connector's job runs against this name. |
| `stateful_ingestion.remove_stale_metadata` | | True | When stateful ingestion is enabled, soft-deletes the tables and views that were found in the last successful run but are missing from the current run. |
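
The `remove_stale_metadata` behavior described above can be pictured as a set difference between two checkpoints. The sketch below is a conceptual illustration only, not DataHub's implementation; the helper name `find_stale_urns` and the sample URNs are hypothetical.

```python
from typing import Iterable, Set


def find_stale_urns(
    last_run_urns: Iterable[str], current_run_urns: Iterable[str]
) -> Set[str]:
    """Return URNs seen in the last successful run but absent from the current run.

    Hypothetical helper: these are the entities a stale-metadata pass
    would mark as soft-deleted.
    """
    return set(last_run_urns) - set(current_run_urns)


# Sample URNs, made up for illustration.
last_run = {
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.customers,PROD)",
}
current_run = {
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.orders,PROD)",
}

stale = find_stale_urns(last_run, current_run)
```

Here `stale` contains only the `customers` table. Entities that are new in the current run are unaffected; only those that disappeared since the last successful run are candidates for soft deletion.
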
#### Sample configuration
```yaml
source:
  type: "snowflake"
  config:
    username: <user_name>
    password: <password>
    host_port: <host_port>
    warehouse: <ware_house>
    role: <role>
    include_tables: True
    include_views: True
    # Rest of the source specific params ...
    ## Stateful Ingestion config ##
    stateful_ingestion:
      enabled: True # False by default
      remove_stale_metadata: True # default value
      ## Default state_provider configuration ##
      # state_provider:
        # type: "datahub" # default value
        # This section is needed if the pipeline-level `datahub_api` is not configured.
```

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `stateful_ingestion.force_rerun` | | False | Custom alias for `stateful_ingestion.ignore_old_state`. When set to True, forces a rerun for the same time window even if there was a previous successful run. |

of various jobs inside the source connector of the ingestion pipeline. The checkpointing data model is [DatahubIngestionCheckpoint](https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/datajob/datahub/DatahubIngestionCheckpoint.pdl), and it can store any custom state using [IngestionCheckpointState](https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/datajob/datahub/IngestionCheckpointState.pdl#L9). A checkpointing ingestion state provider needs to implement the
[IngestionCheckpointingProviderBase](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/api/ingestion_job_checkpointing_provider_base.py) interface and
register itself with DataHub by adding an entry under the `datahub.ingestion.checkpointing_provider.plugins` key of the `entry_points` section in [setup.py](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/setup.py) with its type and implementation class as shown below.
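
As a rough sketch of such a registration, the `entry_points` mapping might gain an entry like the one below. Only the group key is taken from the text above; the provider type `my_provider`, the module path `my_package.my_module`, and the class name `MyCheckpointingProvider` are hypothetical placeholders, and `parse_entry_point` is a helper added purely to show how the string breaks down.

```python
from typing import Dict, List, Tuple

entry_points: Dict[str, List[str]] = {
    # other entry point groups ...
    "datahub.ingestion.checkpointing_provider.plugins": [
        # "<type> = <module path>:<implementation class>" (hypothetical names)
        "my_provider = my_package.my_module:MyCheckpointingProvider",
    ],
}


def parse_entry_point(spec: str) -> Tuple[str, str, str]:
    """Split an entry point string into (type, module, class name)."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, class_name = target.split(":", 1)
    return name, module, class_name


provider_type, module, class_name = parse_entry_point(
    entry_points["datahub.ingestion.checkpointing_provider.plugins"][0]
)
```

The registered type (here `my_provider`) is presumably the value a recipe would then set under `state_provider.type`, in the same way the default provider is selected with `type: "datahub"`.
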

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `state_provider.config` | | The `datahub_api` config if set at pipeline level. Otherwise, the default `DatahubClientConfig`. See the [defaults](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19). | The configuration required for initializing the state provider. |