`DocumentPreprocessor` first splits and then cleans documents.
It is a SuperComponent that combines a `DocumentSplitter` and a `DocumentCleaner` into a single component.
### Parameters
The `DocumentPreprocessor` exposes all initialization parameters of the underlying `DocumentSplitter` and `DocumentCleaner`, and all of them are optional. For a detailed description of each parameter, see the documentation pages for `DocumentSplitter` and `DocumentCleaner`.
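### On its own

```python
from haystack import Document
from haystack.components.preprocessors import DocumentPreprocessor

doc = Document(content="I love pizza!")
preprocessor = DocumentPreprocessor()

# Splits the document first, then cleans the resulting chunks
result = preprocessor.run(documents=[doc])
print(result["documents"])
```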
### In a pipeline
You can use the `DocumentPreprocessor` in your indexing pipeline. The example below requires installing additional dependencies for the `MultiFileConverter`: