mirror of
https://github.com/deepset-ai/haystack.git
synced 2025-07-20 07:21:09 +00:00

Module file_type
FileTypeClassifier
class FileTypeClassifier(BaseComponent)
Routes files in an indexing pipeline to their corresponding file converters.
FileTypeClassifier.__init__
def __init__(supported_types: List[str] = DEFAULT_TYPES)
Node that sends out files on a different output edge depending on their extension.
Arguments:
supported_types: The file types that this node can distinguish between. The default values are: txt, pdf, md, docx, and html. Lists with duplicate elements are not allowed.
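The constraint on supported_types can be illustrated with a small standalone sketch. It does not import Haystack; build_edge_map is a hypothetical helper, and the output_N edge names numbered in list order are an assumption made for illustration, not something stated in this reference.

```python
from typing import Dict, List

# Default list taken from the argument description above.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]


def build_edge_map(supported_types: List[str] = DEFAULT_TYPES) -> Dict[str, str]:
    """Hypothetical helper: map each file extension to an output edge name."""
    # Duplicates would make the extension-to-edge mapping ambiguous, so reject them.
    if len(set(supported_types)) != len(supported_types):
        raise ValueError("supported_types must not contain duplicate elements")
    # Assumption: edges are numbered from 1, following the order of the list.
    return {ext: f"output_{i}" for i, ext in enumerate(supported_types, start=1)}
```

Under these assumptions, reordering supported_types changes which edge each extension is sent to, so the list order would need to match the way the converters are wired into the pipeline.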
FileTypeClassifier.run
def run(file_paths: Union[Path, List[Path], str, List[str], List[Union[Path, str]]])
Sends out files on a different output edge depending on their extension.
Arguments:
file_paths: The paths of the files to route to different output edges. Accepts a single path or a list of paths, given as Path objects or strings.
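Because file_paths accepts several input shapes, a routing step first has to normalize them. The following is a minimal standalone sketch of that idea, not the library's implementation: normalize_paths and pick_edge are hypothetical names, and both the output_N edge naming and the rejection of mixed extensions in one call are assumptions.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[Path, str]


def normalize_paths(file_paths: Union[PathLike, List[PathLike]]) -> List[Path]:
    """Accept a single path or a list of paths and return a list of Path objects."""
    if not isinstance(file_paths, list):
        file_paths = [file_paths]
    return [Path(p) for p in file_paths]


def pick_edge(file_paths: Union[PathLike, List[PathLike]], supported_types: List[str]) -> str:
    """Hypothetical routing step: choose one output edge from the file extension."""
    paths = normalize_paths(file_paths)
    # Collect the distinct extensions, without the leading dot.
    extensions = {p.suffix.lstrip(".") for p in paths}
    # Assumption: one call handles exactly one file type.
    if len(extensions) != 1:
        raise ValueError(f"Expected exactly one file type per call, got: {sorted(extensions)}")
    (extension,) = extensions
    if extension not in supported_types:
        raise ValueError(f"Unsupported file type: {extension!r}")
    # Assumption: edges are numbered from 1 in the order of supported_types.
    return f"output_{supported_types.index(extension) + 1}"
```

With the default types, a call such as pick_edge("report.pdf", ["txt", "pdf", "md", "docx", "html"]) would select the second edge in this sketch.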