haystack/docs/_src/api/api/file_classifier.md

Module file_type

FileTypeClassifier

class FileTypeClassifier(BaseComponent)

Route files in an Indexing Pipeline to corresponding file converters.

FileTypeClassifier.__init__

def __init__(supported_types: List[str] = DEFAULT_TYPES)

Node that sends out files on a different output edge depending on their extension.

Arguments:

  • supported_types: The file types that this node can distinguish between. The default values are: txt, pdf, md, docx, and html. Lists with duplicate elements are not allowed.
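The duplicate restriction can be pictured with a minimal sketch. The helper `validate_supported_types` below is hypothetical and not part of Haystack. It only illustrates one plausible reason for the rule: each type should correspond to exactly one output edge.

```python
from typing import List

# Default types as listed in the argument description above.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]


def validate_supported_types(supported_types: List[str] = DEFAULT_TYPES) -> List[str]:
    """Hypothetical helper: reject lists that contain duplicate elements."""
    if len(set(supported_types)) != len(supported_types):
        raise ValueError("supported_types must not contain duplicate elements")
    # Return a copy so callers cannot mutate the shared default list.
    return list(supported_types)
```

Under this sketch, `validate_supported_types(["txt", "pdf"])` returns the list unchanged, while `validate_supported_types(["txt", "txt"])` raises a `ValueError`.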

FileTypeClassifier.run

def run(file_paths: Union[Path, List[Path], str, List[str], List[Union[Path, str]]])

Sends out files on a different output edge depending on their extension.

Arguments:

  • file_paths: The path or paths of the files to route to different output edges.
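The routing idea can be sketched without Haystack installed. The function `route_by_extension` is a hypothetical stand-in, not the library's implementation. It makes three assumptions that the text above does not confirm: output edges are named `output_1`, `output_2`, ... by position in `supported_types`; extensions are compared case-insensitively; and all files in one call must share a single extension.

```python
from pathlib import Path
from typing import List, Tuple, Union

# Default types as listed in the documentation above.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]

PathLike = Union[Path, str]


def route_by_extension(
    file_paths: Union[PathLike, List[PathLike]],
    supported_types: List[str] = DEFAULT_TYPES,
) -> Tuple[List[Path], str]:
    """Hypothetical sketch: pick an output edge from the files' extension."""
    # Accept a single path as well as a list, mirroring the run() signature.
    if not isinstance(file_paths, list):
        file_paths = [file_paths]
    paths = [Path(p) for p in file_paths]

    # Assumption: one call handles files of exactly one type.
    extensions = {p.suffix.lstrip(".").lower() for p in paths}
    if len(extensions) != 1:
        raise ValueError("all files in one call must share the same extension")

    extension = extensions.pop()
    if extension not in supported_types:
        raise ValueError(f"unsupported file type: {extension!r}")

    # Assumption: edge names are 1-based positions in supported_types.
    edge = f"output_{supported_types.index(extension) + 1}"
    return paths, edge
```

With the default types, `route_by_extension("report.pdf")` selects the second edge in this sketch, because `pdf` is the second entry in the list. In a pipeline, each edge would then be connected to the converter for that file type.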