Giannis Kitsos Kalyvianakis b94d9effaf
extract extension based on file's content (#2330)
* extract extension based on file's content

* Add python-magic dependency

* fix the _estimate_extension function and lowercase the file extensions

* check if the FileTypeClassifier can be imported

* add test and new file types

* fix typing

* import Optional

* revert Optional and make sure a string is always returned

* fix test so that it skips markdown files

* Emulate Code & Docs action

* Generate schemas

* Tidy up test code & extensionless files

* Improve error messages

* Revert schema changes

* Emulate black and docs CI again
2022-04-11 09:16:30 +02:00

6 lines
204 B
Python

from haystack.utils.import_utils import safe_import

FileTypeClassifier = safe_import(
    "haystack.nodes.file_classifier.file_type", "FileTypeClassifier", "preprocessing"
)  # Has optional dependencies
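
The call above defers the failure of an optional dependency until the class is actually used. The following is a minimal sketch of how such a guarded import could work. It is not Haystack's implementation: `safe_import_sketch` and `MissingDependency` are hypothetical names, and the only assumption taken from the source is the argument order (module path, class name, dependency group).

```python
import importlib
from typing import Any


def safe_import_sketch(import_path: str, classname: str, dep_group: str) -> Any:
    """Return the named class, or a placeholder that raises when instantiated.

    Hypothetical helper for illustration only; not the Haystack function.
    """
    try:
        module = importlib.import_module(import_path)
        return getattr(module, classname)
    except ImportError as import_error:
        # The name bound by `except ... as` is cleared when the block ends,
        # so keep a separate reference for the placeholder to chain from.
        original_error = import_error

        class MissingDependency:
            def __init__(self, *args: Any, **kwargs: Any) -> None:
                raise ImportError(
                    f"{classname} needs the optional '{dep_group}' "
                    "dependencies, which could not be imported."
                ) from original_error

        # Make the placeholder report the name of the class it stands in for.
        MissingDependency.__name__ = classname
        return MissingDependency
```

With this pattern, importing the package succeeds even when the optional group is not installed, and the user sees an `ImportError` naming the dependency group only when they try to construct the class.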