* Add support for model folder into BasePreProcessor
* First draft of custom model on PreProcessor
* Update Documentation & Code Style
* Update tests to support custom models
* Update Documentation & Code Style
* Test for wrong models in custom folder
* Default to ISO names on custom model folder
Use long names only when needed
* Update Documentation & Code Style
* Refactoring language names usage
* Update fallback logic
* Check unpickling error
* Updated tests using parametrize
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Refactored common logic
* Add format control to NLTK load
* Tests improvements
Add a sample for specialized model
* Update Documentation & Code Style
* Minor log text update
* Log model format exception details
* Change pickle protocol version to 4 for Python 3.7 compatibility
* Removed unnecessary model folder parameter
Changed logic comparisons
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Update Documentation & Code Style
* Removed unused import
* Replace errors with warnings
* Change to absolute path
* Rename sentence tokenizer method
Co-authored-by: tstadel
* Check document content is a string before process
* Change to log errors and not warnings
* Update Documentation & Code Style
* Improve split sentences method
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Update Documentation & Code Style
* Empty commit - trigger workflow
* Remove superfluous parameters
Co-authored-by: tstadel
* Explicit None checking
Co-authored-by: tstadel
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* clean up tests and run earlier
* use change detection
* better naming, skip ES
* more cleanup
* fix job name
* dummy commit to trigger the CI
* mock away the PDF converter
* make the test compatible with 3.7
* removed leftover
* always run the api tests, use a matrix for the OS
* refactor all the tests
* remove outdated dependency
* pylint
* new abstract method
* adjust for older python versions
* rename pipeline file
* address PR comments
* Remove caching and install audio deps
* Fix `Tutorials` as well
* Run all tutorials even though some fail
* Forgot `fi`
* fix failure condition
* proper bash string equality
* Enable debug logs
* remove audio files
* Update Documentation & Code Style
* Use the setup action in the Tutorial CI as well
* Try with a file that exists
* Update Documentation & Code Style
* Fix the comments in the tutorials
* Update Documentation & Code Style
* Fix tutorials.sh
* Remove debug logging
* import pprint and try editable install
* Update Documentation & Code Style
* extract no run list
* Add tutorial18 to no run list nightly
* import pprint correctly
* Update Documentation & Code Style
* try making site-packages editable
* Make pythonpath editable every time Tut17 is run on CI
* typo
* fix imports in tut5
* add git clean
* Update Documentation & Code Style
* add comments and remove `-e`
* accidentally deleted a line
* Update .github/utils/tutorials.sh
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Pass all the metadata in the summarizer
* Disable metadata forwarding if `generate_single_summary` is `True`
* Update Documentation & Code Style
* simplify tests
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Change split logic to list
* Fix wrong parameter for run
* Fix mypy error
* Fix layout/raw parameter
* Add test for filename with whitespaces on PDFToText
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Change the name under which crawled pages are saved to avoid long-filename errors on some file systems
* Custom naming function for saving crawled files
* Update Documentation & Code Style
* Remove bad characters from file name and prefix
* Add test for naming function
* Update Documentation & Code Style
* Fix expensive regex recalculation and linter warnings
* Check for exceptions on file dump
* Remove param_naming variable
* Fix file paths on Windows, Linux and Mac
* Update Documentation & Code Style
* Test using one of the docstrings examples
* Change default naming function
Update docstrings
* Applying formatting rules
* Update Documentation & Code Style
* Fix mypy incompatible assignment error
* Remove unused type declaration
* Fix typo
* Update tests for naming function
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Tutorial 18: Open in Colab doesn't work in Firefox
* Tutorial 18: Open in Colab doesn't work in Firefox v2
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* first version of save_to_remote for HF from FarmReader
* Update Documentation & Code Style
* Changes based on comments
* Update Documentation & Code Style
* imports order
* making small changes to pydoc
* indent fix
* Update Documentation & Code Style
* keyword arguments instead of positional
* Changing to repo_id
huggingface-hub package must be v0.5 or higher; checking how to handle this with Thomas
* Update Documentation & Code Style
* adding huggingface-hub dependency 0.5 or above
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>